Architecture and Data Blog

Thoughts about intersection of data, devops, design and software architecture

10 node Riak cluster on a single machine

When trying to evaluate NoSQL databases, its usually better to try them out. While trying them out, its better to use them with multiple node configurations instead of running single node. Such as clusters in Riak or Replica-set in mongodb maybe even a sharded setup. On our project we evaluated a 10 node Riak cluster so that we could experiment with N, R and W values and decide which values where optimal for us. In Riak here is what N, R and W mean.


Transactions using Groovy.SQL with Spring annotations and connection pools

When using Groovy with Spring framework,  interacting with the database can be done using the Groovy.SQL class which provides a easy to use interface. When using Groovy.SQL, if we have a need to do transactions, we have the .withTransaction method that accepts a closure, to which we can pass in code to execute within the transaction. In our project since we were using spring already, using annotations to define transactions would be a great. Standard @Transactional annotations with Groovy.SQL will not work, since every place where the Groovy.SQL is used a new connection is acquired from the connection pool causing the database work to span multiple connections, which can result in dead-locks on the database. What we really want is that the database connection be the same across all invocations of Groovy.SQL with-in the same transaction started by the annotated method.


Backup in mongodb replica-set configurations

How to ensure you have a backup in mongodb with regards to replica-sets

There are multiple ways to take backups of mongodb is different configuraitions, one of the configuration that I have been involved recently is replica-sets. When mongodb is running in replica-set configuration, there is a single primary node and multiple secondary nodes. To take backup of the replica-set we can either do a mongodump of one of the nodes or shutdown one of the secondary nodes and take file copies, since in a replica-set all nodes have the same data (except arbiter). Lets see how we could deal with mongodump method of taking backup.


Back to blogging

Writing again

NoSql Distilled There has been a long pause in my blogging activity. I was trying to finish of my latest writing engagement in regards to NoSQL. Working with Martin Fowler on NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence was really fun. This book will provide a concise text and easy way to understand for everyone the rise of the NoSQL movement and help with what kinds of trade-offs need to be made while working with NoSQL.


MSSQL JDBC Driver behavior"

My latest project involves talking to MS-SQL Server using the JDBC driver and Java. While doing this we setup the database connection and had a simple SQL to get the first_name and last_name for a unique user_id from the application_user table in the database.

SELECT first_name,last_name
FROM application_user
WHERE user_id = ?

Given the above SQL, we did not think too much about performance as the user_id was indexed. The java code as below was used to run the SQL.


With so much pain, why are stored procedures used so much

Deciding when to used stored procedures is an important design decision that needs to be made with lots of thought

I keep encountering situations where all the business logic for the applications is in stored procedures and the application layer is just calling the stored procedures to get the work done and return the data. There are many problems with this approach some of them are.

  • Writing stored procedure code is fraught with danger as there are no modern IDE’s that support refactoring, provide code smells like “variable not used”, “variable out of scope”.


Replica sets in MongoDB

How to setup high availability replicaset in mongodb

Replica sets is a feature of MongoDB for Automatic Failover, in this setup there is a primary server and the rest are secondary servers. If the primary server goes down, the rest of the secondary servers choose a new primary via an election process, each server can also be assigned number of votes, so that you can decide the next primary based on data-center location, machine properties etc, you can also start mongo database processes that act only as election tie-breakers these are known as arbiters, these arbiters will never have data, but just act as agents that break the tie.


Schema less databases and its ramifications.

In the No-SQL land schema-less is a power full feature that is advertised a lot, schema-less basically means you don’t have to worry about column names and table names in a traditional sense, if you want to change the column name you just start saving the data using the new column name Lets say you have a document database like mongoDB and you have JSON document as shown below.

{
"_id":"4bc9157e201f254d204226bf",
"FIRST_NAME":"JOHN",
"MIDDLE_NAME":"D",
"LAST_NAME":"DOE",
"CREATED":"2010-10-12"
}

You have some corresponding code to read the documents from the database and lets say you lots of data in the database in the order of millions of documents. If you want to change the name of some attributes or columns at this point and the new JSON would look like


Effective use of data for better customer experience.

Using data for effective operations

For more than seven years I have been getting offers for credit cards from Airlines and Banks. One particular bank has been sending me these solicitations for more than seven years. That is 12 mailings per year, more than 72 mailings so far, remember these are physical paper mailings not the electronic kind. I don’t like the junk, it hurts the environment and worst of all I think its not good use of the data they have. How hard is it to design a system around the data they have.


Schema design in a document database

We are using MongoDB on our project, since mongo is document store, schema design is somewhat different, when you are using traditional RDBMS data stores, one thinks about tables and rows, while using a document database you have to think about the schema in a some what different way. Lets say, we want to save a customer object, when using a RDBMS we would come up with Customer, Address, Phone, Email. They are related to each other as shown below. Relational Design