Architecture and Data Blog

Thoughts about the intersection of data, devops, design and software architecture

Back to blogging

There has been a long pause in my blogging activity. I was trying to finish off my latest writing engagement, on NoSQL. Working with Martin Fowler on NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence was really fun. The book will give everyone a concise, easy-to-understand account of the rise of the NoSQL movement. It will also help readers understand the kinds of trade-offs that need to be made when working with NoSQL.

MSSQL JDBC Driver behavior

My latest project involves talking to MS-SQL Server from Java using the JDBC driver. We set up the database connection and wrote a simple SQL query to get the first_name and last_name for a unique user_id from the application_user table in the database:

SELECT first_name, last_name FROM application_user WHERE user_id = ?

Given the above SQL, we did not think too much about performance, as user_id was indexed. The Java code below was used to run the SQL.
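Here is a minimal sketch of the same lookup. It uses Python's built-in sqlite3 module as a stand-in for MS-SQL Server and the JDBC driver, an assumption made so the example runs anywhere; it is not the Java code from the post. The user_id value is bound as a `?` parameter rather than concatenated into the SQL. SQLite's `EXPLAIN QUERY PLAN` then confirms that the lookup goes through the index on user_id. The index name, the extra id column and the sample row are hypothetical.

```python
import sqlite3

SQL = "SELECT first_name, last_name FROM application_user WHERE user_id = ?"

def make_db() -> sqlite3.Connection:
    """In-memory stand-in for the application database (schema details are hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE application_user ("
        " id INTEGER PRIMARY KEY,"
        " user_id TEXT NOT NULL,"
        " first_name TEXT,"
        " last_name TEXT)"
    )
    # Unique index on user_id, mirroring "the user_id was indexed" (index name is hypothetical).
    conn.execute(
        "CREATE UNIQUE INDEX ix_application_user_user_id ON application_user (user_id)"
    )
    conn.execute(
        "INSERT INTO application_user (user_id, first_name, last_name) VALUES (?, ?, ?)",
        ("jdoe", "John", "Doe"),
    )
    return conn

def find_user(conn: sqlite3.Connection, user_id: str):
    """Run the lookup with user_id bound as a parameter; returns a row tuple or None."""
    return conn.execute(SQL, (user_id,)).fetchone()

def query_plan(conn: sqlite3.Connection) -> str:
    """Return SQLite's plan text for the lookup; the index name appears when it is used."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + SQL, ("jdoe",)).fetchall()
    # The human-readable plan description is the last column of each row.
    return " ".join(str(row[-1]) for row in rows)
```

Both sqlite3 and JDBC's PreparedStatement use `?` placeholders, so the SQL text is unchanged. How each driver sends the bound value to the server is up to that driver.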

With so much pain, why are stored procedures used so much?

I keep encountering situations where all the business logic for an application is in stored procedures. The application layer just calls the stored procedures to get the work done and return the data. There are many problems with this approach; here are some of them. Writing stored procedure code is fraught with danger, as there are no modern IDEs that support refactoring or flag code smells such as “variable not used” and “variable out of scope”.

Replica sets in MongoDB

Replica sets are MongoDB's feature for automatic failover. In this setup there is one primary server, and the rest are secondary servers. If the primary server goes down, the secondary servers choose a new primary through an election process. Each server can also be assigned a number of votes, so you can decide the next primary based on data-center location, machine properties and so on. You can also start Mongo database processes that act only as election tie-breakers; these are known as arbiters. Arbiters never hold data and only act as agents that break the tie.
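The election rules can be illustrated with a toy model. This is a minimal sketch that assumes a simplified rule set:

- A new primary needs the reachable members to hold a strict majority of all configured votes.
- Arbiters vote but can never become primary.
- The `priority` setting decides among the data-bearing candidates. `priority` is a real replica set option, swapped in here to express a location preference.

The real MongoDB election protocol also considers other factors, such as how up to date each member's data is, so treat this as illustration only. The dictionary mirrors the shape of a replica set configuration document (`_id`, `members`, `host`, `votes`, `priority`, `arbiterOnly`). The hostnames are hypothetical.

```python
from typing import Optional, Set

# Toy model of a replica set election; NOT MongoDB's real election protocol.
# Field names follow the replica set config document; hostnames are hypothetical.
CONFIG = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "db-east-1:27017", "votes": 1, "priority": 2},
        {"_id": 1, "host": "db-west-1:27017", "votes": 1, "priority": 1},
        {"_id": 2, "host": "arbiter-1:27017", "votes": 1, "arbiterOnly": True},
    ],
}

def majority(config: dict) -> int:
    """Votes needed to win: strictly more than half of all configured votes."""
    total = sum(m.get("votes", 1) for m in config["members"])
    return total // 2 + 1

def elect_primary(config: dict, reachable: Set[str]) -> Optional[str]:
    """Return the host that would become primary, or None if no majority is reachable."""
    up = [m for m in config["members"] if m["host"] in reachable]
    if sum(m.get("votes", 1) for m in up) < majority(config):
        return None  # a minority of the votes cannot elect a primary
    # Arbiters vote but hold no data, so they are never candidates.
    candidates = [
        m for m in up
        if not m.get("arbiterOnly", False) and m.get("priority", 1) > 0
    ]
    if not candidates:
        return None
    # Prefer the highest priority; break ties by the lowest member _id.
    best = max(candidates, key=lambda m: (m.get("priority", 1), -m["_id"]))
    return best["host"]
```

Suppose db-east-1 goes down. On its own, db-west-1 holds one of three votes and cannot become primary. The arbiter's vote supplies the majority, which is the tie-breaking role described above.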

Schema-less databases and their ramifications

In NoSQL land, schema-less is a powerful feature that is advertised a lot. Schema-less basically means you don’t have to worry about column names and table names in the traditional sense: if you want to change a column name, you just start saving the data using the new column name. Let’s say you have a document database like MongoDB and a JSON document as shown below.

{
  "_id": "4bc9157e201f254d204226bf",
  "FIRST_NAME": "JOHN",
  "MIDDLE_NAME": "D",
  "LAST_NAME": "DOE",
  "CREATED": "2010-10-12"
}

You have some corresponding code to read the documents from the database. Let’s also say you have lots of data in the database, on the order of millions of documents.
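One ramification can be sketched in Python, treating documents as plain dicts. Assume, hypothetically, that the application starts saving FIRST_NAME under a new key, GIVEN_NAME. The millions of documents already stored still use the old key. The reading code therefore has to accept both keys, and documents can be upgraded lazily as they are read and saved again. The GIVEN_NAME key and the second document are made up for illustration.

```python
# Documents modeled as plain dicts.
# GIVEN_NAME is a hypothetical new name for FIRST_NAME; NEW_DOC is made-up sample data.
OLD_DOC = {
    "_id": "4bc9157e201f254d204226bf",
    "FIRST_NAME": "JOHN",
    "MIDDLE_NAME": "D",
    "LAST_NAME": "DOE",
    "CREATED": "2010-10-12",
}
NEW_DOC = {
    "_id": "hypothetical-id-2",
    "GIVEN_NAME": "JANE",
    "LAST_NAME": "ROE",
    "CREATED": "2011-01-05",
}

def first_name(doc: dict):
    """Read the first name from the new key, falling back to the old one."""
    if "GIVEN_NAME" in doc:
        return doc["GIVEN_NAME"]
    return doc.get("FIRST_NAME")  # older documents; None if neither key exists

def upgrade(doc: dict) -> dict:
    """Return a copy with FIRST_NAME renamed to GIVEN_NAME; other fields are untouched."""
    upgraded = dict(doc)
    if "FIRST_NAME" in upgraded and "GIVEN_NAME" not in upgraded:
        upgraded["GIVEN_NAME"] = upgraded.pop("FIRST_NAME")
    return upgraded
```

The alternative is rewriting every stored document in one pass. MongoDB's `$rename` update operator can rename a field inside the database itself.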

Effective use of data for better customer experience

For more than seven years I have been getting credit card offers from airlines and banks. One particular bank has been sending me these solicitations for more than seven years. That is 12 mailings per year, more than 84 mailings so far, and remember these are physical paper mailings, not the electronic kind. I don’t like the junk: it hurts the environment, and worst of all, I think it’s not a good use of the data they have.

Schema design in a document database

We are using MongoDB on our project. Since Mongo is a document store, schema design is somewhat different. With a traditional RDBMS, one thinks about tables and rows; with a document database, you have to think about the schema in a different way. Let’s say we want to save a customer object. With an RDBMS we would come up with Customer, Address, Phone and Email tables.
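A minimal sketch of the difference follows, using hypothetical column names. Rows from the Customer, Address, Phone and Email tables are modeled as lists of dicts, and each child row carries a customer_id foreign key. These rows are folded into a single customer document, with the addresses, phones and emails embedded as arrays. This is one common way to model a customer in a document store, not the only one.

```python
from typing import Dict, List

Row = Dict[str, object]  # one table row; column names used below are hypothetical

def _children(rows: List[Row], customer_id: object) -> List[Row]:
    """Child rows for one customer, minus the foreign key (nesting replaces it)."""
    return [
        {k: v for k, v in row.items() if k != "customer_id"}
        for row in rows
        if row["customer_id"] == customer_id
    ]

def to_customer_document(
    customer: Row, addresses: List[Row], phones: List[Row], emails: List[Row]
) -> Row:
    """Fold rows from the four relational tables into one embedded customer document."""
    cid = customer["customer_id"]
    doc: Row = {"_id": cid}  # the relational key becomes the document _id
    doc.update({k: v for k, v in customer.items() if k != "customer_id"})
    doc["addresses"] = _children(addresses, cid)
    doc["phones"] = _children(phones, cid)
    doc["emails"] = _children(emails, cid)
    return doc
```

The result can be read and written as one unit, where the relational version needs a join across four tables.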

My experience with MongoDB

The current project I’m on is using MongoDB. MongoDB is a document-based database; it stores JSON objects as BSON (Binary JSON). MongoDB provides a middle ground between the traditional RDBMS and the NoSQL databases out there: it provides indexes, dynamic queries, replication, map reduce and auto-sharding. It’s open source and can be downloaded from the MongoDB website. Starting up MongoDB is pretty easy; all you need is

./mongod --dbpath=/user/data/db

where /user/data/db is the path where you want Mongo to create its data files.
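To make "stores JSON objects as BSON" concrete, here is a minimal encoder for a small subset of the BSON format, written against the published BSON specification. It handles UTF-8 strings, booleans, 32-bit integers and embedded documents only. It is an illustration; real applications would use the bson package that ships with PyMongo, which covers the full set of types.

```python
import struct

# Minimal BSON encoder for a SUBSET of types (string, bool, int32, embedded document).
# Illustration only; use a full BSON library in practice.

def _cstring(s: str) -> bytes:
    """Field names are NUL-terminated UTF-8 strings, so they cannot contain NUL."""
    b = s.encode("utf-8")
    if b"\x00" in b:
        raise ValueError("BSON field names cannot contain NUL bytes")
    return b + b"\x00"

def encode_document(doc: dict) -> bytes:
    """document ::= int32 total_length, elements..., 0x00 terminator."""
    body = b"".join(_encode_element(k, v) for k, v in doc.items())
    # Total length counts the 4-byte prefix, the elements and the trailing NUL.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"

def _encode_element(name: str, value) -> bytes:
    """element ::= type byte, field name (cstring), value bytes."""
    key = _cstring(name)
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return b"\x08" + key + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        if not -2**31 <= value < 2**31:
            raise ValueError("this sketch only supports 32-bit integers")
        return b"\x10" + key + struct.pack("<i", value)
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        # String length prefix includes the trailing NUL.
        return b"\x02" + key + struct.pack("<i", len(data)) + data
    if isinstance(value, dict):
        return b"\x03" + key + encode_document(value)
    raise TypeError(f"type not supported in this sketch: {type(value).__name__}")
```

Each element starts with a one-byte type tag, and each document starts with its total length in bytes. For example, `{"hello": "world"}` encodes to 22 bytes.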

Workshop at Enterprise Data World 2010

I’m doing a workshop on Agile Database Development at Enterprise Data World 2010 in San Francisco. See you there.

Testing in data conversion projects

When working on projects that involve converting data, or migrating data from a legacy database, the testing effort is enormous and takes a lot of time; some test automation can help. Since data is moved or changed from a source database to a destination database, we can write SQL whose results support the kinds of tests we want to perform. For example, we can write SQL that gives us the number of customers, or SQL that gives us the account balance for a specific account.
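Here is a minimal sketch of that kind of automation. It uses Python's sqlite3 module, with two in-memory databases standing in for the legacy source and the new destination. Each check pairs a source query with the destination query that should return the same value, such as the customer count or one account's balance. All table names, column names and sample values are hypothetical. Storing money as integer cents is an assumption made to keep the comparison exact.

```python
import sqlite3
from typing import List, Tuple

# Each check: (description, source SQL, destination SQL); both queries should return one equal value.
# Table and column names are hypothetical; replace them with the real legacy and target schemas.
CHECKS = [
    ("number of customers",
     "SELECT COUNT(*) FROM legacy_customer",
     "SELECT COUNT(*) FROM customer"),
    ("balance of account 1001 (in cents)",
     "SELECT SUM(amount_cents) FROM legacy_txn WHERE acct_no = 1001",
     "SELECT balance_cents FROM account WHERE account_id = 1001"),
]

def run_checks(source: sqlite3.Connection, dest: sqlite3.Connection,
               checks=CHECKS) -> List[Tuple[str, object, object]]:
    """Run every check; return (description, source value, destination value) per mismatch."""
    failures = []
    for name, source_sql, dest_sql in checks:
        expected = source.execute(source_sql).fetchone()[0]
        actual = dest.execute(dest_sql).fetchone()[0]
        if expected != actual:
            failures.append((name, expected, actual))
    return failures

def sample_databases() -> Tuple[sqlite3.Connection, sqlite3.Connection]:
    """Tiny hypothetical source and destination databases for trying the checks."""
    source = sqlite3.connect(":memory:")
    source.executescript("""
        CREATE TABLE legacy_customer (cust_no INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE legacy_txn (acct_no INTEGER, amount_cents INTEGER);
        INSERT INTO legacy_customer VALUES (1, 'JOHN DOE'), (2, 'JANE ROE');
        INSERT INTO legacy_txn VALUES (1001, 50000), (1001, -12550), (1002, 700);
    """)
    dest = sqlite3.connect(":memory:")
    dest.executescript("""
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, full_name TEXT);
        CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance_cents INTEGER);
        INSERT INTO customer VALUES (1, 'John Doe'), (2, 'Jane Roe');
        INSERT INTO account VALUES (1001, 37450), (1002, 700);
    """)
    return source, dest
```

Because the checks are just data, new ones can be added as the conversion mapping grows. The whole list can then be rerun after every trial conversion.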