Passionate about data

Data and its implications on software design and development.

Schema Less Databases and Its Ramifications.

In the No-SQL land schema-less is a power full feature that is advertised a lot, schema-less basically means you don’t have to worry about column names and table names in a traditional sense, if you want to change the column name you just start saving the data using the new column name Lets say you have a document database like mongoDB and you have JSON document as shown below.

1
2
3
4
5
6
7
{
"_id":"4bc9157e201f254d204226bf",
"FIRST_NAME":"JOHN",
"MIDDLE_NAME":"D",
"LAST_NAME":"DOE",
"CREATED":"2010-10-12"
}

You have some corresponding code to read the documents from the database and lets say you lots of data in the database in the order of millions of documents. If you want to change the name of some attributes or columns at this point and the new JSON would look like

1
2
3
4
5
6
7
{
"_id":"4bc9157e201f254d204226bf",
"first_name":"JOHN",
"middle_name":"D",
"last_name":"DOE",
"created":"2010-10-12"
}

You will have to either change every document in the database to match the new attribute names or you have to make sure you code can handle both types of attribute names like

1
2
3
4
5
6
first_name = doc["first_name"] 
first_name = doc["FIRST_NAME"] unless !first_name.nil? 
middle_name = doc["middle_name"]  
middle_name = doc["MIDDLE_NAME"] unless !middle_name.nil?
last_name = doc["last_name"]
last_name = doc["LAST_NAME"] unless !last_name.nil?

This attribute name change also affects the indexes created on mongoDB, since the attribute name change is not across all the documents, an Index created on

1
db.people.ensureIndex({first_name:1})

will not index documents where the attribute name is FIRST_NAME, so you have to create another index for this new attribute name

1
db.people.ensureIndex({FIRST_NAME:1})

As you can see this gets really complicated if you do multiple refactorings, over a period of time. So when you hear schema less make sure you understand the ramifications of refactoring the attribute names at will and its effect on the code base and the database.