Architecture and Data Blog

Thoughts about the intersection of data, devops, design and software architecture

10 node Riak cluster on a single machine

When evaluating NoSQL databases, it's usually better to try them out. While trying them out, it's better to use multi-node configurations instead of a single node, such as clusters in Riak or replica sets in MongoDB, maybe even a sharded setup. On our project we evaluated a 10-node Riak cluster so that we could experiment with N, R and W values and decide which values were optimal for us.
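As a rough illustration of why the N, R and W values matter, here is a minimal Python sketch of the usual quorum arithmetic in Dynamo-style stores: with N replicas per key (N is the replica count, not the number of nodes in the cluster), a read of R replicas is guaranteed to overlap a write acknowledged by W replicas only when R + W > N. The function name `quorum_overlap` is made up for this example, and the check describes the idealized model rather than everything a real Riak cluster does.

```python
def quorum_overlap(n: int, r: int, w: int) -> bool:
    """Return True when every read quorum must intersect every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return r + w > n


# Enumerate the (R, W) pairs that guarantee overlap for N = 3 replicas.
N = 3
overlapping = [
    (r, w)
    for r in range(1, N + 1)
    for w in range(1, N + 1)
    if quorum_overlap(N, r, w)
]
print(overlapping)
```

Lower R or W values trade that overlap guarantee for lower latency, which is the kind of trade-off a multi-node test cluster lets you measure.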

Backup in mongodb replica-set configurations

There are multiple ways to take backups of MongoDB in different configurations; one configuration I have been involved with recently is replica sets. When MongoDB is running in a replica-set configuration, there is a single primary node and multiple secondary nodes. To back up the replica set we can either run mongodump against one of the nodes or shut down one of the secondary nodes and take file copies, since in a replica set all nodes have the same data (except the arbiter).

Materialized views and database links in oracle.

Recently one of my colleagues, Jeff Norris, had a weird error. He was trying to build a materialized view over some tables in his local database and some tables in his remote database using database links. The SQL to create the view ran fine and provided the results as expected, but when put inside a materialized view statement it complained with ORA-00942 errors. Let's say the two databases in question are local and remote, so the SQL to create the materialized view to load immediately and refresh every day is

Version Control your work..

So we version control/source control everything on our project: code, data, artifacts, diagrams, etc. Yesterday I said, why not extend it to my writings, to everything I have? So I started this long journey of refactoring my folder layout and making a nice folder structure to hold all the things I have written, along with the other artifacts produced in the process of writing, and moved them all to Subversion. Now all my example code and writings are under version control that gets backed up every day…

Writing a SQL to generate a SQL

We had a weird requirement on our project recently. The best way to do this, we thought, was to write a SQL statement against the table for each column that was going to have a foreign key constraint, and find out what data was not right or did not match the constraint. For example: if we have an INVOICE table that has an ITEMID on it, I want to find all the rows in the INVOICE table that have an ITEMID that does not exist in the ITEM table.
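The INVOICE/ITEM example can be sketched as follows. This is only an illustration: it uses Python with an in-memory SQLite database instead of SQL generating SQL, and the helper `orphan_query`, the `planned_fks` list and the sample rows are all assumptions made up for the example.

```python
import sqlite3


def orphan_query(child_table, fk_column, parent_table, parent_column):
    """Build a SELECT listing child rows whose FK value has no parent row."""
    return (
        f"SELECT c.* FROM {child_table} c "
        f"WHERE c.{fk_column} IS NOT NULL "
        f"AND NOT EXISTS (SELECT 1 FROM {parent_table} p "
        f"WHERE p.{parent_column} = c.{fk_column})"
    )


# Hypothetical sample data: invoice 12 points at an item that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (itemid INTEGER PRIMARY KEY);
    CREATE TABLE invoice (invoiceid INTEGER PRIMARY KEY, itemid INTEGER);
    INSERT INTO item VALUES (1), (2);
    INSERT INTO invoice VALUES (10, 1), (11, 2), (12, 99);
""")

# One entry per foreign key we plan to add: (child, fk column, parent, key).
planned_fks = [("invoice", "itemid", "item", "itemid")]

orphans = {}
for child, fk_col, parent, parent_col in planned_fks:
    sql = orphan_query(child, fk_col, parent, parent_col)
    orphans[(child, fk_col)] = conn.execute(sql).fetchall()

print(orphans)
```

Generating one such query per planned constraint gives a list of offending rows to clean up before the foreign keys are created.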

Setup and Teardown of database during testing

When doing performance testing or running unit/functional tests on a database, there is a need to periodically get the database to a known state, so that the tests behave in a predictable way and the data created by the tests is removed. Some of the ways to get a clean database are: Using scripts: recreate the database using scripts, the same scripts that are used in the development environment.
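A minimal sketch of the script-based approach, assuming Python's `unittest` and an in-memory SQLite database stand in for the real test framework and database. The `SCHEMA_SCRIPT` contents, the `reset_database` helper and the `customer` table are hypothetical; in a real project the script would be the same one used to build development databases.

```python
import sqlite3
import unittest

# Hypothetical schema script; it drops and recreates everything it owns.
SCHEMA_SCRIPT = """
DROP TABLE IF EXISTS customer;
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
INSERT INTO customer (id, name) VALUES (1, 'seed customer');
"""


def reset_database(conn):
    """Bring the database back to the known state defined by the script."""
    conn.executescript(SCHEMA_SCRIPT)


class CustomerTest(unittest.TestCase):
    conn = sqlite3.connect(":memory:")  # shared by all tests in this class

    def setUp(self):
        # Every test starts from the same known state.
        reset_database(self.conn)

    def count(self):
        return self.conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

    def test_insert_adds_a_row(self):
        self.conn.execute("INSERT INTO customer (name) VALUES ('temp')")
        self.assertEqual(self.count(), 2)

    def test_starts_from_known_state(self):
        # Rows created by other tests are gone after the reset.
        self.assertEqual(self.count(), 1)
```

Running the class with `python -m unittest` passes in either test order, because each test rebuilds the schema before it runs.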

Experience using DBDeploy on my project

We have been using DBDeploy on my project for more than 6 months now and wanted to show how things are going. First let's talk about setup: we are using dbdeploy in our Java development environment with ANT as our build scripting tool, against an Oracle 10g database. Define the ANT task first: <taskdef name="dbdeploy" classname="net.sf.dbdeploy.AntTarget" classpath="lib/dbdeploy.jar"/> Now we create the main dbinitialize task, an ANT task to create your database schema, using the upgrade file generated by dbdeploy shown below.

Database Migration Utility

I have taken up the hobby of searching the open source landscape for tools that help me do Agile database development. I’m going to write about all the tools I come across that help me; my preference is open source software, but I am not limited to it. I will try to provide some sound examples, share my experiences with the tools I come across, and share the example code I used.

Automated Tablespace deployment

In development mode you don’t want to worry about which table goes into what tablespace in production, as it complicates development environments. The production DBAs want to have their input and control over deciding what table goes into what tablespace. To allow for this I used a mapping scheme as shown below. Let's assume we have 3 tables in our system: Customer, CustomerOrder and OrderStatus. We expect the Customer table to have a large number of rows and CustomerOrder to have a significantly larger number of rows, while OrderStatus would have few rows and not change as much.
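To make the idea concrete, here is one possible way such a mapping could look, written as a small Python sketch. It is not the scheme from the original post: the `TABLESPACE_MAP` dictionary, the tablespace names `TS_LARGE`, `TS_XLARGE` and `TS_SMALL`, the column definitions and the `create_table_ddl` helper are all hypothetical.

```python
# Hypothetical per-environment mapping of table name to tablespace.
# Development has no entries, so tables land in the default tablespace.
TABLESPACE_MAP = {
    "development": {},
    "production": {
        "Customer": "TS_LARGE",
        "CustomerOrder": "TS_XLARGE",
        "OrderStatus": "TS_SMALL",
    },
}


def create_table_ddl(table, columns_sql, environment):
    """Return CREATE TABLE DDL, adding a TABLESPACE clause only if mapped."""
    ddl = f"CREATE TABLE {table} ({columns_sql})"
    tablespace = TABLESPACE_MAP.get(environment, {}).get(table)
    if tablespace:
        ddl += f" TABLESPACE {tablespace}"
    return ddl


for env in ("development", "production"):
    print(env, "->", create_table_ddl("Customer", "id NUMBER PRIMARY KEY", env))
```

The point of keeping the mapping outside the table definitions is that developers write one set of DDL, while the DBAs own and edit only the production entries.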