Passionate about data

Data and its implications on software design and development.

Create an Index for Every Foreign Key Constraint You Create

Suppose we create a Foreign Key constraint on the database, as shown below.
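A sketch of such a constraint, using the BOOK and AUTHOR tables from the example that follows (the constraint name is an assumption):

```sql
ALTER TABLE book
  ADD CONSTRAINT fk_book_author
  FOREIGN KEY (authorid)
  REFERENCES author (authorid);
```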


In the above example we are telling the database to check that BOOK.AUTHORID is a valid value in AUTHOR.AUTHORID. When the AUTHOR table is changed, the database verifies the data by running a SELECT against the BOOK table for the AUTHORID, something like this:
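A sketch of the kind of check the database performs (this is an illustration, not the database's literal internal query):

```sql
SELECT COUNT(*)
  FROM book
 WHERE book.authorid = :changed_authorid;
```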


Basically, the database server is checking whether the row that just changed (was inserted or deleted) has child rows. If there is no index on BOOK.AUTHORID, the database has to scan the whole table, which is slow. Hence, when creating a Foreign Key constraint, remember to create a corresponding index so that performance does not degrade. Likewise, when you observe slow performance on a database after putting in Foreign Key constraints, make sure to look for indexes on the constrained columns.
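A minimal sketch of the corresponding index (the index name is an assumption):

```sql
CREATE INDEX idx_book_authorid ON book (authorid);
```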

Version Control Your Work

We version control everything on our project: code, data, artifacts, diagrams, etc. Yesterday I thought, why not extend that to my writings, to everything I have? So I started the long journey of refactoring my folder layout into a nice structure to hold everything I have written, along with other artifacts still in the process of being written, and moved it all to Subversion. Now all my example code and writings are under version control that gets backed up every day. It feels liberating.

Tool Support for Database Refactoring

I’m always on the lookout for better tool support for database refactoring. I just noticed that LiquiBase has come out with an IntelliJ plugin to support database refactoring.

This is really cool, and hopefully it is the first of a long list of tools that will support database refactoring in the future. Enjoy!

Writing a SQL to Generate a SQL

We had a weird requirement on our project recently:

Find all the rows in all the tables that do not comply with the constraints that exist in our development environment but not in the QA environment.

The best way to do this, we thought, was to write a SQL statement against each table, for each column that was going to get a Foreign Key constraint, and find out what data did not match the constraint. For example: if we have an INVOICE table that has an ITEMID on it, I want to find all the rows in the INVOICE table whose ITEMID does not exist in the ITEM table. Writing this SQL by hand for our 400+ table database was going to be a huge task.
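For the INVOICE/ITEM example, the hand-written check would look something like this (a sketch; the label in the first column just identifies the table and column being checked):

```sql
SELECT 'INVOICE-ITEMID', COUNT(*)
  FROM invoice
 WHERE NOT EXISTS (SELECT 1 FROM item
                    WHERE item.itemid = invoice.itemid)
   AND invoice.itemid IS NOT NULL;
```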

Oracle's metadata (or, for that matter, any database's metadata) to the rescue: we ended up writing a SQL statement that generates the SQL described above.

Here is the SQL that generates those statements:

SELECT 'SELECT '''||table_name||'-'||column_name||''', count(*) FROM '||
       table_name||' WHERE NOT EXISTS (SELECT 1 FROM '||remote_table||' WHERE '||
       remote_table||'.'||remote_column||' = '||table_name||'.'||column_name||') AND '||
       table_name||'.'||column_name||' IS NOT NULL'
FROM ( SELECT a.table_name,
              b.column_name,
              ( SELECT table_name FROM user_constraints
                 WHERE constraint_name = a.r_constraint_name ) remote_table,
              ( SELECT column_name FROM user_cons_columns
                 WHERE constraint_name = a.r_constraint_name ) remote_column
         FROM user_constraints a, user_cons_columns b
        WHERE a.constraint_name = b.constraint_name
          AND a.constraint_type = 'R' )

This SQL generates SQL that, when run, gives us counts of the rows in each table that do not match our constraint requirements. For example, if you have a CUSTOMER table with CUSTOMERTYPEID and STATUSID columns, a checking statement is generated for each constrained column.
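Assuming those columns reference CUSTOMERTYPE and STATUS tables (the referenced table names are assumptions), the generated statements would look like:

```sql
SELECT 'CUSTOMER-CUSTOMERTYPEID', COUNT(*)
  FROM customer
 WHERE NOT EXISTS (SELECT 1 FROM customertype
                    WHERE customertype.customertypeid = customer.customertypeid)
   AND customer.customertypeid IS NOT NULL;

SELECT 'CUSTOMER-STATUSID', COUNT(*)
  FROM customer
 WHERE NOT EXISTS (SELECT 1 FROM status
                    WHERE status.statusid = customer.statusid)
   AND customer.statusid IS NOT NULL;
```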


Once the above SQL is run, the results show us the data that does not match the constraints we want to introduce into the QA environment.

Setup and Teardown of Database During Testing

When doing performance testing or running unit/functional tests against a database, there is a need to periodically get the database to a known state, so that the tests behave in a predictable way and so that all the data created by the tests is cleared out. Some of the ways to get a clean database are:

Using Scripts: Recreate the database using scripts, the same scripts that are used in the development environment.

Using DB Backup: Especially when the database (and the data set) is large, the scripts approach above will be slow. Instead, make a backup of the database in its pristine state, then run the tests; once the tests are done running, restore the database from the backup taken before the tests corrupted the data.

Using a Virtual Machine: The DB backup approach can be improved by using a Virtual Machine (VM). Set up a VM, run the database server inside it, and load the database and data so that the tests can run. Now take an image of the VM and run the tests; when the tests are done, all that needs to be done is to restore the VM image.
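The scripts approach above can be as simple as dropping and recreating the schema objects before each test run; a minimal sketch, assuming hypothetical ITEM and INVOICE tables:

```sql
-- Teardown: remove everything the tests may have touched
-- (drop children before parents because of the Foreign Key)
DROP TABLE invoice;
DROP TABLE item;

-- Setup: recreate the schema in its known state
CREATE TABLE item (
    itemid NUMBER PRIMARY KEY,
    name   VARCHAR2(100) NOT NULL
);

CREATE TABLE invoice (
    invoiceid NUMBER PRIMARY KEY,
    itemid    NUMBER REFERENCES item (itemid)
);

-- Seed the known data set the tests expect
INSERT INTO item (itemid, name) VALUES (1, 'Widget');
```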

Podcast on Keeping Gray Code Fit

Andy, Jeff, Marjorie and I discuss how to keep a long-running project fit in this podcast, also on iTunes. We discuss the management of technology, people, processes and tools on longer-running and more mature applications. Specific topics such as refactoring, knowledge management, innovation, staffing, production support and others are covered.

Experience Using DBDeploy on My Project

We have been using DBDeploy on my project for more than 6 months now, and I wanted to share how things are going. First, let's talk about setup: we are using dbdeploy in our Java development environment, with ANT as our build scripting tool, against an Oracle 10g database.

First, define the ANT task:

<taskdef name="dbdeploy" classname="net.sf.dbdeploy.AntTarget" classpath="lib/dbdeploy.jar"/>

Now we create the main dbinit target, an ANT target that creates your database schema using the upgrade file generated by dbdeploy, shown below. The thing to note is that dbdeploy generates the upgrade file but does not run it against your database, so we have to make sure we run the generated upgrade file via a sql ANT task.

<target name="dbinit" depends="init,dbclean">
    <echo message="Working UserName: ${db.user}"/>
    <mkdir dir="${migrationfolder}"/>
    <dbdeploy driver="${}" url="${db.url}"
              userid="${db.user}" password="${db.password}" deltaset="couger"
              dir="db/migration" outputfile="${migrationfolder}/upgrade.sql"
              dbms="ora" undoOutputfile="${migrationfolder}/undo.sql"/>
    <!-- Now run the generated upgrade file -->
    <sql password="${db.password}" userid="${db.user}" url="${db.url}" driver="${}"
         classpath="${driver.classpath}" onerror="abort">
        <fileset includes="upgrade.sql" dir="${migrationfolder}"/>
    </sql>
</target>

So let's say we want to write the first migration (migration and delta mean the same thing in this context). We create a new file in the db/migration folder named 001_CreateCustomerTable.sql. The 001 is just a number to sequence the migrations; dbdeploy only cares that the numbers increase. CreateCustomerTable gives the migration a meaningful name. You could name the migration 1.sql, but that's not very meaningful, is it? The name would not say what the migration does. When we are done writing the migration and have confirmed that it works locally, we check the file in (we use Subversion). When the CruiseControl build is done, we also publish all the migrations from the db/migration folder on the CruiseControl artifacts page using the onsuccess event:

<publishers>
    <onsuccess>
        ...
        <artifactspublisher dest="artifacts/${}" dir="projects/${}/db/migration"/>
    </onsuccess>
</publishers>
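A migration file itself is just plain SQL. A sketch of what db/migration/001_CreateCustomerTable.sql might contain (the column definitions are assumptions for illustration):

```sql
-- 001_CreateCustomerTable.sql
CREATE TABLE customer (
    customerid NUMBER PRIMARY KEY,
    name       VARCHAR2(100) NOT NULL
);
```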

Publishing the migrations allowed us to know which migrations are needed for a particular build. We have deployed to production twice already and have found this process to be smooth. Working with migrations has allowed us to test them hundreds of times, and also to test them against a copy of the production database beforehand so that we can tune them for performance if needed. Delivering the migrations to the client is also easy, since they are pure SQL that the client DBAs can look at and be comfortable with when migrating/upgrading their database.

Lessons From Re-Learning

I have been working on a project that I had worked on in 2005, trying to get a handle on what I had done about 3 years back. Exploring the code and the database has been fun; discovering the data layout and building a new set of data for production has been extremely entertaining. What I learnt from this whole experience is this: if your code (application or otherwise) is not expecting some data, that data should not be provided, or even considered valid, by the database; the database should be designed such that it does not even allow invalid combinations of the data.

Here is an example. We have a base table named SETTINGS, whose values can be changed by the user by overriding them in an extension table named SETTINGSEXTENSION; the table structure is shown below.

KEY (BusinessKey)
VALUE (Value for the Key)
OVERRIDDENVALUE (Value overriding the value in the SETTINGS table)
BYUSERID (Value over ridden for user)

In this scheme the same user can create multiple extensions for the same base key, causing the application to barf. If we introduce a unique index on SETTINGSEXTENSION(SETTINGSID, BYUSERID), we ensure that the database does not allow this data, and so the application will not barf.
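A sketch of that unique index (the index name is an assumption):

```sql
CREATE UNIQUE INDEX uidx_settingsextension
    ON settingsextension (settingsid, byuserid);
```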

There are many more things I learnt; those will follow.

Cannot Make It Out to XP Day London

After a lot of frustration with my schedule, I have had to come to the conclusion that I cannot physically make it to XP Day London.

I’m going to miss it. Nick Ashley is going to take up my spot and I know he will do a great job.