Passionate about data

Data and its implications for software design and development.

Using an Explicit ORDER BY in Your SQL Statements

Recently, when our test databases were upgraded to a new version of Oracle, we started noticing that the order in which some drop-down lists were displayed was not correct. It turns out that the SELECT statement we had did not have an ORDER BY clause, and the data was being returned in the order of row creation (the order of ROWID). When the database was upgraded these ROWIDs changed, and hence so did the order of the data shown in the drop-down lists.

Lesson learnt: have an explicit ORDER BY in every SQL statement that provides data to be shown on the screen. In other words, do not rely on the order of the result set provided by the current version of the database server. If you prefer a certain order, make that choice explicit in the SQL SELECT statement.
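For example, a query feeding one of those drop-down lists might look like this (the table and column names here are made up for illustration):

-- Make the desired order explicit instead of relying on ROWID order
SELECT STATUSID, DESCRIPTION
FROM STATUS
ORDER BY DESCRIPTION
/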

Moved to a Mac

A couple of weeks back I was given a choice to upgrade my work laptop to a MacBook Pro or a Windows laptop. I chose the Mac (I know, everyone is into Macs nowadays). The transition was pretty good, with the exception of moving my Oracle database from Windows to the Mac: since there is no native installation of Oracle on the Mac, I had to use VMware Fusion to install Oracle.

After having Oracle running inside the VM, I started setting up the dev environment. I ran into issues with the ANT version and the JAVA version, but they were resolved by pointing to the right locations.

Overall this has been a positive experience so far.

Create an Index for Every Foreign Key Constraint

When creating a Foreign Key constraint on the database, as shown below:

ALTER TABLE BOOK ADD 
    (CONSTRAINT FK_BOOK_AUTHOR FOREIGN KEY (AUTHORID) 
     REFERENCES AUTHOR)
/

In the above example we are telling the database to check that BOOK.AUTHORID is a valid value in AUTHOR.AUTHORID. When the AUTHOR table is being changed, the database verifies the data in the BOOK table using a SELECT against BOOK for the AUTHORID, something like this:

SELECT count(*) FROM BOOK WHERE AUTHORID = nnnn

Basically the database server is checking whether there are child rows for the row that just changed (was inserted or deleted). If there is no index on BOOK.AUTHORID, the database has to scan the whole table while doing this, which is slow. Hence, when creating a Foreign Key constraint, remember to create a corresponding INDEX on the constrained column so that performance does not degrade. And when you observe slow performance on a database after putting in Foreign Key constraints, make sure to look for indexes on the constrained columns.
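For the BOOK/AUTHOR example above, the matching index would look something like this (the index name is my own choice):

-- Index the foreign key column so the check against BOOK does not scan the whole table
CREATE INDEX IDX_BOOK_AUTHORID ON BOOK (AUTHORID)
/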

Version Control Your Work..

So we version control/source control everything on our project: code, data, artifacts, diagrams, etc. Yesterday I thought, why not extend that to my writings and to everything else I have? So I started the long journey of refactoring my folder layout into a nice structure to hold all the things I have written about, plus the other artifacts in the process of writing, and moved them all to Subversion. Now all my example code and writings are under version control that gets backed up every day… feels liberating.

Tool Support for Database Refactoring

I’m always on the lookout for better tool support for database refactoring. I just noticed that LiquiBase has come out with an IntelliJ plugin to support database refactoring.

This is really cool, and hopefully one of a long list of tools that will support database refactoring in the future. So enjoy!

Writing a SQL to Generate a SQL

We had a weird requirement on our project recently:

Find all the rows in all the tables that do not comply with the constraints that we have in development but not in the QA environment.

The best way to do this, we thought, was to write a SQL statement against each table, for each column that was going to have a Foreign Key constraint, and find out what data did not match the constraint. For example: if we have an INVOICE table that has an ITEMID on it, I want to find all the rows in the INVOICE table that have an ITEMID that does not exist in the ITEM table. Writing this SQL by hand for our 400+ table database was going to be a huge task.
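The per-table query we had in mind looked roughly like this for the INVOICE example (a sketch, not the generated output shown later):

-- Rows in INVOICE whose ITEMID does not exist in ITEM
SELECT *
FROM INVOICE
WHERE ITEMID IS NOT NULL
AND NOT EXISTS (SELECT 1 FROM ITEM WHERE ITEM.ITEMID = INVOICE.ITEMID)
/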

Oracle’s metadata (or for that matter any database’s metadata) to the rescue: we ended up writing a SQL statement that generates the above SQL.

Here is the SQL that generates the checking SQL:

-- Generates one checking SELECT per (single-column) foreign key constraint
SELECT 'SELECT '''||table_name||'-'||column_name||''', count(*) FROM '|| 
table_name|| ' WHERE not exists (select 1 from '|| remote_table ||' where '||
remote_table||'.'||remote_column||' = '||table_name||'.'||column_name||') AND '
||table_name||'.'||column_name||' IS NOT NULL
UNION ALL' 
FROM (
   -- Every foreign key (type 'R') constraint with its local and referenced column
   SELECT a.table_name,
      column_name,
      ( SELECT table_name FROM user_constraints 
      		WHERE constraint_name = a.R_CONSTRAINT_NAME) remote_table, 
      ( SELECT column_name FROM user_cons_columns 
      		WHERE constraint_name = a.R_CONSTRAINT_NAME) remote_column 
   FROM user_constraints a, user_cons_columns b 
   WHERE a.constraint_name = b.constraint_name 
   AND a.constraint_type = 'R' )

This SQL generates SQL that, when run, gives us data about the tables that do not match our constraint requirements (just remember to drop the trailing UNION ALL from the last generated statement before running it). If you have a CUSTOMER table which has a CUSTOMERTYPEID and a STATUSID on it, the generated SQL would be:

SELECT 'CUSTOMER-CUSTOMERTYPEID', COUNT(*) FROM CUSTOMER  WHERE NOT EXISTS 
      (SELECT 1 FROM CUSTOMERTYPE WHERE 
             CUSTOMERTYPE.CUSTOMERTYPEID = CUSTOMER.CUSTOMERTYPEID) 
  AND CUSTOMER.CUSTOMERTYPEID IS NOT NULL
UNION ALL
SELECT 'CUSTOMER-STATUSID', COUNT(*) FROM CUSTOMER WHERE NOT EXISTS 
       (SELECT 1 FROM STATUS WHERE 
             STATUS.STATUSID = CUSTOMER.STATUSID) 
  AND CUSTOMER.STATUSID IS NOT NULL

Once the above SQL is run, the results will show us data that does not match the constraints we want to introduce into the QA environments.

Setup and Teardown of Database During Testing

When doing performance testing or running unit/functional tests against a database, there is a need to periodically get the database to a known state, so that the tests behave in a predictable way, and to get rid of all the data created by the tests. Some of the ways to get a clean database are:

Using Scripts: Recreate the database using scripts, the same scripts that are used in the development environment (a minimal sketch follows this list).

Using a DB Backup: Especially useful when the database (and the data set) is large, since the scripts approach above will be slow. Make a backup of the database in its pristine state, run the tests, and once the tests are done, restore the database from the backup taken before the tests corrupted the data.

Using a Virtual Machine: The DB backup approach can be improved by using a Virtual Machine (VM). Set up a VM and run the database server inside it, with the database and data ready so that the tests can run. Now take an image of the VM and run the tests; when the tests are done, all that needs to be done is to restore the VM from the image.
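As a minimal sketch of the scripts approach (the table names are made up; a real project would drive this from its own schema scripts):

-- Clear child tables before parents to satisfy foreign key constraints
DELETE FROM INVOICE
/
DELETE FROM CUSTOMER
/
-- Re-seed the baseline data the tests expect
INSERT INTO CUSTOMER (CUSTOMERID, NAME) VALUES (1, 'Test Customer')
/
COMMIT
/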

Podcast on Keeping Gray Code Fit

Andy, Jeff, Marjorie and I discuss how to keep a long-running project fit in this podcast, also on iTunes. We discuss the management of technology, people, processes and tools on longer and more mature applications. Specific topics such as refactoring, knowledge management, innovation, staffing, production support and others are covered.

Experience Using DBDeploy on My Project

We have been using DBDeploy on my project for more than six months now, and I wanted to show how things are going. First, let’s talk about setup: we are using dbdeploy in our Java development environment, with ANT as our build scripting tool, against an Oracle 10g database.

Define the ANT task first:

<taskdef name="dbdeploy" classname="net.sf.dbdeploy.AntTarget" classpath="lib/dbdeploy.jar"/>

Now we create the main dbinit target, an ANT target that creates your database schema using the upgrade file generated by dbdeploy, as shown below. The thing to note is that dbdeploy generates the upgrade file but does not run it against your database, so we have to make sure we call the generated upgrade file via a sql ANT task.

<target name="dbinit" depends="init,dbclean">
    <echo message="Working UserName: ${db.user}"/>
    <mkdir dir="${migrationfolder}"/>
    <dbdeploy driver="${driver.name}" url="${db.url}"
              userid="${db.user}" password="${db.password}" deltaset="couger"
              dir="db/migration" outputfile="${migrationfolder}/upgrade.sql"
              dbms="ora" undoOutputfile="${migrationfolder}/undo.sql"/>
    <!-- Now run the generated upgrade file -->
    <sql password="${db.password}" userid="${db.user}" url="${db.url}" driver="${driver.name}"
         classpath="${driver.classpath}" onerror="abort">
        <fileset includes="upgrade.sql" dir="${migrationfolder}"/>
    </sql>
</target>

So let’s say we want to write the first migration (migration and delta mean the same thing in this context). We create a new file in the db/migration folder named 001_CreateCustomerTable.sql. The 001 is just a number to sequence the migrations; dbdeploy only cares that the numbers increment, and CreateCustomerTable gives the migration a meaningful name. You could name the migration 1.sql, but that is not meaningful, is it? The name should say what the migration is doing. When we are done writing the migration and confirm that it works locally, we check the file in (we use Subversion). When the CruiseControl (http://cruisecontrol.sourceforge.net/) build was done, we also published all the migrations from the db/migration folder on the CruiseControl artifacts page using the onsuccess event:

<publishers>
    <onsuccess>
        …
        <artifactspublisher dest="artifacts/${project.name}" dir="projects/${project.name}/db/migration"/>
    </onsuccess>
</publishers>
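For illustration, a migration file like the 001_CreateCustomerTable.sql above might contain plain SQL such as this (the table definition is a made-up example):

-- 001_CreateCustomerTable.sql
CREATE TABLE CUSTOMER (
    CUSTOMERID NUMBER(10)    NOT NULL,
    NAME       VARCHAR2(100) NOT NULL,
    CONSTRAINT PK_CUSTOMER PRIMARY KEY (CUSTOMERID)
)
/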

Publishing the migrations allowed us to know which migrations were needed for each particular build. We have deployed to production twice already and have found this process to be smooth. Doing migrations this way has allowed us to test our migrations hundreds of times, and also to test them against a copy of the production database beforehand so that we can tune the migrations for performance if needed. Delivering the migrations to the client is also easy, since it is pure SQL that the client DBAs can look at and be comfortable with when migrating/upgrading their database.