Passionate about data

Data and its implications on software design and development.

Long Running Data Migrations During Database Refactorings.

When you are refactoring large databases, you will have certain tables that have millions of rows, so lets say we are doing the Move Column refactoring, moving the TaxAmount column from Charge table which has millions of rows to TaxCharge table. Create the TaxAmount column in the TaxCharge table. Then have to move the data from the TaxAmount column in the Charge table to the TaxAmount column you created in the TaxCharge table. One way to move the data would be to write a single update statement as shown below.

UPDATE taxCharge set taxAmount=
(SELECT taxAmount FROM charge
WHERE charge.chargeid = taxCharge.chargeID);

Now this update will run for some time and will be run as one transaction, the Update operation will need a lot of UNDO space (UNDO space in oracle other databases may need other types of space, but the general idea is some space is needed by the database server ), so there is a probability of the update failing if the database server runs out of UNDO space, another side effect of moving large amounts of data using a single update (transaction) is that all the updated rows in the TaxCharge table will be locked till the whole update transaction is done and a commit is issued.

Getting around this problem is to use a programmatic update, like the PL/SQL code shown below. This code can be implemented in the database procedural code like TSQL or any other language appropriate for the task.

CURSOR allCharges IS
    SELECT chargeId, taxAmount FROM charge;
numberOfRowsToCommit NUMBER :=1000;
FOR thisCharge IN allCharges LOOP
    UPDATE taxCharge SET taxamount=thisCharge.taxAmount 
        WHERE chargeid = thisCharge.chargeid;
    IF allCharges%rowcount MOD numberOfRowsToCommit = 0 THEN
    END IF;

The above PL/SQL code basically loops through all the rows in the Charge table and updates the TaxCharge table with the taxAmount and commits after every 1000 rows (as defined by numberOfRowsToCommit). This approach allows us to work with a smaller UNDO size, releases (not lock) rows once they are committed after every 1000 rows. Off course there are many variations of this technique and many ways to implement this particular update. The point I’m trying to get to is that, depending on the situation, I will choose different ways to update the data and maybe not always use UPDATE statements.