Passionate about data

Data and its implications on software design and development.

Using Liquibase to Load Data and Ignore Some Columns

Loading data into tables is needed many times on projects to load test, Liquibase provides a method to load data into tables with lots of customization. In the example shown below, I’m loading zip code data with the following column layout

1
"Zipcode","ZipCodeType","City","State","LocationType","Lat","Long","Location","Decommisioned","TaxReturnsFiled","EstimatedPopulation","TotalWages"

The alphanumeric data is enclosed in and separated by , the first line in the data file is a row of headers for the columns of data. A snapshot of the data in the file is shown below

1
2
3
4
"02108","STANDARD","BOSTON","MA","PRIMARY",42.35,-71.06,"NA-US-MA-BOSTON","false",2348,3312,388783474
"02109","STANDARD","BOSTON","MA","PRIMARY",42.35,-71.06,"NA-US-MA-BOSTON","false",2966,4145,284385612
"02110","STANDARD","BOSTON","MA","PRIMARY",42.35,-71.06,"NA-US-MA-BOSTON","false",2950,4313,231268950
"02111","STANDARD","BOSTON","MA","PRIMARY",42.35,-71.06,"NA-US-MA-BOSTON","false",2964,4467,315024986

The standard dataset for the zipcode file has more than 45k rows. These rows can be loaded using the loaddata command of liquibase. We can ignore some of the columns in the datafile if we are not interested in the data.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
 <changeset author="pramod" id="45">
   <loadData
     file="free-zipcode-database-Primary.csv"
     quotChar="&quot;"
     seperator=","
     tableName = "zipcode" >
     <column header="zipcode" name="zipcode" type="string"/>
     <column header="ZipCodeType" type="skip"/>
     <column header="City" name="city" type="string"/>
     <column header="State" name="state_code" type="string"/>
     <column header="LocationType" type="skip"/>
     <column header="Lat" type="skip"/>
     <column header="Long" type="skip"/>
     <column header="Location" type="skip"/>
     <column header="Decommisioned" type="skip"/>
     <column header="TaxReturnsFiled" type="skip"/>
     <column header="EstimatedPopulation" type="skip"/>
     <column header="TotalWages" type="skip"/>
   </loadData>
</changeset>

As seen above we are ignoring some of the data in the datafile using the skip type and we are mapping the header column names to columns the table by specifying the header and name attributes of the column element.

This is a much more efficient method of loading data instead of creating SQL statements or going to external commands such as sql loader or other database specific tools as the loadData command can be wrapped up in the developer tools and pipelines.