In this blog post, I will discuss ID generation techniques using the Object Relation Mapping frameworks such as Hibernate, Toplink, ActiveRecord, Entity Framework. When using hibernate or any other ORM mapping framework. There is a need to generate primary key values for the “id” columns. These values can be generated by using IDENTITY, SEQUENCE or TABLE strategies. Generating custom values for primary keys or other values is a topic for another blog post.
In this blog post the examples are using Hibernate, JPA, Java8 and Oracle12c.
TABLE Id generation strategy:
When using this strategy, a table in the database is used to store the current id value and the next value is generated and written back to the table.
1 2 3 4
The above table is used by hibernate to generate next_val for sequence_name specified in the entity mapping, if no sequence_name is specified then default is used.
The ID generation involves a transaction for its own use and care has to be taken when multiple processes are generating ID’s, and involves locking the ROW in the table. This strategy is the least efficient and should be avoided.
IDENTITY Id generation strategy:
Identity columns in databases like MySQL, MSSQL, PostgreSQL and Oracle 12c are used to auto generate numeric values for a number column. There can be only one identity column in a table, the column has to be not-null and cannot accept any other default values. Identity columns are generally used as synthetic primary key columns. Shown below is an example of creating a table with identity column id in Oracle12c.
1 2 3 4 5 6 7 8 9
The id column from the person table can now be mapped to the Person entity. The generation type we are using is IDENTITY and hibernate now understands that the database will provide the value for id at insert time.
1 2 3
Mapping in Hibernate is simple, without having to specify any sequence names, All inserts into the table automatically use the same method for getting ID’s either through the application or directly into the database by other users.
Since ID’s are assigned at insert time, JDBC batching does not work when using identity columns. In Oracle12c dropping of tables does not drop the sequence associated with the identity column leaving stale sequences in the database. In a Parent and Child relationship table such as Person and all their Addresses table, the child.parent_id column cannot be “not null” as hibernate (the ORM framework) cannot get the ID of the parent before the parent is inserted, hence it does these operations
1 2 3 4 5
Identity ID generation does not support preallocation, so requires a select after each insert, increasing the trips to the database and degrading insert performance.
SEQUENCE Id generation strategy:
Sequences in Oracle, Postgres and SQLServer 2012 are database objects that generate unique numbers and are generated independent of transactions. Sequences improve the concurrency of number generation. Sequences can be increment by any number, can start at any number, can cycle the numbers after maximum number is reached, have upper limit on the number generated and can cache certain number of values. Shown below is an example of creating sequence in Oracle12c.
1 2 3 4 5 6 7 8 9 10
Using the sequence above and the table definition, we can map the table with the object and use the SEQUENCE GenerationType and the sequence name seq_person. When a new Person is created, hibernate will get the next_id from the sequence and assign it to id attribute on the object.
1 2 3 4
Sequences provide a easy way to create an object to provide unique numbers and are not bound by transaction boundaries. Various configuration options, such as start number, increment interval, cache size, max number, min number, cycle the numbers etc.
ORM frameworks can use batch operations for inserts as they can get the sequence values for ObjectID’s without inserting the data in the tables.
Applications have to remember to use the correct sequence to insert the data in the table, hibernate mapping is somewhat verbose. Other utilities have to remember to use the correct sequence when inserting data directly into the database.
Based on the above observations and usage:
If the database provides features to use sequences they should be used. In databases that do not provide sequences identity columns have to be used.
If the database provides both identity and sequence features. The decision depends on the amount of batch operations being performed and the need to provide consistent primary key generation for new data from the application and other utilities loading data directly in the database.