The IDENTITY Generation Problem and Sequence Strategies
The IDENTITY Generation Problem and Sequence Strategies
Every Hibernate batching guide mentions “switch from IDENTITY to SEQUENCE.” Few explain why IDENTITY breaks batching at the JDBC level.
The Lie
GenerationType.IDENTITY is the simplest ID strategy. Let the database auto-increment. Hibernate handles the rest.
The Reality
When Hibernate calls persist() on an entity with IDENTITY generation, it must execute the INSERT immediately, not defer it until flush time. The reason: Hibernate needs the generated ID to store the entity in the persistence context (which is a map keyed by entity type + ID). Without the ID, Hibernate cannot manage the entity.
This means every persist() call triggers an immediate INSERT INTO ... RETURNING id (or Statement.RETURN_GENERATED_KEYS). JDBC batching requires deferring statement execution and sending multiple statements in a single network call. If every INSERT executes immediately, there is nothing to batch.
// BAD: IDENTITY disables batching
@Entity
public class Event {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
private String name;
private Instant createdAt;
}
// Even with hibernate.jdbc.batch_size=50, each persist() executes immediately:
for (int i = 0; i < 1000; i++) {
Event event = new Event();
event.setName("Event " + i);
event.setCreatedAt(Instant.now());
entityManager.persist(event);
// INSERT executed immediately, 1000 round trips
}
The Fix
Use GenerationType.SEQUENCE with an appropriate allocationSize.
// BETTER: SEQUENCE with allocation
@Entity
public class Event {
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "event_seq")
@SequenceGenerator(name = "event_seq", sequenceName = "event_seq", allocationSize = 50)
private Long id;
private String name;
private Instant createdAt;
}
With allocationSize = 50, Hibernate calls SELECT nextval('event_seq') once and reserves 50 IDs. The next 49 persist() calls assign IDs from the reserved range without touching the database. This allows the INSERTs to be deferred until flush time and batched.
The hi-lo algorithm:
- Call
nextval('event_seq'), returns 3 - Hibernate calculates range:
(3 - 1) * 50 + 1 = 101to3 * 50 = 150 - Next 50 persists use IDs 101-150 without a sequence call
- On the 51st persist, call
nextvalagain
Warning: if your sequence INCREMENT BY does not match allocationSize, IDs will have gaps or collisions. The database sequence must be created with INCREMENT BY 50:
CREATE SEQUENCE event_seq START WITH 1 INCREMENT BY 50;
If you are on MySQL (no sequences), Hibernate 6 emulates sequences using a table. This works but adds a row-level lock per allocation call. PostgreSQL sequences are lock-free and the better choice for high-throughput inserts.
Migration from IDENTITY to SEQUENCE
On a live system with existing IDENTITY-generated IDs:
-- Step 1: Create the sequence, starting above the current max ID
SELECT MAX(id) FROM events; -- Returns 847293
CREATE SEQUENCE event_seq START WITH 847350 INCREMENT BY 50;
-- Start above max + allocationSize to avoid collisions
-- Step 2: Drop the IDENTITY/auto-increment constraint
-- PostgreSQL:
ALTER TABLE events ALTER COLUMN id DROP DEFAULT;
ALTER TABLE events ALTER COLUMN id DROP IDENTITY IF EXISTS;
-- Step 3: Deploy the application code with @SequenceGenerator
-- The next persist() call will use event_seq
This migration requires a brief window where no inserts happen on the table, or a coordinated deployment where old code stops inserting before new code starts. The ID column remains a regular integer; only the generation strategy changes.
The Cost Model
| Strategy | Sequence calls per 1000 inserts | Batchable | Round trips (batch_size=50) |
|---|---|---|---|
| IDENTITY | 0 (database generates) | No | 1000 |
| SEQUENCE(allocationSize=1) | 1000 | Yes | 20 + 1000 = 1020 |
| SEQUENCE(allocationSize=50) | 20 | Yes | 20 + 20 = 40 |
| SEQUENCE(allocationSize=100) | 10 | Yes | 10 + 20 = 30 |
The difference between IDENTITY (1000 round trips) and SEQUENCE with allocationSize=50 (40 round trips) is 25x fewer network calls. On a remote database with 5ms latency, that is 5 seconds vs 200ms.
Set allocationSize to match or exceed your batch_size. If batch_size is 50 and allocationSize is 10, Hibernate must call nextval 5 times per batch, adding 5 round trips per 50 inserts that could have been 1.