The Lies Your ORM Tells You

Bulk Operations: Where Hibernate Gives Up

6 min read Chapter 10 of 30

The Lie

Hibernate manages your data. Persist your entities, and Hibernate handles the rest. The abstraction holds whether you are saving one entity or one million.

The Reality

Hibernate was designed for transactional OLTP workloads: load a few entities, modify them, flush changes. When you need to insert 100,000 rows, update 500,000 rows, or delete everything older than 90 days, Hibernate’s entity lifecycle model becomes the bottleneck.

Every call to persist() adds the entity to the persistence context (a HashMap keyed by entity type and ID). Every flush iterates the entire persistence context, dirty-checks every managed entity, and generates SQL for the dirty ones. With 100,000 entities in the context, dirty-checking alone takes hundreds of milliseconds per flush.

Bulk Insert: Entity-at-a-Time vs Batched

The left column shows what happens when you call persist() in a loop without flush/clear: 100,000 round trips, a 200 MB persistence context, and a 45-90 second wall clock time. The right column shows the same import with periodic flush/clear at a batch size of 50: 2,000 round trips (each a JDBC batch), a 2 MB peak persistence context, and a 3-8 second wall clock time. The fix is 10 lines of code.
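
The round-trip counts in the two columns follow from ceiling division of the row count by the batch size. A minimal sketch of that arithmetic; `BatchMath` and `roundTrips` are illustrative names, not anything Hibernate provides:

```java
// Illustrative helper (not part of Hibernate): counts the JDBC round trips
// needed to send `rows` INSERTs when `batchSize` statements share one batch.
final class BatchMath {
    static int roundTrips(int rows, int batchSize) {
        if (rows < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and batchSize > 0");
        }
        // Ceiling division: a final partial batch still costs one round trip.
        return (rows + batchSize - 1) / batchSize;
    }
}
```

For 100,000 rows, `BatchMath.roundTrips(100_000, 1)` gives 100,000 and `BatchMath.roundTrips(100_000, 50)` gives 2,000.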

The Evidence

// BAD: Naive bulk insert
@Transactional
public void importProducts(List<ProductDTO> dtos) {
    for (ProductDTO dto : dtos) {
        Product product = new Product();
        product.setName(dto.name());
        product.setPrice(dto.price());
        product.setCategory(dto.category());
        entityManager.persist(product);
    }
    // At flush time: 100,000 entities in the persistence context
    // Hibernate dirty-checks ALL of them
    // Then generates 100,000 individual INSERT statements
}

// Generated SQL (100,000 times):
// insert into products (name, price, category) values (?, ?, ?)
// Each INSERT is a separate JDBC roundtrip

Execution time for 100,000 rows on PostgreSQL over a local connection: 45-90 seconds. The time breaks down roughly as:

  • Persistence context management: ~15%
  • Dirty checking: ~25%
  • SQL generation: ~10%
  • Individual JDBC round trips: ~50%

The Fix

1. Enable JDBC Batching

spring:
  jpa:
    properties:
      hibernate:
        jdbc:
          batch_size: 50
        order_inserts: true
        order_updates: true

// BETTER: Batch insert with periodic flush and clear
@Transactional
public void importProducts(List<ProductDTO> dtos) {
    int batchSize = 50; // keep in sync with hibernate.jdbc.batch_size
    for (int i = 0; i < dtos.size(); i++) {
        ProductDTO dto = dtos.get(i);
        Product product = new Product();
        product.setName(dto.name());
        product.setPrice(dto.price());
        product.setCategory(dto.category());
        entityManager.persist(product);

        // (i + 1) so the first flush happens after exactly batchSize entities, not batchSize + 1
        if ((i + 1) % batchSize == 0) {
            entityManager.flush();
            entityManager.clear();
        }
    }
}

// Generated SQL (batched):
// insert into products (name, price, category) values (?, ?, ?)
// ... batched in groups of 50, sent as a single JDBC batch call

Execution time drops to 8-15 seconds. The flush/clear cycle keeps the persistence context small, and JDBC batching sends 50 INSERTs per round trip instead of 1.
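
The flush/clear cadence can also be factored out of the import loop. The following is a sketch under assumptions: `ChunkedPersister` and `persistInChunks` are hypothetical names, and the `persist` and `flushAndClear` callbacks stand in for `entityManager.persist(...)` and `entityManager.flush(); entityManager.clear();`.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: persists items and triggers a flush/clear after every
// full batch, plus once more for a trailing partial batch.
final class ChunkedPersister {
    static <T> int persistInChunks(List<T> items, int batchSize,
                                   Consumer<? super T> persist, Runnable flushAndClear) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be > 0");
        }
        int pending = 0;  // entities persisted since the last flush
        int flushes = 0;  // number of flush/clear cycles performed
        for (T item : items) {
            persist.accept(item);
            pending++;
            if (pending == batchSize) {
                flushAndClear.run();
                flushes++;
                pending = 0;
            }
        }
        if (pending > 0) {
            // Flush the remainder so nothing stays in the context.
            flushAndClear.run();
            flushes++;
        }
        return flushes;
    }
}
```

With an `EntityManager` in scope, a call might look like `ChunkedPersister.persistInChunks(products, 50, entityManager::persist, () -> { entityManager.flush(); entityManager.clear(); })`. The trailing flush is optional inside a transaction, since the commit flushes any remainder anyway.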

Critical caveat: JDBC insert batching does not work with GenerationType.IDENTITY. With IDENTITY columns the database generates the ID during the INSERT, so Hibernate must execute each INSERT individually to read back the generated ID, and it silently disables insert batching for those entities. Switch to GenerationType.SEQUENCE with a sequence allocation size:

@Entity
public class Product {
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "product_seq")
    @SequenceGenerator(name = "product_seq", sequenceName = "product_seq", allocationSize = 50)
    private Long id;
    // ...
}

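With SEQUENCE and allocationSize = 50, Hibernate makes one sequence call per block of 50 IDs and assigns the rest in memory, so most INSERTs need no extra round trip to obtain an ID. The database sequence must be created with a matching increment (INCREMENT BY 50); otherwise the ID blocks handed out to different sessions can overlap.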
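
To see why block allocation removes the per-row sequence call, here is a simplified model of the idea. It is not Hibernate's actual optimizer: `PooledIdAllocator` is a hypothetical class, and the `nextBlockStart` supplier stands in for a `nextval` call against a sequence that increments by the allocation size.

```java
import java.util.function.LongSupplier;

// Simplified model of block ID allocation (an assumption for illustration,
// not Hibernate's implementation).
final class PooledIdAllocator {
    private final LongSupplier nextBlockStart; // stand-in for one sequence round trip
    private final int allocationSize;
    private long next;      // next ID to hand out
    private long blockEnd;  // exclusive upper bound of the current block

    PooledIdAllocator(LongSupplier nextBlockStart, int allocationSize) {
        if (allocationSize <= 0) {
            throw new IllegalArgumentException("allocationSize must be > 0");
        }
        this.nextBlockStart = nextBlockStart;
        this.allocationSize = allocationSize;
    }

    long nextId() {
        if (next >= blockEnd) {
            // Block exhausted (or never fetched): one call reserves allocationSize IDs.
            next = nextBlockStart.getAsLong();
            blockEnd = next + allocationSize;
        }
        return next++; // served from memory, no round trip
    }
}
```

In this model, handing out 100 IDs with an allocation size of 50 invokes the supplier twice rather than 100 times.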

2. Bulk DML for Updates and Deletes

// BAD: Loading entities to update them
@Transactional
public void applyDiscount(String category, BigDecimal discount) {
    List<Product> products = productRepository.findByCategory(category);
    for (Product product : products) {
        product.setPrice(product.getPrice().multiply(BigDecimal.ONE.subtract(discount)));
    }
    // Loads 50,000 entities into memory, dirty-checks all, generates 50,000 UPDATEs
}

// BETTER: Bulk UPDATE via JPQL (declared on a Spring Data JPA repository interface)
@Transactional
@Modifying
@Query("UPDATE Product p SET p.price = p.price * (1 - :discount) WHERE p.category = :category")
int applyDiscount(@Param("category") String category, @Param("discount") BigDecimal discount);

// Generated SQL:
// update products set price = price * (1 - ?) where category = ?
// One statement. Zero entities loaded. Zero dirty checks.

Bulk DML bypasses the persistence context entirely. The update executes at the database level. Hibernate does not track which entities were modified. This means:

  • The persistence context may contain stale entities
  • The L2 cache is not updated (Hibernate 6 invalidates the entire entity region, but does not update individual entries)
  • Any managed entity of the affected type should be considered stale after a bulk DML statement

// BETTER: Clear persistence context after bulk DML
@Transactional
public void applyDiscountSafely(String category, BigDecimal discount) {
    entityManager.createQuery(
        "UPDATE Product p SET p.price = p.price * (1 - :discount) WHERE p.category = :category")
        .setParameter("category", category)
        .setParameter("discount", discount)
        .executeUpdate();
    entityManager.clear(); // Evict all managed entities
}
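
The staleness problem is easier to see in a toy model. In the sketch below, two plain maps stand in for the table and the persistence context. `StaleContextDemo` and its methods are hypothetical and only mimic the behavior described above, not Hibernate's internals.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (illustration only): a "table" and a first-level cache in front of it.
final class StaleContextDemo {
    private final Map<Long, Integer> database = new HashMap<>(); // id -> price in cents
    private final Map<Long, Integer> context = new HashMap<>();  // cached copies

    void insertRow(long id, int priceCents) {
        database.put(id, priceCents);
    }

    // Like find(): returns the cached copy if present, else loads and caches the row.
    // Assumes the row exists; a missing row is not handled in this sketch.
    int find(long id) {
        return context.computeIfAbsent(id, database::get);
    }

    // Like a bulk UPDATE via executeUpdate(): changes the table only,
    // leaves cached copies untouched, and returns the affected row count.
    int bulkDiscountPercent(int percent) {
        database.replaceAll((id, price) -> price * (100 - percent) / 100);
        return database.size();
    }

    // Like clear(): drops every cached copy so the next find() reloads.
    void clear() {
        context.clear();
    }
}
```

After `find(1L)` has cached a row, `bulkDiscountPercent(10)` changes the table, but `find(1L)` keeps returning the old price until `clear()` is called.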

3. StatelessSession for Maximum Throughput

// BETTER: StatelessSession for bulk operations
@Transactional
public void importProductsStateless(List<ProductDTO> dtos) {
    Session session = entityManager.unwrap(Session.class);
    StatelessSession statelessSession = session.getSessionFactory().openStatelessSession();
    Transaction tx = statelessSession.beginTransaction();

    try {
        for (ProductDTO dto : dtos) {
            Product product = new Product();
            product.setName(dto.name());
            product.setPrice(dto.price());
            product.setCategory(dto.category());
            statelessSession.insert(product);
        }
        tx.commit();
    } catch (Exception e) {
        tx.rollback();
        throw e;
    } finally {
        statelessSession.close();
    }
}

StatelessSession has no persistence context, no dirty checking, no L1 cache, no L2 cache interaction, no cascade, no lazy loading. It maps almost directly to JDBC. Execution time for 100,000 rows: 3-6 seconds with JDBC batching enabled.

The Cost Model

Approach               | 100K inserts | Memory                            | Persistence Context
Naive persist() loop   | 45-90s       | ~500MB (100K entities in context) | Full, dirty-checked
Batch with flush/clear | 8-15s        | ~25MB (batch-size entities)       | Periodically cleared
StatelessSession       | 3-6s         | ~5MB (no context)                 | None
Raw JDBC               | 2-4s         | ~2MB                              | None

The gap between StatelessSession and raw JDBC is small enough that the convenience of Hibernate’s type mapping and SQL generation is worth keeping. Below 10,000 entities, the naive approach is slow but survivable. Above 100,000, batching or StatelessSession is mandatory.

Hibernate Bulk Insert Throughput

The diagram compares insert throughput across the four approaches at different data volumes. The key takeaway: the persistence context is the bottleneck. Every optimization removes persistence context overhead, and StatelessSession eliminates it entirely. The performance curve diverges dramatically above 10,000 entities.