The Queries Your ORM Writes and How to Fix Them
The Queries Your ORM Writes and How to Fix Them
Hibernate generates SQL. Developers do not read it. The database executes it. Performance suffers. This section catalogs the five most damaging SQL patterns that JPA/Hibernate generates on the content platform, proves the cost with EXPLAIN ANALYZE, and shows the fix for each.
Enabling SQL Logging
Before fixing queries, you must see them. Configure Hibernate to log generated SQL with parameter values and timing:
# application.yml
spring:
jpa:
properties:
hibernate:
format_sql: true
generate_statistics: true
show-sql: false # Use the logger instead
logging:
level:
org.hibernate.SQL: DEBUG
org.hibernate.type.descriptor.sql.BasicBinder: TRACE
org.hibernate.stat: DEBUG
The generate_statistics flag outputs per-session query counts:
Session Metrics {
842 nanoseconds spent acquiring 1 JDBC connection;
0 nanoseconds spent releasing 0 JDBC connections;
21847291 nanoseconds spent preparing 21 JDBC statements;
4821947 nanoseconds spent executing 21 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
}
21 JDBC statements for a single page load: this is the N+1 signature.
Anti-Pattern 1: N+1 Selects
The main chapter introduced the N+1 problem. This section covers detection, all variants, and the complete fix toolkit.
Detection
The N+1 problem occurs whenever a collection or association is lazily loaded inside a loop. The code looks innocent:
// SLOW: N+1 queries
List<Article> articles = articleRepository.findByStatus("published", PageRequest.of(0, 50));
for (Article article : articles) {
String authorName = article.getAuthor().getName(); // Query per article
List<String> tagNames = article.getTags().stream() // Query per article
.map(Tag::getName).toList();
results.add(new ArticleView(article.getTitle(), authorName, tagNames));
}
This generates 1 + 50 + 50 = 101 queries: 1 for articles, 50 for authors, 50 for tag collections.
The EXPLAIN ANALYZE for the author lookup (executed 50 times):
-- Executed once per article (50 times)
SELECT a.id, a.name, a.email, a.bio, a.avatar_url, a.created_at
FROM authors a
WHERE a.id = ?;
Index Scan using authors_pkey on authors (cost=0.29..8.30 rows=1 width=248)
(actual time=0.018..0.019 rows=1 loops=1)
Index Cond: (id = 7)
Buffers: shared hit=3
Planning Time: 0.042 ms
Execution Time: 0.031 ms
Each query takes 0.031 ms. 50 queries: 1.55 ms of execution time. But planning time is 0.042 ms per query, adding 2.1 ms. And each query is a separate JDBC round-trip. Under load with connection pool contention, the round-trip overhead dominates.
Fix 1: JOIN FETCH
// FAST: Single query with JOIN FETCH
@Query("SELECT a FROM Article a " +
"LEFT JOIN FETCH a.author " +
"LEFT JOIN FETCH a.tags " +
"WHERE a.status = :status")
List<Article> findByStatusWithRelations(
@Param("status") String status, Pageable pageable);
Generated SQL:
SELECT a.id, a.title, a.slug, a.status, a.published_at, a.view_count,
auth.id, auth.name, auth.email, auth.bio, auth.avatar_url,
t.id, t.name
FROM articles a
LEFT JOIN authors auth ON a.author_id = auth.id
LEFT JOIN article_tags at ON a.id = at.article_id
LEFT JOIN tags t ON at.tag_id = t.id
WHERE a.status = 'published'
ORDER BY a.published_at DESC;
Sort (cost=12847.33..13097.33 rows=100000 width=320)
(actual time=28.112..32.442 rows=150 loops=1)
Sort Key: a.published_at DESC
Sort Method: quicksort Memory: 48kB
Buffers: shared hit=9124
-> Hash Join (cost=418.00..10842.33 rows=100000 width=320)
(actual time=3.112..24.442 rows=150 loops=1)
Hash Cond: (a.author_id = auth.id)
Buffers: shared hit=9124
-> Hash Join (cost=248.00..8924.33 rows=100000 width=120)
...
Planning Time: 0.412 ms
Execution Time: 32.521 ms
One query instead of 101. But there is a problem: JOIN FETCH with Pageable generates incorrect pagination. Hibernate loads all matching rows and paginates in memory, logging:
HHH90003004: firstResult/maxResults specified with collection fetch; applying in memory
This means Hibernate fetches all published articles with their tags (not just the first 50), then discards everything outside the page window. For 487,000 articles, this is catastrophic.
Fix 2: Two-Phase Query
The correct approach for paginated results with JOIN FETCH:
// FAST: Two queries, no N+1, correct pagination
public Page<ArticleView> findArticles(String status, Pageable pageable) {
// Phase 1: Get paginated IDs
Page<Long> idPage = articleRepository.findIdsByStatus(status, pageable);
// Phase 2: Fetch entities with relations for those IDs
List<Article> articles = idPage.hasContent()
? articleRepository.findByIdsWithRelations(idPage.getContent())
: List.of();
List<ArticleView> views = articles.stream()
.map(a -> new ArticleView(a.getTitle(),
a.getAuthor().getName(),
a.getTags().stream().map(Tag::getName).toList()))
.toList();
return new PageImpl<>(views, pageable, idPage.getTotalElements());
}
// Repository methods
@Query("SELECT a.id FROM Article a WHERE a.status = :status ORDER BY a.publishedAt DESC")
Page<Long> findIdsByStatus(@Param("status") String status, Pageable pageable);
@Query("SELECT DISTINCT a FROM Article a " +
"LEFT JOIN FETCH a.author " +
"LEFT JOIN FETCH a.tags " +
"WHERE a.id IN :ids")
List<Article> findByIdsWithRelations(@Param("ids") List<Long> ids);
Phase 1 EXPLAIN ANALYZE:
SELECT a.id FROM articles a
WHERE a.status = 'published'
ORDER BY a.published_at DESC
LIMIT 50 OFFSET 0;
Limit (cost=0.43..8.87 rows=50 width=16)
(actual time=0.041..0.142 rows=50 loops=1)
Buffers: shared hit=6
-> Index Scan Backward using idx_articles_published_at on articles a
(cost=0.43..35124.00 rows=487000 width=16)
(actual time=0.039..0.134 rows=50 loops=1)
Filter: (status = 'published')
Buffers: shared hit=6
Planning Time: 0.089 ms
Execution Time: 0.168 ms
Phase 2 EXPLAIN ANALYZE:
SELECT DISTINCT a.id, a.title, a.slug, ..., auth.id, auth.name, ..., t.id, t.name
FROM articles a
LEFT JOIN authors auth ON a.author_id = auth.id
LEFT JOIN article_tags at ON a.id = at.article_id
LEFT JOIN tags t ON at.tag_id = t.id
WHERE a.id IN (1042, 1041, 1040, ...); -- 50 IDs
Unique (cost=12.87..18.42 rows=150 width=320)
(actual time=0.221..0.512 rows=150 loops=1)
Buffers: shared hit=214
-> Nested Loop Left Join (cost=1.42..17.87 rows=150 width=320)
(actual time=0.198..0.478 rows=150 loops=1)
Buffers: shared hit=214
-> Nested Loop Left Join (cost=0.85..8.42 rows=50 width=284)
(actual time=0.089..0.212 rows=50 loops=1)
Buffers: shared hit=54
-> Index Scan using articles_pkey on articles a
(cost=0.42..4.21 rows=50 width=120)
(actual time=0.031..0.062 rows=50 loops=1)
Index Cond: (id = ANY('{1042,1041,...}'::bigint[]))
Buffers: shared hit=4
-> Index Scan using authors_pkey on authors auth
(cost=0.29..0.31 rows=1 width=164)
(actual time=0.002..0.002 rows=1 loops=50)
Index Cond: (id = a.author_id)
Buffers: shared hit=50
-> Index Scan using article_tags_pkey on article_tags at
(cost=0.29..0.35 rows=3 width=16)
(actual time=0.003..0.004 rows=3 loops=50)
Index Cond: (article_id = a.id)
Buffers: shared hit=150
Planning Time: 0.312 ms
Execution Time: 0.556 ms
Two queries totaling 0.724 ms instead of 101 queries at 3.5+ ms.
Anti-Pattern 2: SELECT * via Entity Mapping
Every findBy... method on a JPA repository returns full entity objects with all columns:
// SLOW: Fetches all 14 columns including 15KB content
List<Article> articles = articleRepository.findByStatus("published");
// Only uses: title, slug, publishedAt
SELECT a.id, a.title, a.slug, a.content, a.excerpt, a.status,
a.view_count, a.published_at, a.created_at, a.updated_at,
a.author_id, a.category_id, a.featured_image, a.meta_description
FROM articles a WHERE a.status = 'published';
Seq Scan on articles (cost=0.00..19924.00 rows=487000 width=15420)
(actual time=0.012..142.112 rows=487000 loops=1)
Filter: (status = 'published')
Rows Removed by Filter: 13000
Buffers: shared hit=8924, shared read=42124
width=15420: each row is 15 KB, mostly from the content column. The total data transferred: 487,000 x 15 KB = 7.1 GB. shared read=42124 means 42,124 pages read from disk because the full rows do not fit in the buffer pool.
The fix: use interface-based projections:
// FAST: Projection interface selects only needed columns
public interface ArticleSummary {
Long getId();
String getTitle();
String getSlug();
Instant getPublishedAt();
Integer getViewCount();
}
// Repository
List<ArticleSummary> findByStatus(String status);
Generated SQL:
SELECT a.id, a.title, a.slug, a.published_at, a.view_count
FROM articles a WHERE a.status = 'published';
Seq Scan on articles (cost=0.00..19924.00 rows=487000 width=82)
(actual time=0.012..48.112 rows=487000 loops=1)
Filter: (status = 'published')
Rows Removed by Filter: 13000
Buffers: shared hit=8924
width=82 instead of 15,420. All data fits in the buffer pool (shared hit=8924, no shared read). Execution time drops from 142 ms to 48 ms because there is no disk I/O for wide content columns.
Anti-Pattern 3: Cartesian Products from Multiple Collections
Fetching two collection associations in a single query creates a cartesian product:
// SLOW: Two collections = cartesian product
@Query("SELECT a FROM Article a " +
"LEFT JOIN FETCH a.tags " +
"LEFT JOIN FETCH a.comments " +
"WHERE a.id = :id")
Article findByIdWithTagsAndComments(@Param("id") Long id);
If an article has 5 tags and 200 comments, the result set is 5 x 200 = 1,000 rows. Hibernate deduplicates in Java, wasting database bandwidth and memory.
Nested Loop Left Join (cost=1.14..842.33 rows=1000 width=420)
(actual time=0.112..4.221 rows=1000 loops=1)
Buffers: shared hit=842
-> Nested Loop Left Join (cost=0.85..42.33 rows=5 width=320)
...
-> Index Scan using idx_comments_article_id on comments cm
(cost=0.29..152.00 rows=200 width=100)
(actual time=0.012..0.612 rows=200 loops=5)
...
The fix: use @EntityGraph or @BatchSize to load collections in separate queries:
// FAST: Load article with tags, then batch-load comments
@EntityGraph(attributePaths = {"tags"})
@Query("SELECT a FROM Article a WHERE a.id = :id")
Article findByIdWithTags(@Param("id") Long id);
// On the entity: batch-load comments when accessed
@Entity
public class Article {
@OneToMany(mappedBy = "article")
@BatchSize(size = 20)
private List<Comment> comments;
}
Two queries instead of a cartesian join. Query 1 returns 5 rows (article x tags). Query 2 returns 200 rows (comments). Total: 205 rows instead of 1,000.
Anti-Pattern 4: Open Session in View
The Spring default spring.jpa.open-in-view=true keeps the Hibernate Session (and its JDBC connection) open for the entire HTTP request, including view rendering. Lazy-loaded associations trigger queries during JSON serialization:
@GetMapping("/articles/{id}")
public Article getArticle(@PathVariable Long id) {
Article article = articleRepository.findById(id).orElseThrow();
return article; // Jackson serializes all fields, triggering lazy loads
}
During JSON serialization, Jackson calls article.getTags(), article.getAuthor(), article.getComments(), each triggering a separate query. The JDBC connection is held for the full serialization time, which includes I/O to the HTTP client.
The fix:
# application.yml
spring:
jpa:
open-in-view: false # Force all data loading in the service layer
With open-in-view=false, accessing an unloaded lazy association outside a transaction throws LazyInitializationException. This forces developers to explicitly load needed data in the service layer, preventing accidental N+1 queries during serialization.
Anti-Pattern 5: Missing Index Hints in Native Queries
When JPA’s JPQL is insufficient and you write native queries, PostgreSQL may choose a suboptimal plan if statistics are stale:
// Native query without ensuring current statistics
@Query(value = """
SELECT a.id, a.title, a.slug, a.published_at,
ts_rank(a.search_vector, plainto_tsquery(:query)) AS rank
FROM articles a
WHERE a.search_vector @@ plainto_tsquery(:query)
AND a.status = 'published'
ORDER BY rank DESC
LIMIT 20
""", nativeQuery = true)
List<Object[]> searchArticles(@Param("query") String query);
Sort (cost=8924.33..8924.38 rows=20 width=86)
(actual time=142.112..142.118 rows=20 loops=1)
Sort Key: (ts_rank(a.search_vector, plainto_tsquery('java performance'))) DESC
Sort Method: top-N heapsort Memory: 27kB
Buffers: shared hit=4212, shared read=1842
-> Bitmap Heap Scan on articles a (cost=52.18..8842.33 rows=4200 width=86)
(actual time=12.112..138.442 rows=4187 loops=1)
Recheck Cond: (search_vector @@ plainto_tsquery('java performance'))
Filter: (status = 'published')
Rows Removed by Filter: 13
Heap Blocks: exact=3842
Buffers: shared hit=4212, shared read=1842
-> Bitmap Index Scan on idx_articles_search_vector
(cost=0.00..51.13 rows=4200 width=0)
(actual time=8.112..8.112 rows=4200 loops=1)
Buffers: shared hit=42
The query reads 4,200 rows, computes ts_rank for all of them, then sorts for the top 20. The shared read=1842 indicates cache misses. After running ANALYZE articles:
-> Bitmap Heap Scan on articles a (cost=52.18..8842.33 rows=4200 width=86)
(actual time=4.112..18.442 rows=4187 loops=1)
Recheck Cond: (search_vector @@ plainto_tsquery('java performance'))
Filter: (status = 'published')
Heap Blocks: exact=3842
Buffers: shared hit=5854, shared read=200
After statistics update and buffer pool warmup, shared read drops from 1,842 to 200. Schedule ANALYZE on tables with heavy write activity:
-- Auto-analyze threshold: run ANALYZE after 5% of rows change
ALTER TABLE articles SET (autovacuum_analyze_threshold = 1000);
ALTER TABLE articles SET (autovacuum_analyze_scale_factor = 0.05);
Batch Fetching: The Middle Ground
When refactoring all queries is impractical, @BatchSize reduces N+1 to N/batch+1:
@Entity
public class Article {
@ManyToMany
@BatchSize(size = 25)
private Set<Tag> tags;
@ManyToOne(fetch = FetchType.LAZY)
@BatchSize(size = 25)
private Author author;
}
Loading 50 articles with @BatchSize(size = 25) generates:
- 1 query for articles
- 2 queries for authors (50 / 25 = 2 batches)
- 2 queries for tags (50 / 25 = 2 batches)
- Total: 5 queries instead of 101
The batch query uses IN:
SELECT a.id, a.name FROM authors a WHERE a.id IN (?, ?, ?, ..., ?);
-- 25 parameters
Index Scan using authors_pkey on authors (cost=0.29..88.42 rows=25 width=164)
(actual time=0.031..0.112 rows=25 loops=1)
Index Cond: (id = ANY('{7,12,3,...}'::bigint[]))
Buffers: shared hit=28
Planning Time: 0.089 ms
Execution Time: 0.142 ms
Five queries at approximately 0.14 ms each: 0.7 ms total. Not as fast as a single JOIN FETCH (0.56 ms), but requires no query changes. Apply @BatchSize as a project-wide default:
spring:
jpa:
properties:
hibernate:
default_batch_fetch_size: 25
This is the lowest-effort, highest-impact change for an existing codebase with N+1 problems. Then use pg_stat_statements to identify the queries that still need hand-tuning.