Transaction Isolation, Lock Costs, and Idle Transactions
Transaction Isolation, Lock Costs, and Idle Transactions
The previous section addressed connection overhead: how many connections and how to pool them. This section addresses what happens inside those connections: the cost of transaction isolation guarantees, the impact of lock contention, and the damage caused by transactions that acquire resources and then go idle.
Transaction Isolation Levels in PostgreSQL
PostgreSQL implements three isolation levels (SERIALIZABLE, REPEATABLE READ, READ COMMITTED). The SQL standard’s READ UNCOMMITTED maps to READ COMMITTED in PostgreSQL (dirty reads are never allowed).
-- Set isolation level
BEGIN ISOLATION LEVEL READ COMMITTED; -- default
BEGIN ISOLATION LEVEL REPEATABLE READ;
BEGIN ISOLATION LEVEL SERIALIZABLE;
Each level has a different overhead profile:
READ COMMITTED (Default)
Each statement sees a fresh snapshot. The snapshot is taken at statement start, not transaction start.
BEGIN;
SELECT count(*) FROM articles WHERE category_id = 5; -- snapshot at this moment
-- Another session inserts 10 articles in category 5
SELECT count(*) FROM articles WHERE category_id = 5; -- sees new articles!
COMMIT;
Cost: snapshot acquisition per statement. This involves reading the ProcArray (list of in-progress transactions) under a lightweight lock.
REPEATABLE READ
Snapshot taken at first statement of the transaction. All subsequent statements see the same snapshot.
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM articles WHERE category_id = 5; -- snapshot frozen here
-- Another session inserts 10 articles in category 5
SELECT count(*) FROM articles WHERE category_id = 5; -- same count as before
COMMIT;
Cost: same snapshot acquisition, but only once. However, write conflicts raise serialization errors:
-- Session 1
BEGIN ISOLATION LEVEL REPEATABLE READ;
UPDATE articles SET view_count = view_count + 1 WHERE id = 42;
-- Session 2 (concurrent)
BEGIN ISOLATION LEVEL REPEATABLE READ;
UPDATE articles SET view_count = view_count + 1 WHERE id = 42;
-- ERROR: could not serialize access due to concurrent update
The application must retry failed transactions:
// FAST: retry loop for REPEATABLE READ conflicts
public void incrementViewCount(DataSource ds, long articleId) throws SQLException {
int maxRetries = 3;
for (int attempt = 0; attempt < maxRetries; attempt++) {
try (Connection conn = ds.getConnection()) {
conn.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
conn.setAutoCommit(false);
try (PreparedStatement ps = conn.prepareStatement(
"UPDATE articles SET view_count = view_count + 1 WHERE id = ?")) {
ps.setLong(1, articleId);
ps.executeUpdate();
conn.commit();
return;
} catch (SQLException e) {
conn.rollback();
if (e.getSQLState().equals("40001") && attempt < maxRetries - 1) {
continue; // serialization failure, retry
}
throw e;
}
}
}
}
SERIALIZABLE
Full serializable isolation using predicate locks (SIRead locks). Detects read-write conflicts that would produce non-serializable behavior.
Cost: maintains predicate lock bookkeeping (memory and CPU overhead), checks for dangerous structures in the dependency graph.
Isolation Level Overhead Benchmark
Benchmark: 10,000 read-only transactions (SELECT single article by ID):
-- Read-only workload: isolation level comparison
-- pgbench custom script: select_article.sql
\set article_id random(1, 1000000)
BEGIN ISOLATION LEVEL :isolation;
SET TRANSACTION READ ONLY;
SELECT id, title, category_id, published_at FROM articles WHERE id = :article_id;
COMMIT;
Results (single thread, 10,000 iterations):
Isolation Level | Avg Latency | P99 Latency | Throughput (tx/sec)
READ COMMITTED | 0.42 ms | 1.1 ms | 2,381
REPEATABLE READ | 0.44 ms | 1.2 ms | 2,273
SERIALIZABLE | 0.51 ms | 1.8 ms | 1,961
For read-only transactions, the overhead is minimal (4.7% for REPEATABLE READ, 21% for SERIALIZABLE). The SERIALIZABLE overhead comes from SIRead lock acquisition and bookkeeping.
Under contention (50 concurrent threads, mix of reads and writes):
Isolation Level | Avg Latency | P99 Latency | Throughput | Serialization Errors
READ COMMITTED | 4.8 ms | 18.2 ms | 10,417 | 0
REPEATABLE READ | 5.1 ms | 22.7 ms | 9,804 | 847
SERIALIZABLE | 6.4 ms | 34.1 ms | 7,813 | 2,184
REPEATABLE READ loses 6% throughput vs READ COMMITTED, plus 847 transactions must be retried. SERIALIZABLE loses 25% throughput and requires 2,184 retries. The retry overhead compounds because each retry adds load.
Row Lock Mechanics
When a transaction updates a row, it acquires an exclusive row lock:
-- Session 1
BEGIN;
UPDATE articles SET view_count = view_count + 1 WHERE id = 42;
-- Row 42 is now locked. Transaction holds lock until COMMIT/ROLLBACK.
-- Session 2 (concurrent)
BEGIN;
UPDATE articles SET view_count = view_count + 1 WHERE id = 42;
-- BLOCKS here, waiting for Session 1's lock
-- pg_stat_activity shows:
-- Session 2: wait_event_type = Lock, wait_event = transactionid
Measuring lock wait time:
-- Enable lock timing
ALTER SYSTEM SET log_lock_waits = on;
ALTER SYSTEM SET deadlock_timeout = '1s';
SELECT pg_reload_conf();
-- Query to find lock waits in real-time
SELECT
blocked.pid AS blocked_pid,
blocked.query AS blocked_query,
blocking.pid AS blocking_pid,
blocking.query AS blocking_query,
now() - blocked.state_change AS wait_duration
FROM pg_stat_activity blocked
JOIN pg_locks bl ON bl.pid = blocked.pid AND bl.granted = false
JOIN pg_locks gl ON gl.locktype = bl.locktype
AND gl.database IS NOT DISTINCT FROM bl.database
AND gl.relation IS NOT DISTINCT FROM bl.relation
AND gl.page IS NOT DISTINCT FROM bl.page
AND gl.tuple IS NOT DISTINCT FROM bl.tuple
AND gl.pid != bl.pid
AND gl.granted = true
JOIN pg_stat_activity blocking ON blocking.pid = gl.pid
WHERE blocked.state = 'active';
Lock Contention Benchmark: View Count Updates
The content platform’s view_count updates are a hot contention point. Popular articles receive hundreds of concurrent view increments:
-- Hot row: article_id = 42 gets 100 concurrent updates
-- Benchmark: 100 threads each incrementing view_count 100 times
// SLOW: direct row lock contention
public void incrementDirect(DataSource ds, long articleId) throws SQLException {
try (Connection conn = ds.getConnection()) {
conn.setAutoCommit(true);
try (PreparedStatement ps = conn.prepareStatement(
"UPDATE articles SET view_count = view_count + 1 WHERE id = ?")) {
ps.setLong(1, articleId);
ps.executeUpdate();
}
}
}
100 threads, 100 updates each to the same row:
Direct UPDATE contention:
Total time: 8.4 seconds
Avg lock wait: 7.2 ms
P99 lock wait: 42 ms
Throughput: 1,190 updates/sec
Effective serialization: threads queue behind each other
The fix: batch view counts in application memory and flush periodically:
// FAST: batch view counts, flush every 5 seconds
public class ViewCountBatcher {
private final ConcurrentHashMap<Long, LongAdder> pending = new ConcurrentHashMap<>();
private final DataSource dataSource;
public void increment(long articleId) {
pending.computeIfAbsent(articleId, k -> new LongAdder()).increment();
}
@Scheduled(fixedRate = 5000)
public void flush() throws SQLException {
var snapshot = new HashMap<Long, Long>();
pending.forEach((id, adder) -> {
long count = adder.sumThenReset();
if (count > 0) snapshot.put(id, count);
});
if (snapshot.isEmpty()) return;
try (Connection conn = dataSource.getConnection()) {
conn.setAutoCommit(false);
try (PreparedStatement ps = conn.prepareStatement(
"UPDATE articles SET view_count = view_count + ? WHERE id = ?")) {
for (var entry : snapshot.entrySet()) {
ps.setLong(1, entry.getValue());
ps.setLong(2, entry.getKey());
ps.addBatch();
}
ps.executeBatch();
conn.commit();
}
}
}
}
Batched UPDATE (flush every 5s):
Total time: 0.4 seconds (for same 10,000 increments)
Lock contention: 0 (single thread flushes)
Database UPDATEs: 1 per article per flush interval
Throughput: 25,000 logical updates/sec
The batch approach transforms 10,000 contentious row-level updates into 1 uncontested UPDATE per article. Lock hold time drops from 7.2ms average to effectively zero.
Advisory Locks for Application Coordination
When you need mutual exclusion without row locks (coordinating processes, preventing duplicate work):
// FAST: advisory lock for article ingestion deduplication
public boolean tryIngestArticle(Connection conn, String externalId) throws SQLException {
// Hash the external_id to a bigint for advisory lock
long lockId = externalId.hashCode() & 0xFFFFFFFFL;
// Try to acquire advisory lock (non-blocking)
try (PreparedStatement ps = conn.prepareStatement(
"SELECT pg_try_advisory_xact_lock(?)")) {
ps.setLong(1, lockId);
try (ResultSet rs = ps.executeQuery()) {
rs.next();
if (!rs.getBoolean(1)) {
return false; // another process is ingesting this article
}
}
}
// Lock acquired, safe to ingest
// Check if article exists, insert if not
try (PreparedStatement ps = conn.prepareStatement("""
INSERT INTO articles (external_id, title, body, category_id, source_id,
published_at, word_count)
VALUES (?, ?, ?, ?, ?, ?, ?)
ON CONFLICT (external_id) DO NOTHING
""")) {
// ... bind parameters ...
return ps.executeUpdate() > 0;
}
}
Advisory lock advantages over row locks:
- No table access required (pure in-memory lock)
- No dead tuple creation (no UPDATE for lock acquisition)
- Can lock on arbitrary values (not just existing rows)
- Transaction-scoped (
pg_advisory_xact_lock) or session-scoped (pg_advisory_lock)
Advisory lock overhead: approximately 2 microseconds per acquire/release (in-memory hash table lookup).
Idle-in-Transaction: Quantifying the Damage
An idle-in-transaction session holds resources open:
-- Simulate idle-in-transaction
BEGIN;
SELECT * FROM articles WHERE id = 42; -- takes snapshot
-- Application does HTTP call, takes 30 seconds...
-- Transaction is idle but snapshot is held
Impact 1: Vacuum Cannot Clean Dead Tuples
-- Session A: idle in transaction since 5 minutes ago
-- Its snapshot prevents vacuum from removing tuples deleted after its snapshot
-- Check oldest transaction
SELECT
pid,
now() - xact_start AS xact_duration,
state,
query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;
pid | xact_duration | state | query
------+---------------+--------------------+----------------------------------
1847 | 00:05:23 | idle in transaction | SELECT * FROM articles WHERE...
While this session exists, any dead tuples created after its snapshot started cannot be vacuumed. If the content platform updates view_counts 1000 times/sec, that is 300,000 dead tuples accumulated during those 5 minutes.
Impact 2: Lock Holding
If the idle transaction performed any writes:
BEGIN;
UPDATE articles SET view_count = 100 WHERE id = 42;
-- Application hangs, transaction idle...
-- Row 42 remains locked. No other session can update it.
Impact 3: Connection Slot Consumption
With PgBouncer in transaction mode, an idle-in-transaction session holds a server connection. If default_pool_size = 20 and 3 connections are idle-in-transaction, only 17 connections serve actual queries.
Measuring Vacuum Blockage
-- Check vacuum progress blockage
SELECT
relname,
n_dead_tup,
last_vacuum,
last_autovacuum,
(SELECT min(backend_xmin)
FROM pg_stat_activity
WHERE state != 'idle') AS oldest_xmin
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;
relname | n_dead_tup | last_autovacuum | oldest_xmin
--------------+------------+---------------------+-------------
view_events | 847291 | 2026-05-28 14:23:01 | 184729100
articles | 12847 | 2026-05-29 08:47:12 | 184729100
847,291 dead tuples in view_events because the oldest active transaction (xmin = 184729100) prevents vacuum from cleaning anything newer.
Prevention: Timeout Configuration
-- Kill transactions idle longer than 30 seconds
ALTER SYSTEM SET idle_in_transaction_session_timeout = '30s';
-- Statement timeout prevents individual queries from running too long
ALTER SYSTEM SET statement_timeout = '30s';
-- Lock wait timeout: fail fast rather than queue
ALTER SYSTEM SET lock_timeout = '5s';
SELECT pg_reload_conf();
// Application-side defense
public Connection getConnection(DataSource pool) throws SQLException {
Connection conn = pool.getConnection();
try (Statement stmt = conn.createStatement()) {
stmt.execute("SET statement_timeout = '10s'");
stmt.execute("SET lock_timeout = '3s'");
}
return conn;
}
With lock_timeout = 3s, a query waiting for a row lock fails after 3 seconds instead of blocking indefinitely. The application can retry or return an error to the user. This prevents cascading failures where one slow transaction causes all subsequent transactions to queue.
The Content Platform Transaction Strategy
public class TransactionStrategy {
// Content serving: read-only, READ COMMITTED, short timeout
public <T> T readContent(DataSource readPool, Function<Connection, T> query) {
try (Connection conn = readPool.getConnection()) {
conn.setReadOnly(true);
conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
try (Statement stmt = conn.createStatement()) {
stmt.execute("SET LOCAL statement_timeout = '5s'");
}
return query.apply(conn);
} catch (SQLException e) {
throw new DataAccessException(e);
}
}
// Analytics writes: asynchronous commit, batch, READ COMMITTED
public void writeAnalytics(DataSource writePool, Consumer<Connection> batch) {
try (Connection conn = writePool.getConnection()) {
conn.setAutoCommit(false);
try (Statement stmt = conn.createStatement()) {
stmt.execute("SET LOCAL synchronous_commit = off");
stmt.execute("SET LOCAL statement_timeout = '30s'");
}
batch.accept(conn);
conn.commit();
} catch (SQLException e) {
throw new DataAccessException(e);
}
}
// Article ingestion: standard commit, COPY protocol, longer timeout
public void ingestArticles(DataSource writePool, List<Article> articles) {
try (Connection conn = writePool.getConnection()) {
conn.setAutoCommit(false);
try (Statement stmt = conn.createStatement()) {
stmt.execute("SET LOCAL statement_timeout = '120s'");
stmt.execute("SET LOCAL lock_timeout = '10s'");
}
copyArticles(conn, articles);
conn.commit();
} catch (SQLException e) {
throw new DataAccessException(e);
}
}
}
Each transaction type has appropriate isolation, timeout, and commit semantics:
- Reads: fastest possible, fail-fast on any delay
- Analytics: tolerates async commit (data is reconstructable)
- Ingestion: needs longer timeouts for bulk operations, standard commit for durability