Skip to main content
the invisible-layer how abstraction is making software engineers dumber

Transactions: The Hardest Abstraction in Computing

10 min read Chapter 22 of 56
Summary

Explains PostgreSQL's MVCC implementation through transaction IDs, xmin/xmax...

Explains PostgreSQL's MVCC implementation through transaction IDs, xmin/xmax visibility rules, and WAL mechanics. Demonstrates isolation level anomalies with concurrent SQL sessions, shows deadlock detection in action, and exposes how ORMs and connection poolers like PgBouncer break transaction guarantees.

Transactions: The Hardest Abstraction in Computing

When you call COMMIT, what physically happens? Not conceptually. Physically. Which bytes move to which location on which storage device, and in what order?

Here is the sequence:

  1. PostgreSQL writes the transaction’s changes to the Write-Ahead Log (WAL) in shared memory buffers.
  2. At COMMIT, the WAL buffer is flushed to the WAL file on disk using fsync (or fdatasync, depending on the wal_sync_method setting).
  3. The kernel’s filesystem cache is forced to physical storage. The disk’s write cache must also persist the data — if it doesn’t, fsync lies, and your durability guarantee is fake.
  4. Only after the WAL is durable on disk does PostgreSQL return success to the client.
  5. The actual table data pages (the “heap”) are modified in shared buffers and written to disk later, asynchronously, by the background writer or checkpointer.

This is write-ahead logging: the log hits disk before the data does. If the server crashes after step 4 but before step 5, PostgreSQL replays the WAL on startup and reconstructs the data pages. Your committed data survives. If the server crashes before step 4, the transaction is lost — and that’s correct, because COMMIT never returned success.

The fsync at step 2 is the single most expensive operation in transaction processing. It’s why committing 10,000 rows one-at-a-time (10,000 commits, 10,000 fsyncs) is catastrophically slower than committing them in a single transaction (one commit, one fsync). The data writes are similar; the synchronization cost is 10,000x different.

MVCC: How PostgreSQL Keeps Everyone’s Story Straight

Relational databases face a fundamental concurrency problem: multiple transactions reading and writing the same rows simultaneously. Locking every row you read (the approach older databases like MySQL with MyISAM took) creates contention — readers block writers, writers block readers.

PostgreSQL’s solution is Multi-Version Concurrency Control (MVCC). Instead of locks, it keeps multiple versions of each row alive simultaneously. Each transaction sees a consistent snapshot of the database as of a specific point in time, even while other transactions are modifying that data.

Every row in PostgreSQL has hidden system columns:

  • xmin: The transaction ID that created this row version.
  • xmax: The transaction ID that deleted (or updated) this row version. Zero if the row is still live.

You can see them:

SELECT xmin, xmax, id, name, balance
FROM accounts
WHERE id = 1;
  xmin  | xmax | id |  name   | balance
--------+------+----+---------+---------
 100742 |    0 |  1 | Alice   |  1000.00

Transaction 100742 created this row. xmax = 0 means no transaction has deleted or updated it yet.

Now, when transaction 100800 updates Alice’s balance:

-- Transaction 100800
UPDATE accounts SET balance = 900 WHERE id = 1;

PostgreSQL does NOT overwrite the existing row. It:

  1. Sets xmax = 100800 on the old row version (marking it as “deleted by transaction 100800”)
  2. Creates a new row version with xmin = 100800, xmax = 0, and balance = 900

Both versions exist simultaneously on disk. Which version a concurrent transaction sees depends on its snapshot — the set of transaction IDs that were committed at the moment the snapshot was taken.

A transaction with a snapshot from before 100800 committed sees: xmin = 100742 (committed, visible), xmax = 100800 (not yet committed, so the delete is invisible). It reads the old balance: 1000.

A transaction with a snapshot from after 100800 committed sees: the old row has xmax = 100800 (committed, so the row is deleted). The new row has xmin = 100800 (committed, visible). It reads the new balance: 900.

This is snapshot isolation. Readers never block writers. Writers never block readers. The cost: dead row versions accumulate on disk until VACUUM cleans them up. This is why PostgreSQL has autovacuum, and why disabling it is a time bomb.

Isolation Levels: What You See and When

PostgreSQL supports four isolation levels, but internally implements three distinct behaviors.

READ COMMITTED (Default)

Each statement within a transaction gets a fresh snapshot. You see data committed by other transactions between your own statements:

-- Session A                              -- Session B
BEGIN;
                                          BEGIN;
SELECT sum(balance) FROM accounts;
-- Returns: 10000
                                          UPDATE accounts SET balance = balance + 500
                                            WHERE id = 1;
                                          COMMIT;
SELECT sum(balance) FROM accounts;
-- Returns: 10500 (sees Session B's update)
COMMIT;

The same SELECT sum(balance) returns different values within Session A’s transaction. For reporting queries, this produces internally inconsistent results. Your report might show totals that never existed at any single point in time.

REPEATABLE READ

The transaction gets one snapshot at the start. All statements see the same data, regardless of what other transactions commit:

-- Session A                              -- Session B
BEGIN ISOLATION LEVEL REPEATABLE READ;
                                          BEGIN;
SELECT sum(balance) FROM accounts;
-- Returns: 10000
                                          UPDATE accounts SET balance = balance + 500
                                            WHERE id = 1;
                                          COMMIT;
SELECT sum(balance) FROM accounts;
-- Still returns: 10000 (snapshot frozen)
COMMIT;

But REPEATABLE READ has a catch. If Session A tries to update a row that Session B already modified and committed:

-- Session A (continuing)
UPDATE accounts SET balance = balance - 200 WHERE id = 1;
-- ERROR: could not serialize access due to concurrent update

PostgreSQL aborts Session A rather than applying the update on top of a version that didn’t exist in Session A’s snapshot. Your application must catch this and retry.

SERIALIZABLE

The strongest guarantee. PostgreSQL ensures that the result of concurrent transactions is equivalent to some serial (one-at-a-time) execution order. This catches anomalies that REPEATABLE READ misses:

-- Session A                              -- Session B
BEGIN ISOLATION LEVEL SERIALIZABLE;       BEGIN ISOLATION LEVEL SERIALIZABLE;

SELECT count(*) FROM tickets
  WHERE event_id = 42 AND status = 'sold';
-- Returns: 98
                                          SELECT count(*) FROM tickets
                                            WHERE event_id = 42
                                            AND status = 'sold';
                                          -- Returns: 98

-- Both sessions see 98 sold (capacity: 100)
-- Both decide there's room for 1 more

INSERT INTO tickets (event_id, status)
  VALUES (42, 'sold');
COMMIT;
-- Succeeds
                                          INSERT INTO tickets (event_id, status)
                                            VALUES (42, 'sold');
                                          COMMIT;
                                          -- ERROR: could not serialize access
                                          -- due to read/write dependencies

Under REPEATABLE READ, both inserts would succeed, and you’d oversell the event (100 tickets sold when capacity is 100, but now 100 from before + 2 new ones). SERIALIZABLE detects that both transactions read the same data and made conflicting decisions based on it.

The cost: serialization failures are more frequent, and your application must handle retries for every transaction. The monitoring overhead is also higher — PostgreSQL tracks predicate locks to detect these dependency cycles.

Deadlocks: When Transactions Trap Each Other

-- Session A                              -- Session B
BEGIN;                                    BEGIN;
UPDATE accounts SET balance = 100
  WHERE id = 1;
-- Acquires row lock on id=1
                                          UPDATE accounts SET balance = 200
                                            WHERE id = 2;
                                          -- Acquires row lock on id=2

UPDATE accounts SET balance = 300
  WHERE id = 2;
-- WAITS (Session B holds lock on id=2)
                                          UPDATE accounts SET balance = 400
                                            WHERE id = 1;
                                          -- WAITS (Session A holds lock on id=1)
                                          -- DEADLOCK DETECTED!

Both sessions are waiting for each other. Neither can proceed. PostgreSQL runs a deadlock detector (by default every deadlock_timeout, which is 1 second). It constructs a wait-for graph, finds the cycle, and kills one transaction:

ERROR:  deadlock detected
DETAIL:  Process 12345 waits for ShareLock on transaction 100850;
         blocked by process 12346.
         Process 12346 waits for ShareLock on transaction 100849;
         blocked by process 12345.
HINT:  See server log for query details.

The fix is consistent lock ordering: always acquire locks on rows in the same order (e.g., by ascending id). If every transaction locks id=1 before id=2, deadlocks between these rows become impossible.

ORMs That Hide Transaction Boundaries

Django’s @transaction.atomic decorator wraps a view function in a transaction:

from django.db import transaction

@transaction.atomic
def transfer_money(from_id, to_id, amount):
    sender = Account.objects.select_for_update().get(id=from_id)
    receiver = Account.objects.select_for_update().get(id=to_id)

    sender.balance -= amount
    sender.save()

    receiver.balance += amount
    receiver.save()

What this actually does:

BEGIN;
SELECT * FROM accounts WHERE id = %(from_id)s FOR UPDATE;  -- row lock
SELECT * FROM accounts WHERE id = %(to_id)s FOR UPDATE;    -- row lock
UPDATE accounts SET balance = %(new_balance)s WHERE id = %(from_id)s;
UPDATE accounts SET balance = %(new_balance)s WHERE id = %(to_id)s;
COMMIT;

The select_for_update() generates SELECT ... FOR UPDATE, which acquires an exclusive row lock. Without it, a concurrent transaction could modify the balance between your SELECT and UPDATE, causing a lost update.

But notice: this code has the deadlock problem. If one request calls transfer_money(1, 2, 100) while another calls transfer_money(2, 1, 50), they lock rows in opposite order. The ORM doesn’t warn you. The decorator just wraps your code in BEGIN/COMMIT — it doesn’t reason about lock ordering, isolation level selection, or retry logic for serialization failures.

Without @transaction.atomic, Django runs in autocommit mode: each save() is its own transaction. If the first save() succeeds and the second fails (network error, constraint violation, server crash), the sender lost money but the receiver didn’t gain it. The abstraction — two Python method calls that look like ordinary object persistence — hides the fact that they’re two separate atomic operations with a failure window between them.

The PgBouncer Transaction Mode Trap

Connection pooling with PgBouncer introduces another failure mode. PgBouncer has three pooling modes:

  • Session mode: A client gets a dedicated server connection for its entire session. Safe, but limits concurrency to the number of PostgreSQL connections.
  • Transaction mode: A client gets a connection only for the duration of a transaction. Between transactions, the connection returns to the pool. This is what most deployments use because it dramatically increases the number of clients a single PostgreSQL server can handle.
  • Statement mode: A client gets a connection for each individual statement. Almost never used because it breaks multi-statement transactions entirely.

Transaction mode has a critical constraint: session-level state does not persist between transactions. This means:

-- These break in transaction mode:
SET search_path TO myschema;          -- gone after COMMIT
PREPARE my_query AS SELECT ...;       -- gone after COMMIT
LISTEN my_channel;                    -- gone after COMMIT
DECLARE my_cursor CURSOR FOR ...;     -- gone after COMMIT (if outside transaction)

-- And critically:
SET SESSION CHARACTERISTICS AS
  TRANSACTION ISOLATION LEVEL SERIALIZABLE;  -- gone after COMMIT

If your application sets the isolation level at the session level (which many ORMs do by default), the setting vanishes after each transaction. Your second transaction in the same Python database session runs at READ COMMITTED without warning, using a different PostgreSQL connection that knows nothing about your previous SET command.

The safe approach with PgBouncer transaction mode: set isolation level per-transaction, inside the BEGIN statement:

BEGIN ISOLATION LEVEL SERIALIZABLE;
-- your queries
COMMIT;

Or in Django:

from django.db import transaction

with transaction.atomic():
    # Django uses READ COMMITTED by default
    # To change: set DATABASES['default']['OPTIONS']['isolation_level']
    # in settings.py — but verify this works with your connection pooler
    pass

The abstraction stack here is three layers deep: your application code → ORM → connection pooler → database. Each layer has its own concept of a “connection” and a “transaction.” When those concepts don’t align — when the ORM thinks it has a persistent session but PgBouncer is recycling connections underneath — you get bugs that only appear under load, only at certain concurrency levels, and only when the pooler decides to reassign your connection to someone else between your transactions.

These aren’t theoretical failure modes. They’re production incidents that happen when teams deploy PgBouncer in transaction mode without understanding what session-level state their ORM relies on. The solution isn’t to avoid connection pooling — it’s to understand what each layer in your stack actually guarantees, instead of assuming the abstractions compose cleanly.

They don’t. They never have.