Skip to main content
data systems mechanics invariants in distributed architectures

MVCC: Multi-Version Concurrency Control

5 min read Chapter 15 of 28
Summary

This section introduces Multi-Version Concurrency Control (MVCC) as...

This section introduces Multi-Version Concurrency Control (MVCC) as the mechanism that eliminates read-write blocking by maintaining immutable versions of data. Readers operate on a consistent snapshot defined at transaction start, while writers create new versions, ensuring concurrent access without conflicts. The core mechanics are defined by visibility rules that evaluate version timestamps (xmin, xmax) against transaction snapshots. These rules are implemented through deterministic logic, exemplified by the provided `mvcc_visibility_check` function. The fundamental trade-off is storage amplification: multiple versions coexist, necessitating garbage collection to reclaim space. The `MVCCGarbageCollector` class illustrates how versions are safely deleted based on the oldest active transaction horizon. The section connects MVCC to Snapshot Isolation, highlighting its prevention of dirty and non-repeatable reads while noting its allowance of phantom reads and write skew. The narrative frames MVCC as transforming temporal conflicts into spatial management problems, with version chains and index indirection as key implementation patterns.

MVCC: Multi-Version Concurrency Control

MVCC enforces the fundamental storage-for-concurrency trade-off: multiple immutable versions enable non-blocking reads at the cost of storage amplification and mandatory garbage collection. This mechanism guarantees that readers never block writers—concurrent transactions proceed without read-write conflicts—by maintaining discrete snapshots of data state, each tied to a transaction’s logical point in time. The price of this guarantee is paid in storage: every update generates a new version, and obsolete versions persist until safely reclaimed.

Core Principles of MVCC

  1. Versioning: Each data item (e.g., a row) is stored as a sequence of immutable versions. Each version is annotated with transaction IDs: xmin (the creator) and xmax (the deleter, if any). A version represents a fixed state at a point in time.
  2. Transaction Visibility: Every transaction operates against a snapshot—a bounded range of transaction IDs—determining which versions are visible. This ensures isolation: no transaction observes partial or in-flight changes from others.
  3. Version Creation and Deletion: Writers generate new versions without modifying existing ones. Old versions remain until garbage collection removes them, a process that cannot proceed arbitrarily due to long-running or failed transactions.

MVCC’s guarantee—readers never block writers—is purchased with the currency of storage amplification: multiple versions of logical data must coexist. This is not an implementation detail but an invariant of the design.

MVCC Visibility Rules

A version is visible to a transaction if and only if:

  • The creating transaction (xmin) has committed and falls within the transaction’s snapshot range.
  • The version has not been deleted by a committed transaction (xmax is null or refers to a transaction outside the snapshot or not yet committed).

These rules enforce snapshot isolation: each transaction observes a temporally consistent database state, immune to concurrent modifications.

Example of MVCC in Action

Consider a database table with a row representing a user’s account balance. Initially, the balance is $100.

  1. Initial State: Version 1 - Balance: $100, xmin: Transaction 1 (committed), xmax: null
  2. Transaction 2 Starts: Acquires snapshot including Transaction 1; reads version 1 ($100).
  3. Transaction 3 Updates Balance: Creates version 2 - Balance: $120, xmin: Transaction 3. Version 1’s xmax is set to Transaction 3.
  4. Transaction 2 Commits: Still sees version 1 ($100)—version 2 is invisible because it was created after Transaction 2’s snapshot.
  5. Transaction 3 Commits: Version 2 ($120) becomes visible to all new transactions.

Code Example: MVCC Visibility Check

def mvcc_visibility_check(
    xmin: int,
    xmax: int | None,
    snapshot_xmin: int,
    snapshot_xmax: int,
    active_xids: set[int]
) -> bool:
    # Rule 1: Creating transaction must be committed and within snapshot range.
    if xmin in active_xids:
        # Creator was still in progress at snapshot time.
        return False
    if not (snapshot_xmin <= xmin < snapshot_xmax):
        # Creator not visible in snapshot.
        return False

    # Rule 2: Version must not be deleted by a committed transaction in snapshot.
    if xmax is not None:
        if xmax not in active_xids:
            # Deleter is committed.
            if snapshot_xmin <= xmax < snapshot_xmax:
                # Deleter is visible and committed → version is deleted.
                return False
        # If xmax is in active_xids, deletion is not yet visible.
    return True

Garbage Collection in MVCC

Garbage collection is the inevitable reclamation mechanism for the storage debt incurred by versioning. It removes versions that are no longer visible to any active or future transaction. The collection horizon is bounded by the oldest active transaction ID: versions deleted by transactions before this horizon can be safely purged.

Crucially, garbage collection must assume failure is the default state. Long-running transactions or system crashes can stall the horizon indefinitely. Therefore, collection is conservative: only versions whose xmax is both committed and strictly less than the oldest active transaction ID are eligible for deletion.

Example of Garbage Collection

class MVCCGarbageCollector:
    def __init__(self):
        self.oldest_active_xid = None
        self.committed_xids = set()

    def update_oldest_active(self, active_xids: set[int]) -> None:
        # Advance the deletion horizon to the oldest active transaction.
        if active_xids:
            self.oldest_active_xid = min(active_xids)
        else:
            # No active transactions: safe to delete up to max committed.
            self.oldest_active_xid = float('inf')

    def is_version_safe_to_delete(self, version_xmax: int | None) -> bool:
        # Only deleted versions (with xmax) can be collected.
        if version_xmax is None:
            return False
        # Deleter must be committed.
        if version_xmax not in self.committed_xids:
            return False
        # Deleter must precede the oldest active transaction.
        if self.oldest_active_xid is not None and version_xmax >= self.oldest_active_xid:
            return False
        return True

Conclusion

MVCC is not a performance optimization—it is a deterministic enforcement of concurrency through versioned immutability. It guarantees non-blocking reads by design, using snapshot isolation enforced via transaction-scoped visibility rules. This guarantee is inextricable from its cost: storage amplification from coexisting versions and the mandatory, failure-resilient garbage collection that reclaims obsolete data. These are not side effects but first-order invariants. The system does not merely support high concurrency; it purchases it, byte by byte, with storage, and redeems it only through disciplined, crash-safe reclamation.