Skip to main content
data systems mechanics invariants in distributed architectures

Consensus and Fault Tolerance

5 min read Chapter 17 of 28
Summary

This section establishes consensus as the fundamental agreement...

This section establishes consensus as the fundamental agreement mechanism for distributed systems under faults. The FLP impossibility theorem proves a deterministic protocol cannot guarantee both safety (agreement, validity) and liveness (termination) in an asynchronous system with even one crash fault, a trade-off enforced by the valency argument. Byzantine faults, defined as arbitrary malicious behavior, demand stricter resilience: systems require 3f+1 nodes to tolerate f Byzantine faults, compared to 2f+1 for crash faults. Practical algorithms circumvent FLP by introducing timing assumptions (Paxos, Raft) or cryptographic coordination (PBFT). A Python code example illustrates the two-phase structure of a leader-based protocol and its vulnerability to asynchrony. The section frames these requirements as immutable trade-offs: safety versus liveness (FLP), and node count versus fault severity (Byzantine vs. crash).

CH5: Consensus and Fault Tolerance

A consensus protocol must provide three guarantees: Agreement (all non-faulty nodes decide the same value), Validity (the decided value must be one proposed by a node), and Termination (all non-faulty nodes eventually decide). The FLP impossibility result (Fischer, Lynch, and Paterson) proves that in a purely asynchronous distributed system, no deterministic protocol can satisfy all three guarantees in the presence of even a single crash fault. This establishes the core conflict: the FLP theorem enforces the immutable trade-off—a deterministic protocol in an asynchronous model must sacrifice either safety (Agreement and Validity) or liveness (Termination).

FLP Impossibility Result

The FLP theorem is proven via the valency argument, which analyzes the state space of a distributed protocol. A configuration is 0-valent if the system is guaranteed to decide 0, 1-valent if it will decide 1, and bivalent if either decision remains possible. The proof demonstrates that every protocol has at least one bivalent initial configuration. From this state, an adversarial scheduler can delay a critical message to a single process, creating two indistinguishable successor configurations. Because the protocol is deterministic, it must eventually decide, yet the scheduler can steer one path toward 0 and the other toward 1, violating Agreement. Thus, the protocol cannot guarantee termination without risking inconsistency, proving that safety and liveness cannot be simultaneously ensured under total asynchrony.

Byzantine Faults

Byzantine faults represent the most severe fault model, where faulty nodes may exhibit arbitrary, malicious behavior—including sending conflicting messages to different nodes. In contrast, crash faults only permit a node to stop sending messages. To tolerate f Byzantine faults, a system requires at least 3f + 1 nodes. This bound arises because a Byzantine node can simulate a split-brain scenario: for example, in the classic three generals problem, a traitor can send “Attack” to one general and “Retreat” to another, preventing agreement unless a third loyal general breaks the tie. In contrast, crash faults only require 2f + 1 nodes, as surviving nodes can communicate directly to reach a majority.

Practical Consensus Algorithms

Practical consensus algorithms circumvent the FLP impossibility by weakening its assumptions. Specifically, they introduce timing assumptions—such as timeouts or failure detectors—that transform the asynchronous model into a partially synchronous one. Under partial synchrony, liveness can be guaranteed after an unknown but finite time, allowing both safety and eventual termination. Paxos achieves this via a leader-based protocol that sequences proposals across phases of prepare/promise and accept/commit. Raft simplifies this mechanism with a strong leader and replicated log, ensuring that a majority quorum commits entries in order. These protocols do not defeat FLP; they operate outside its scope by assuming that message delays are bounded eventually.

Code Example: Consensus Protocol

The following Python code illustrates a simplified leader-based consensus protocol inspired by Paxos:

import asyncio
from dataclasses import dataclass
from typing import Optional

@dataclass
class Proposal:
    value: int
    proposer_id: int
    ballot: int  # Round number

class ConsensusNode:
    def __init__(self, node_id: int, total_nodes: int):
        self.node_id = node_id
        self.total_nodes = total_nodes
        self.current_ballot = 0
        self.accepted_value: Optional[Proposal] = None
        self.decided = False
        self.decision: Optional[int] = None

    async def propose(self, value: int, quorum_size: int) -> bool:
        # Phase 1: Prepare/Promise phase
        self.current_ballot += 1
        ballot = self.current_ballot
        proposal = Proposal(value=value, proposer_id=self.node_id, ballot=ballot)

        # Simulate sending prepare to a quorum
        promises = await self._send_prepare(proposal, quorum_size)

        if len(promises) < quorum_size:
            return False  # Failed to get a quorum of promises

        # Determine latest accepted value from promises
        highest_ballot = ballot
        latest_value = value
        for p in promises:
            if p.accepted_proposal and p.accepted_proposal.ballot > highest_ballot:
                highest_ballot = p.accepted_proposal.ballot
                latest_value = p.accepted_proposal.value

        # Phase 2: Accept phase
        accepts = await self._send_accept(Proposal(latest_value, self.node_id, ballot), quorum_size)

        if len(accepts) >= quorum_size:
            # Decision reached
            await self._send_decide(latest_value)
            self.decision = latest_value
            self.decided = True
            return True
        return False

    async def _send_prepare(self, proposal: Proposal, quorum: int):
        await asyncio.sleep(0.01)
        return [type('PromiseResponse', (), {'accepted_proposal': self.accepted_value})()] * quorum

    async def _send_accept(self, proposal: Proposal, quorum: int):
        await asyncio.sleep(0.01)
        return ["Accepted"] * quorum

    async def _send_decide(self, value: int):
        pass

This implementation would block indefinitely under pure asynchrony if a single node delays its response, illustrating the FLP problem. Real-world variants mitigate this by introducing timeouts and leader election.

Data Tables: Fault Models and Consensus Properties

The following table compares fault models and their consensus requirements:

Fault ModelDescriptionMinimum Nodes for Tolerance fExample Algorithms
Crash Fault (Non-Byzantine)Node stops functioning (fail-stop)n > 2fPaxos, Raft, Zab
Byzantine FaultNode behaves arbitrarily (malicious)n > 3fPBFT, Tendermint, HotStuff
Crash-RecoveryNode crashes but may restart with stable storagen > 2fPaxos, Raft (with log)

The Consensus Trade-off Hierarchy

Consensus design is governed by a layered hierarchy of invariants and trade-offs. At the foundational level, the FLP theorem dictates that in asynchronous systems, safety and liveness are mutually exclusive for deterministic protocols. This forces a choice: either guarantee agreement at the cost of potential deadlock, or ensure progress with risk of inconsistency. The next layer is the fault model. Byzantine faults impose stricter node requirements (3f+1) than crash faults (2f+1), reflecting the higher cost of verifying truth in the presence of deception. Practical systems resolve these tensions by relaxing environmental assumptions—introducing partial synchrony via timeouts—thereby enabling both safety and eventual liveness. Thus, every consensus protocol embodies a precise configuration of assumptions: about timing, fault behavior, and network reliability. The designer’s task is not to eliminate trade-offs, but to make them explicit and operationally sound.