architecting resilient distributed systems high-scale engineering and failure mode mitigation

The Saga Pattern for Atomic Transactions

Chapter 12 of 13
Summary

The Saga pattern is a failure management approach that maintains data consistency across distributed services by breaking down long-running transactions into a sequence of local transactions with compensating actions.

Introduction to Saga

The Saga pattern is a failure management approach designed to maintain data consistency across distributed services by breaking down long-running transactions into a sequence of local transactions, each accompanied by a corresponding compensating action [1]. The pattern is useful in distributed systems, where atomicity and consistency across services are hard to guarantee because there is no centralized control and network calls can fail.

Orchestration vs. Choreography in Saga Implementation

Saga implementations fall into two main approaches: Orchestration-based and Choreography-based. In the Orchestration-based approach, a centralized Saga Execution Coordinator (SEC), also called a Process Manager, tells each participant which local transaction to execute. The Choreography-based approach is decentralized: participants exchange events without a central controller, and each event triggers a local transaction in another service.
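The difference between the two approaches can be sketched with a small in-memory example. This is only an illustration under stated assumptions: the `Bus` type, the event names (`OrderCreated`, `PaymentCharged`, `ShipmentScheduled`), and the step names are hypothetical, and a real system would use a message broker and remote calls rather than in-process function calls.

```go
package main

import "fmt"

// Bus is a hypothetical in-memory event bus used only for illustration.
type Bus struct {
	handlers map[string][]func()
	Log      []string // every event published, in order
}

func NewBus() *Bus {
	return &Bus{handlers: make(map[string][]func())}
}

// Subscribe registers a handler for an event name.
func (b *Bus) Subscribe(event string, h func()) {
	b.handlers[event] = append(b.handlers[event], h)
}

// Publish records the event and synchronously invokes its subscribers.
func (b *Bus) Publish(event string) {
	b.Log = append(b.Log, event)
	for _, h := range b.handlers[event] {
		h()
	}
}

// runChoreographed wires services to each other's events.
// No single component knows the whole workflow.
func runChoreographed() []string {
	bus := NewBus()
	// The payment service reacts to OrderCreated.
	bus.Subscribe("OrderCreated", func() { bus.Publish("PaymentCharged") })
	// The shipping service reacts to PaymentCharged.
	bus.Subscribe("PaymentCharged", func() { bus.Publish("ShipmentScheduled") })
	bus.Publish("OrderCreated")
	return bus.Log
}

// runOrchestrated uses one coordinator that knows every step
// and issues them in order.
func runOrchestrated() []string {
	var log []string
	steps := []string{"CreateOrder", "ChargePayment", "ScheduleShipment"}
	for _, s := range steps {
		log = append(log, s) // stand-in for sending a command to a service
	}
	return log
}

func main() {
	fmt.Println(runChoreographed())
	fmt.Println(runOrchestrated())
}
```

In the choreographed version the workflow is implicit in the subscriptions, while in the orchestrated version it is written down in one place as the `steps` slice.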

Comparison of Orchestration and Choreography

| Aspect | Orchestration | Choreography |
| --- | --- | --- |
| Control | Centralized (Orchestrator) | Decentralized (Subscribers) |
| Coupling | Orchestrator knows all services | Services know events |
| Complexity | Low for many participants | High for many participants |
| Failure Point | Single point (Orchestrator) | Distributed failure risk |

Implementing Compensating Transactions

Compensating transactions undo the effects of previously successful local transactions when a subsequent step in a Saga fails. They should be idempotent and safe to retry, so that they eventually succeed even if the service they call fails transiently or the same compensation request is delivered more than once. For instance, in a trip-booking saga, if a hotel reservation fails after a flight is booked, cancelling the flight is the required compensating action.

State Machines in Saga Process Managers

State machines in saga process managers play a critical role in handling saga workflows. They must be deterministic to allow for replaying history during recovery and should handle both ‘Forward Recovery’ (retrying a step) and ‘Backward Recovery’ (compensating). An example of basic state machine logic for a Saga Process Manager handling order transitions is as follows:

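// Handle advances the order saga in response to a single event.
// Events that are not expected in the current state are ignored.
func (sm *OrderStateMachine) Handle(event Event) {
	switch sm.State {
	case StateCreated:
		// Forward step: the order is valid, so request payment.
		if event.Type == OrderValidated {
			sm.TransitionTo(StatePendingPayment)
			sm.DispatchCommand(ChargeCreditCard)
		}
	case StatePendingPayment:
		if event.Type == PaymentFailed {
			// Backward recovery: undo the earlier inventory step.
			sm.TransitionTo(StateCompensating)
			sm.DispatchCommand(CancelOrderInInventory)
		} else if event.Type == PaymentSucceeded {
			sm.TransitionTo(StateCompleted)
		}
	}
}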
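
The handler above relies on types and methods that the chapter does not show. The sketch below fills them in with hypothetical definitions (`State`, `EventType`, `Command`, the `Replaying` flag, and the `Replay` function are all assumptions made for illustration) to show why determinism matters for recovery: feeding the same event history through the same handler rebuilds the same state. Suppressing command dispatch during replay is one possible design choice, not something the source prescribes.

```go
package main

import "fmt"

type State string
type EventType string
type Command string

const (
	StateCreated        State = "Created"
	StatePendingPayment State = "PendingPayment"
	StateCompensating   State = "Compensating"
	StateCompleted      State = "Completed"

	OrderValidated   EventType = "OrderValidated"
	PaymentFailed    EventType = "PaymentFailed"
	PaymentSucceeded EventType = "PaymentSucceeded"

	ChargeCreditCard       Command = "ChargeCreditCard"
	CancelOrderInInventory Command = "CancelOrderInInventory"
)

type Event struct{ Type EventType }

// OrderStateMachine is a hypothetical minimal process manager.
type OrderStateMachine struct {
	State     State
	Replaying bool      // when true, commands are not re-sent
	Sent      []Command // commands dispatched in live mode
}

func (sm *OrderStateMachine) TransitionTo(s State) { sm.State = s }

// DispatchCommand records the command unless the machine is replaying.
func (sm *OrderStateMachine) DispatchCommand(c Command) {
	if !sm.Replaying {
		sm.Sent = append(sm.Sent, c)
	}
}

// Handle has the same transition logic as the handler shown in the text.
func (sm *OrderStateMachine) Handle(event Event) {
	switch sm.State {
	case StateCreated:
		if event.Type == OrderValidated {
			sm.TransitionTo(StatePendingPayment)
			sm.DispatchCommand(ChargeCreditCard)
		}
	case StatePendingPayment:
		if event.Type == PaymentFailed {
			sm.TransitionTo(StateCompensating)
			sm.DispatchCommand(CancelOrderInInventory)
		} else if event.Type == PaymentSucceeded {
			sm.TransitionTo(StateCompleted)
		}
	}
}

// Replay rebuilds the state from a persisted event history
// without re-sending any commands.
func Replay(history []Event) *OrderStateMachine {
	sm := &OrderStateMachine{State: StateCreated, Replaying: true}
	for _, e := range history {
		sm.Handle(e)
	}
	sm.Replaying = false
	return sm
}

func main() {
	history := []Event{{OrderValidated}, {PaymentFailed}}
	sm := Replay(history)
	fmt.Println(sm.State, len(sm.Sent))
}
```

After replay the machine is back in the state it had before the crash and can continue handling live events, dispatching commands again from that point on.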

Conclusion

The Saga pattern offers a robust approach to managing distributed transactions, ensuring data consistency and availability in the face of failures. By understanding the differences between Orchestration-based and Choreography-based Saga implementations and carefully designing compensating transactions and state machines, developers can build resilient distributed systems.

Sources

[1] Garcia-Molina, H., & Salem, K. (1987). Sagas. ACM SIGMOD Record, 16(3), 249-259.
[2] https://www.theserverside.com/tutorial/How-the-saga-design-pattern-in-microservices-works
[3] https://www.baeldung.com/cs/saga-pattern-microservices
[4] https://learn.microsoft.com/en-us/azure/architecture/patterns/saga