adaptive distributed systems intent-based dynamic consistency in java 21

State and Freshness Metadata

Chapter 17 of 25
Summary

The PostgreSQL materialization layer ensures data consistency in distributed systems using state materialization, freshness metadata, and Last-Write-Wins semantics.

Introduction

In distributed systems, keeping data consistent across nodes or replicas is essential to its accuracy and reliability. One approach is a materialization layer, which computes and stores a ‘view’ of an entity’s current state from an ordered stream of intents or events. This section covers the design of the materialization layer in PostgreSQL, focusing on how state and freshness metadata support consistency decisions.

State Materialization

State materialization is the process of computing and storing a ‘view’ of an entity’s current state from an ordered stream of intents or events. It matters most in distributed systems, where data is spread across multiple nodes or replicas. Materialization alone does not guarantee agreement. When every replica applies the same events under the same deterministic ordering and conflict-resolution rule, all replicas reflect the same final value once all updates have been processed. This property is called state convergence.
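As a rough illustration of this idea, the following Java sketch folds a list of intents into a state map. The `FieldIntent` record, its `sequence` field, and the `StateMaterializer` class are hypothetical names introduced here, and the sketch assumes each intent carries a unique sequence number.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical intent: sets one field of an entity to a value.
// Assumption: sequence numbers are unique per entity.
record FieldIntent(long sequence, String field, String value) {}

final class StateMaterializer {
    private StateMaterializer() {}

    // Sorts the intents by sequence, then applies them one by one,
    // so a later intent overwrites an earlier one for the same field.
    static Map<String, String> materialize(List<FieldIntent> intents) {
        Map<String, String> state = new TreeMap<>();
        intents.stream()
               .sorted(Comparator.comparingLong(FieldIntent::sequence))
               .forEach(intent -> state.put(intent.field(), intent.value()));
        return state;
    }
}
```

Because the intents are sorted before they are applied, two replicas that receive the same intents in different arrival orders would compute the same map.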

Freshness Metadata

Freshness metadata refers to supplemental data points (timestamps, offsets, versions) that describe how current a materialized record is compared to the source of truth. Consistency decisions rely on this metadata to judge whether a record is fresh enough to serve. For instance, it can be used to detect replication lag, which is the delay between an update on the primary and its availability on a replica.
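One way such metadata could feed a consistency decision is sketched below. The `Freshness` record, its fields, and the two staleness bounds are illustrative assumptions, not part of the schema described in this chapter.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical freshness metadata stored next to a materialized record.
record Freshness(long appliedOffset, Instant sourceTimestamp) {

    // Number of source events not yet reflected in the record.
    long offsetLag(long latestSourceOffset) {
        return Math.max(0, latestSourceOffset - appliedOffset);
    }

    // Age of the record relative to the supplied "now".
    Duration age(Instant now) {
        return Duration.between(sourceTimestamp, now);
    }

    // True when the record is within both caller-supplied staleness bounds.
    boolean isFreshEnough(long latestSourceOffset, Instant now,
                          long maxOffsetLag, Duration maxAge) {
        return offsetLag(latestSourceOffset) <= maxOffsetLag
            && age(now).compareTo(maxAge) <= 0;
    }
}
```

A reader that tolerates some staleness could serve the replica's record when `isFreshEnough` returns true and fall back to the primary otherwise.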

Last-Write-Wins (LWW) Conflict Resolution

In distributed systems, conflicts arise when multiple updates are made to the same entity concurrently. One way to resolve them is Last-Write-Wins (LWW) semantics, where the most recent update prevails. Each update carries a unique, monotonically increasing token (often a timestamp or sequence number), and the update with the highest token value wins. If two updates can carry the same token, a deterministic tiebreaker is needed so that every replica picks the same winner.
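A minimal sketch of an LWW merge in Java follows. The `LwwRegister` record and its `nodeId` tiebreaker are assumptions made for illustration, and the sketch assumes no two writes share the same token and node ID.

```java
// Hypothetical LWW register: a value tagged with a token and a tiebreaker.
record LwwRegister<T>(T value, long token, String nodeId) {

    // Returns whichever register carries the higher (token, nodeId) pair.
    LwwRegister<T> merge(LwwRegister<T> other) {
        int byToken = Long.compare(this.token, other.token);
        if (byToken != 0) {
            return byToken > 0 ? this : other;
        }
        // Equal tokens: fall back to the node ID so both sides agree.
        return this.nodeId.compareTo(other.nodeId) >= 0 ? this : other;
    }
}
```

The merge gives the same result regardless of which side calls it, so replicas that exchange updates in different orders still settle on one value.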

Vector Clocks / Version Tokens

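Vector clocks and version tokens are values used to order events in a distributed system. A vector clock keeps one counter per node, which makes it possible to tell whether one event causally precedes another or whether the two are concurrent. A single version token gives a simpler ordering along one sequence. Either can be used to prevent stale updates from overwriting newer data, which helps the system reach state convergence.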
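The comparison rule for vector clocks can be sketched as below. The `ClockOrder` enum and `VectorClocks` class are hypothetical names, and the sketch assumes a clock is represented as a map from node ID to counter, with missing entries treated as zero.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Possible relationships between two vector clocks.
enum ClockOrder { BEFORE, AFTER, EQUAL, CONCURRENT }

final class VectorClocks {
    private VectorClocks() {}

    // Compares clock a against clock b, counter by counter.
    static ClockOrder compare(Map<String, Long> a, Map<String, Long> b) {
        Set<String> nodes = new HashSet<>(a.keySet());
        nodes.addAll(b.keySet());
        boolean aAhead = false;
        boolean bAhead = false;
        for (String node : nodes) {
            long x = a.getOrDefault(node, 0L);
            long y = b.getOrDefault(node, 0L);
            if (x > y) aAhead = true;
            if (x < y) bAhead = true;
        }
        // Each side is ahead on some node: neither event saw the other.
        if (aAhead && bAhead) return ClockOrder.CONCURRENT;
        if (aAhead) return ClockOrder.AFTER;
        if (bAhead) return ClockOrder.BEFORE;
        return ClockOrder.EQUAL;
    }
}
```

An update whose clock compares as `BEFORE` the stored clock is stale and can be dropped, while a `CONCURRENT` result signals a genuine conflict that needs a resolution rule such as LWW.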

PostgreSQL Materialization Layer

The PostgreSQL materialization layer stores the materialized state of an entity alongside its freshness metadata. The schema includes the fields version, lww_token, last_kafka_offset, and updated_at. The lww_token and last_kafka_offset fields order concurrent and out-of-order updates, version counts the updates applied to a row, and updated_at records when the row was last written.

Atomic LWW Update Pattern

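The atomic LWW update pattern applies an update to the materialized state only when the incoming write is newer than the stored one. The check and the write happen in a single UPDATE statement. The WHERE clause compares the incoming lww_token with the stored value and falls back to last_kafka_offset when the tokens are equal. Only if that comparison succeeds does the statement increment version and record the new lww_token and last_kafka_offset. A stale or duplicate update matches no rows and leaves the record unchanged.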

Code Example

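-- Materialized state plus the freshness metadata used for consistency decisions.
CREATE TABLE entity_materialized_view (
    entity_id UUID PRIMARY KEY,
    state_data JSONB NOT NULL,
    version BIGINT NOT NULL,                      -- incremented on every applied update
    lww_token TIMESTAMP WITH TIME ZONE NOT NULL,  -- ordering token for Last-Write-Wins
    last_kafka_offset BIGINT,                     -- offset of the last applied event
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Atomic LWW Update Pattern
-- The row changes only if the incoming token is newer than the stored one.
-- Equal tokens fall back to the Kafka offset. COALESCE lets a row whose
-- offset is still NULL accept the update (Kafka offsets start at 0).
UPDATE entity_materialized_view
SET state_data = :newData,
    version = version + 1,
    lww_token = :newTokens,
    last_kafka_offset = :newOffset,
    updated_at = CURRENT_TIMESTAMP
WHERE entity_id = :id
  AND (lww_token < :newTokens
       OR (lww_token = :newTokens AND COALESCE(last_kafka_offset, -1) < :newOffset));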
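
The guard in the WHERE clause above can be mirrored in plain Java to make its behavior easier to follow. The sketch below is an in-memory model, not a database client. `MaterializedRow` and `InMemoryLwwStore` are hypothetical names, and unlike the UPDATE statement the sketch also creates the row on the first write.

```java
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory counterpart of one entity_materialized_view row.
record MaterializedRow(String stateData, long version,
                       Instant lwwToken, long lastKafkaOffset) {}

final class InMemoryLwwStore {
    private final ConcurrentMap<String, MaterializedRow> rows = new ConcurrentHashMap<>();

    // Applies the update only if it passes the LWW guard.
    // Returns true when the row was written, false when the update was stale.
    boolean apply(String entityId, String newData, Instant newToken, long newOffset) {
        boolean[] applied = {false};
        // compute() runs the check and the write atomically for this key.
        rows.compute(entityId, (id, current) -> {
            if (current == null) {
                // Assumption: the first write creates the row with version 1.
                applied[0] = true;
                return new MaterializedRow(newData, 1, newToken, newOffset);
            }
            boolean newer = current.lwwToken().isBefore(newToken)
                || (current.lwwToken().equals(newToken)
                    && current.lastKafkaOffset() < newOffset);
            if (!newer) {
                return current; // stale or duplicate: keep the stored row
            }
            applied[0] = true;
            return new MaterializedRow(newData, current.version() + 1, newToken, newOffset);
        });
        return applied[0];
    }

    MaterializedRow get(String entityId) {
        return rows.get(entityId);
    }
}
```

Calling `apply` with an older token, or with an equal token and a lower or equal offset, returns false and leaves the stored row untouched, which corresponds to the UPDATE matching zero rows.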

Conclusion

The design of the materialization layer in PostgreSQL plays a central role in achieving data consistency in distributed systems. State and freshness metadata, combined with a conflict resolution strategy such as Last-Write-Wins, let replicas converge on the same state and reflect the most recent updates. The code example above implements the atomic LWW update pattern with a timestamp token and a Kafka offset as tiebreaker. Vector clocks or version tokens serve the same goal of rejecting stale updates, but the example does not implement them.
