Data Replication and Consistency Across Regions
The Symptom
The EU-West region is live. DNS geo-routing sends European riders to eu-west. Latency dropped from 520ms to 85ms. The product team is celebrating. Then a rider in Berlin updates their payment method. They open the app in London two hours later. The old payment method is showing. They update it again. Fly to New York. Old payment method.
The support team files three tickets in one week: “User profile changes not persisting.” The changes are persisting. They are persisting in the region where the user made them. The other regions have not received the update yet because the replication subscription was paused by a schema mismatch nobody noticed for 9 hours.
Multi-region is live. Multi-region consistency is not.
The Cause
PostgreSQL logical replication is asynchronous by default. Changes on the publisher (US-East) are streamed to subscribers (EU-West) with a delay that ranges from sub-second under normal conditions to minutes or hours when things go wrong. “Things going wrong” includes: schema changes applied to the publisher but not the subscriber, network interruptions, the subscriber falling behind on a large batch insert, or the subscriber running out of worker slots because max_worker_processes (or max_logical_replication_workers) is set too low.
The ride-hailing platform has six types of data, each with its own consistency requirement:
Data Type       | Scope    | Consistency Need | Replication Strategy
Trip data       | Regional | Strong in-region | No replication (stays local)
Driver location | Regional | Eventual, < 1s   | No replication (regional Redis)
User profiles   | Global   | Eventual, < 5s   | PG logical replication
Fare config     | Global   | Eventual, < 60s  | PG logical replication
Surge zones     | Global   | Eventual, < 60s  | PG logical replication
Payment methods | Global   | Eventual, < 5s   | PG logical replication
Trip data is regional. A trip from Berlin Mitte to Kreuzberg exists only in EU-West. A trip from Manhattan to JFK exists only in US-East. Replicating trip data globally wastes bandwidth and creates unnecessary conflict potential.
Driver location is regional and ephemeral. A driver in Berlin is irrelevant to a rider in New York. Each region’s Redis holds only its drivers.
User profiles, payment methods, fare configuration, and surge zone definitions are global. A rider who creates an account in Berlin and travels to New York needs their profile and payment method in both regions.
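The classification above can also be written down in code, so that every service reads the same scope and staleness budget for a given data type. The following is a minimal sketch in plain Java. The enum name `DataCategory`, its fields, and the `withinSlo` helper are illustrative assumptions and not part of the platform's codebase. Trip data carries no staleness budget because it is never read across regions.

```java
import java.time.Duration;

/** Hypothetical catalogue of the six data types and their consistency budgets. */
enum DataCategory {
    TRIP_DATA(false, null),                        // regional, strong in-region, stays local
    DRIVER_LOCATION(false, Duration.ofSeconds(1)), // regional Redis, never replicated
    USER_PROFILES(true, Duration.ofSeconds(5)),
    FARE_CONFIG(true, Duration.ofSeconds(60)),
    SURGE_ZONES(true, Duration.ofSeconds(60)),
    PAYMENT_METHODS(true, Duration.ofSeconds(5));

    private final boolean global;
    private final Duration maxStaleness; // null means "no staleness allowed"

    DataCategory(boolean global, Duration maxStaleness) {
        this.global = global;
        this.maxStaleness = maxStaleness;
    }

    /** Global data is replicated across regions; regional data is not. */
    boolean isReplicated() {
        return global;
    }

    /** True when the observed lag is strictly inside the budget for this type. */
    boolean withinSlo(Duration observedLag) {
        if (maxStaleness == null) {
            return observedLag.isZero();
        }
        return observedLag.compareTo(maxStaleness) < 0;
    }
}
```

A monitoring job could call `DataCategory.USER_PROFILES.withinSlo(observedLag)` for each lag sample instead of hard-coding the 5-second figure in several places.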
The Baseline
Initial replication setup: the publication exists, the subscription exists, but monitoring does not.
-- BOTTLENECK: Replication with no monitoring
-- On US-East (publisher)
CREATE PUBLICATION ride_platform_pub
FOR TABLE user_profiles, fare_config, surge_zones,
payment_methods
WITH (publish = 'insert, update, delete');
-- On EU-West (subscriber)
CREATE SUBSCRIPTION ride_platform_sub
CONNECTION 'host=pg-us-east.internal port=5432
dbname=rides user=replicator'
PUBLICATION ride_platform_pub;
-- No lag monitoring
-- No alerting
-- No conflict resolution strategy
-- Schema changes break the subscription silently
When the subscription breaks, the subscriber stops receiving updates. No error in the application logs. No alert. The data drifts until a user reports it.
The Fix
Replication Lag Monitoring
-- SCALED: Monitor replication lag on the publisher
-- This query runs on US-East and reports lag per subscriber
SELECT
client_addr,
application_name,
state,
sent_lsn,
write_lsn,
flush_lsn,
replay_lsn,
pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes,
write_lag,
flush_lag,
replay_lag
FROM pg_stat_replication;
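To make `replay_lag_bytes` less opaque: an LSN is printed as two hexadecimal numbers separated by a slash, the high and low 32 bits of a 64-bit position in the WAL, and `pg_wal_lsn_diff` is the difference between two such positions in bytes. The sketch below reproduces that arithmetic in plain Java. The class name `Lsn` and its methods are hypothetical helpers for illustration; in production the database function should be used.

```java
/** Illustrative re-implementation of the arithmetic behind pg_wal_lsn_diff. */
final class Lsn {
    private Lsn() {}

    /** Parses an LSN in text form ("hi/lo", both hexadecimal) into a 64-bit position. */
    static long parse(String lsn) {
        String[] parts = lsn.trim().split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Not an LSN: " + lsn);
        }
        long hi = Long.parseLong(parts[0], 16);
        long lo = Long.parseLong(parts[1], 16);
        return (hi << 32) | lo;
    }

    /** Bytes of WAL between two positions: positive when a is ahead of b. */
    static long diffBytes(String a, String b) {
        return parse(a) - parse(b);
    }
}
```

For example, `Lsn.diffBytes("1/0", "0/FFFFFFFF")` is 1, because the low half wraps into the high half after `FFFFFFFF`.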
Prometheus Exporter for Replication Lag
# SCALED: postgres_exporter query for replication lag
pg_replication:
  query: |
    SELECT
      application_name,
      EXTRACT(EPOCH FROM replay_lag) AS replay_lag_seconds,
      pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
    FROM pg_stat_replication
  metrics:
    - application_name:
        usage: "LABEL"
    - replay_lag_seconds:
        usage: "GAUGE"
        description: "Replication lag in seconds"
    - lag_bytes:
        usage: "GAUGE"
        description: "Replication lag in bytes"
Alert at 5 Seconds
# SCALED: Prometheus alert for replication lag
groups:
  - name: replication
    rules:
      - alert: ReplicationLagHigh
        expr: >
          pg_replication_replay_lag_seconds{
          application_name="ride_platform_sub"
          } > 5
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: >
            Replication lag to
            {{ $labels.application_name }}
            is {{ $value }}s
          runbook: >
            Check subscriber status with
            SELECT * FROM pg_stat_subscription;
      # Caveat: a subscriber that has disconnected has no row in
      # pg_stat_replication, so this series goes absent instead of
      # rising. Consider pairing these rules with an absent() check
      # or a slot-based lag metric from pg_replication_slots.
      - alert: ReplicationLagCritical
        expr: >
          pg_replication_replay_lag_seconds{
          application_name="ride_platform_sub"
          } > 30
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: >
            Replication lag to
            {{ $labels.application_name }}
            exceeds 30s. Possible subscription failure.
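The `for:` clause is what keeps a single slow sample from paging anyone: the condition has to hold continuously for the whole period before the alert fires, and one healthy sample resets the clock. The sketch below approximates that behaviour in plain Java so the thresholds can be reasoned about or unit-tested. The class `LagAlert` and its method names are hypothetical, and Prometheus's own evaluation loop differs in detail (it works on evaluation intervals, not on arbitrary sample times).

```java
import java.time.Duration;
import java.time.Instant;

/** Sketch of a threshold alert with a Prometheus-style "for" hold period. */
final class LagAlert {
    private final double thresholdSeconds;
    private final Duration holdFor;
    private Instant breachedSince; // null while the condition is false

    LagAlert(double thresholdSeconds, Duration holdFor) {
        this.thresholdSeconds = thresholdSeconds;
        this.holdFor = holdFor;
    }

    /**
     * Feeds one lag sample. Returns true once the lag has been above the
     * threshold continuously for at least the hold period.
     */
    boolean observe(Instant at, double lagSeconds) {
        if (lagSeconds > thresholdSeconds) {
            if (breachedSince == null) {
                breachedSince = at; // condition just became true: pending
            }
            Duration breachedFor = Duration.between(breachedSince, at);
            return breachedFor.compareTo(holdFor) >= 0;
        }
        breachedSince = null; // a healthy sample resets the hold period
        return false;
    }
}
```

With `new LagAlert(5.0, Duration.ofMinutes(1))`, a lag of 6 seconds is only pending at first and fires once it has persisted for a full minute; a single 4.2-second sample in between resets it.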
Data Partitioning: Regional vs Global
// SCALED: Repository layer that routes writes correctly
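import static org.springframework.data.relational.core.query.Criteria.where;
import static org.springframework.data.relational.core.query.Query.query;

import org.springframework.data.r2dbc.core.R2dbcEntityTemplate;
import org.springframework.stereotype.Repository;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

@Repository
public class TripRepository {
    private final R2dbcEntityTemplate localDb;
    private final RegionRouter regionRouter;

    public TripRepository(R2dbcEntityTemplate localDb,
                          RegionRouter regionRouter) {
        this.localDb = localDb;
        this.regionRouter = regionRouter;
    }

    // Trips write to the LOCAL region's database only.
    // A trip in Berlin writes to EU-West PG.
    // A trip in NYC writes to US-East PG.
    // No cross-region replication for trip data.
    public Mono<Trip> save(Trip trip) {
        trip.setRegion(regionRouter.getCurrentRegion());
        return localDb.insert(trip);
    }

    // Trip queries are region-scoped.
    // A rider in Berlin sees only their EU trips.
    // If they need US trip history, a cross-region
    // API call fetches it on demand (rare).
    public Flux<Trip> findByRiderId(String riderId) {
        return localDb.select(Trip.class)
            .matching(query(where("riderId").is(riderId)
                .and("region").is(
                    regionRouter.getCurrentRegion())))
            .all();
    }
}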
// SCALED: Global data writes to US-East primary,
// replicates to EU-West via PG logical replication
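import java.time.Instant;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.data.r2dbc.core.R2dbcEntityTemplate;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Repository;
import reactor.core.publisher.Mono;

@Repository
public class UserProfileRepository {
    private final R2dbcEntityTemplate primaryDb;
    private final R2dbcEntityTemplate localDb;
    private final RegionRouter regionRouter;
    private final KafkaTemplate<String, ProfileUpdateEvent>
        kafkaTemplate;

    // The qualifier names are assumed bean names for the two
    // R2dbcEntityTemplate instances.
    public UserProfileRepository(
            @Qualifier("primaryDb") R2dbcEntityTemplate primaryDb,
            @Qualifier("localDb") R2dbcEntityTemplate localDb,
            RegionRouter regionRouter,
            KafkaTemplate<String, ProfileUpdateEvent> kafkaTemplate) {
        this.primaryDb = primaryDb;
        this.localDb = localDb;
        this.regionRouter = regionRouter;
        this.kafkaTemplate = kafkaTemplate;
    }

    public Mono<UserProfile> save(UserProfile profile) {
        if (regionRouter.isPrimary()) {
            // US-East: write directly, replication
            // handles EU-West
            return primaryDb.update(profile);
        } else {
            // EU-West: write to local DB
            // (which accepts writes for profiles)
            // AND publish event for cache invalidation.
            // The publication above is one-way (US-East to
            // EU-West); carrying this write back to US-East
            // needs a separate path, not shown here.
            return localDb.update(profile)
                .doOnSuccess(p -> kafkaTemplate.send(
                    "profile-updates",
                    p.getUserId(),
                    new ProfileUpdateEvent(
                        p.getUserId(),
                        regionRouter.getCurrentRegion(),
                        Instant.now())));
        }
    }
}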
Conflict Resolution: Last-Write-Wins
-- SCALED: Conflict resolution for payment methods
-- PG logical replication does not handle conflicts
-- automatically. The application must.
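-- Add a last_modified column for conflict resolution.
-- Apply the change on the subscriber (EU-West) first, then on
-- the publisher, so the subscription never receives a column
-- it does not have.
ALTER TABLE payment_methods
  ADD COLUMN last_modified TIMESTAMPTZ
  DEFAULT now();
-- DEFAULT now() is applied on INSERT only. Every UPDATE must
-- set last_modified explicitly (or via a trigger).
-- Application-level conflict resolution:
-- When a profile update arrives via replication AND
-- a local update exists, the later timestamp wins.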
// SCALED: Last-write-wins conflict resolution
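import org.springframework.data.redis.core.ReactiveRedisTemplate;
import org.springframework.kafka.annotation.KafkaHandler;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;
import reactor.core.publisher.Mono;

@Component
@KafkaListener(
    topics = "profile-updates",
    groupId = "${app.region}-profile-sync")
public class ProfileConflictResolver {
    private final UserProfileRepository profileRepo;
    private final ReactiveRedisTemplate<String, String> redis;

    public ProfileConflictResolver(
            UserProfileRepository profileRepo,
            ReactiveRedisTemplate<String, String> redis) {
        this.profileRepo = profileRepo;
        this.redis = redis;
    }

    @KafkaHandler
    public void onProfileUpdate(ProfileUpdateEvent event) {
        // Check if local version is newer.
        // findById is assumed to exist on UserProfileRepository
        // (it is not part of the excerpt above).
        profileRepo.findById(event.getUserId())
            .flatMap(local -> {
                if (local.getLastModified()
                        .isAfter(event.getTimestamp())) {
                    // Local is newer, ignore remote update
                    return Mono.empty();
                }
                // Remote is newer (or equal), invalidate cache.
                // Next read fetches from replicated PG
                return redis.delete(
                    "profile:" + event.getUserId());
            })
            .subscribe();
    }
}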
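The comparison at the heart of last-write-wins is easier to test when it is pulled out into a pure function. The sketch below does that in plain Java. The types `VersionedWrite` and `LastWriteWins` are hypothetical, and the tie-breaker on region name is an assumption added here so that both regions pick the same winner when timestamps are equal; the resolver above simply lets the remote write win a tie. Timestamps come from wall clocks in different regions, so clock skew can still make the "later" write lose.

```java
import java.time.Instant;

/** A write tagged with the region that produced it and its modification time. */
final class VersionedWrite {
    final String region;
    final Instant lastModified;
    final String value;

    VersionedWrite(String region, Instant lastModified, String value) {
        this.region = region;
        this.lastModified = lastModified;
        this.value = value;
    }
}

/** Last-write-wins with a deterministic tie-breaker (assumed, not from the platform). */
final class LastWriteWins {
    private LastWriteWins() {}

    static VersionedWrite resolve(VersionedWrite local, VersionedWrite remote) {
        int byTime = local.lastModified.compareTo(remote.lastModified);
        if (byTime != 0) {
            return byTime > 0 ? local : remote; // the later timestamp wins
        }
        // Equal timestamps: compare region names so that every region,
        // whichever side it sees as "local", converges on the same write.
        return local.region.compareTo(remote.region) >= 0 ? local : remote;
    }
}
```

Because the function is symmetric, `resolve(eu, us)` and `resolve(us, eu)` return the same write, which is the property that keeps the two regions from diverging.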
Redis: Regional Instances, Never Replicated
// SCALED: Driver location is purely regional
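import org.springframework.data.geo.Circle;
import org.springframework.data.geo.Distance;
import org.springframework.data.geo.Metrics;
import org.springframework.data.geo.Point;
import org.springframework.data.redis.core.ReactiveRedisTemplate;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

@Service
public class DriverLocationService {
    private final ReactiveRedisTemplate<String, String>
        regionalRedis;

    public DriverLocationService(
            ReactiveRedisTemplate<String, String> regionalRedis) {
        this.regionalRedis = regionalRedis;
    }

    // Driver locations are stored in the regional Redis
    // only. A driver in Berlin exists in EU-West Redis.
    // US-East Redis does not know about them.
    public Mono<Boolean> updateLocation(
            String driverId, double lat, double lng) {
        // GEOADD returns the number of NEW members, which is 0
        // when an existing driver's position is updated, so
        // success is signalled by completion, not by the count.
        return regionalRedis.opsForGeo()
            .add("drivers:active",
                new Point(lng, lat), driverId)
            .thenReturn(true);
    }

    // Nearby driver search is always regional.
    // "Find drivers near me" only finds drivers
    // in the same region.
    public Flux<String> findNearby(
            double lat, double lng, double radiusKm) {
        return regionalRedis.opsForGeo()
            .radius("drivers:active",
                new Circle(new Point(lng, lat),
                    new Distance(radiusKm,
                        Metrics.KILOMETERS)))
            .map(result -> result.getContent().getName());
    }
}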
CockroachDB and Spanner offer globally consistent reads and writes. They pay for this with higher write latency (consensus across regions) and significantly higher cost. For the ride-hailing platform, eventual consistency with explicit per-data-type SLOs is the right tradeoff. A rider seeing a 3-second-old profile is acceptable. A rider seeing a 3-second-old driver location is also acceptable (the driver moved 30 meters). A rider not seeing their active trip because it has not yet arrived from another region is not acceptable, which is why trip data is never replicated: it is written and read in the region where the trip takes place.
The Proof
Replication lag monitoring after deploying the full setup:
Steady State (72-hour observation):
  Replication lag (avg): 0.3 seconds
  Replication lag (p99): 1.8 seconds
  Replication lag (max): 4.2 seconds
  Subscription interruptions: 0
  Alert fires: 0
Profile update propagation test:
  Update in US-East, read in EU-West:
    p50: 0.8 seconds
    p99: 2.1 seconds
  Update in EU-West, cache invalidation:
    p50: 1.2 seconds (Kafka cross-region)
    p99: 3.4 seconds
Consistency SLO compliance:
  Driver location (< 1s): 99.97% met
  User profiles (< 5s): 99.94% met
  Fare config (< 60s): 100% met
  Surge zones (< 60s): 100% met
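Figures like "p99" and "% met" are simple to compute from raw lag samples. The sketch below shows one common way to do it in plain Java: a nearest-rank percentile and the share of samples strictly below the SLO. The class `LagStats` and its methods are hypothetical; how the numbers above were actually produced is not stated, and a Prometheus histogram would estimate percentiles differently.

```java
import java.util.Arrays;

/** Illustrative helpers for summarising replication lag samples (in seconds). */
final class LagStats {
    private LagStats() {}

    /** Percentage of samples strictly below the SLO threshold. */
    static double compliancePercent(double[] lagSeconds, double sloSeconds) {
        if (lagSeconds.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long met = Arrays.stream(lagSeconds)
            .filter(lag -> lag < sloSeconds)
            .count();
        return 100.0 * met / lagSeconds.length;
    }

    /** Nearest-rank percentile for p in (0, 100]. */
    static double percentile(double[] lagSeconds, double p) {
        if (lagSeconds.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] sorted = lagSeconds.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

For the made-up samples `{0.3, 0.8, 1.8, 4.2, 6.0}`, four of five are below a 5-second SLO, so compliance is 80%, and the nearest-rank median is 1.8 seconds.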
Every data type meets its consistency SLO. The monitoring catches drift before users do. The conflict resolution handles the rare case of simultaneous profile updates across regions. The separation of regional data (trips, driver location) from global data (profiles, config) keeps replication bandwidth manageable and conflict potential low.