Maintaining Consistency in Denormalized Schemas
Maintaining Consistency in Denormalized Schemas
The Symptom
The sensor metadata for sensor-00042 is updated: the sensor moved from Building HQ, Floor 3 to Building B2, Floor 7. The update to the sensors collection takes 2ms. But 200 million reading documents still embed the old metadata. The dashboard shows sensor-00042 in Building HQ, Floor 3 because the readings collection is not yet updated.
The Cause
Denormalization pushes the consistency burden from read time to write time. When the source of truth changes, all copies must be updated. Without a propagation mechanism, copies become stale.
The Benchmark
Measure the cost of different consistency propagation strategies:
@BenchmarkMode(Mode.SingleShotTime)
@OutputTimeUnit(TimeUnit.SECONDS)
@Fork(1)
@State(Scope.Benchmark)
public class ConsistencyPropagationBenchmark {
private MongoCollection<Document> readings;
private int documentsToUpdate;
@Param({"10000", "100000", "1000000"})
private int matchingDocuments;
@Setup
public void setup() {
MongoClient client = MongoClients.create("mongodb://localhost:27017");
readings = client.getDatabase("telemetry").getCollection("readings");
documentsToUpdate = matchingDocuments;
}
@Benchmark
public long singleUpdateMany() {
UpdateResult result = readings.updateMany(
Filters.eq("sensorId", "sensor-00042"),
Updates.combine(
Updates.set("sensor.building", "B2"),
Updates.set("sensor.floor", 7)
)
);
return result.getModifiedCount();
}
@Benchmark
public long batchedBulkWrite() {
long modified = 0;
int batchSize = 10000;
while (true) {
List<Document> batch = readings.find(
Filters.and(
Filters.eq("sensorId", "sensor-00042"),
Filters.ne("sensor.building", "B2")
)
).limit(batchSize)
.projection(Projections.include("_id"))
.into(new ArrayList<>());
if (batch.isEmpty()) break;
List<WriteModel<Document>> writes = batch.stream()
.map(doc -> new UpdateOneModel<Document>(
Filters.eq("_id", doc.get("_id")),
Updates.combine(
Updates.set("sensor.building", "B2"),
Updates.set("sensor.floor", 7),
Updates.set("sensor.metaVersion", 2)
)
))
.toList();
BulkWriteResult result = readings.bulkWrite(writes,
new BulkWriteOptions().ordered(false));
modified += result.getModifiedCount();
}
return modified;
}
}
Results:
Benchmark (matchingDocuments) Mode Cnt Score Error Units
ConsistencyPropagationBenchmark.singleUpdateMany 10000 ss 1 0.800 s/op
ConsistencyPropagationBenchmark.singleUpdateMany 100000 ss 1 8.200 s/op
ConsistencyPropagationBenchmark.singleUpdateMany 1000000 ss 1 82.000 s/op
ConsistencyPropagationBenchmark.batchedBulkWrite 10000 ss 1 1.200 s/op
ConsistencyPropagationBenchmark.batchedBulkWrite 100000 ss 1 10.500 s/op
ConsistencyPropagationBenchmark.batchedBulkWrite 1000000 ss 1 95.000 s/op
updateMany is faster than batched bulk writes because it avoids the per-document _id lookup overhead. But updateMany holds a write lock for the entire operation. For 1 million documents, that is 82 seconds of blocked writes to that collection.
The Fix
Three strategies, chosen based on staleness tolerance.
Strategy 1: Version-based lazy update (minutes of staleness tolerated).
Add a version number to the embedded metadata. When reading, check the version against the source of truth. If stale, update on read:
// FAST: Lazy consistency with version check
public Document getReadingWithCurrentMetadata(String readingId) {
Document reading = readings.find(Filters.eq("_id", readingId)).first();
String sensorId = reading.getString("sensorId");
// Check version against cached current version
int embeddedVersion = reading.getEmbedded(
List.of("sensor", "metaVersion"), Integer.class, 0);
int currentVersion = sensorMetadataVersionCache.get(sensorId);
if (embeddedVersion < currentVersion) {
Document currentMeta = sensorMetadataCache.get(sensorId);
readings.updateOne(
Filters.eq("_id", readingId),
Updates.set("sensor", currentMeta)
);
reading.put("sensor", currentMeta);
}
return reading;
}
This spreads the update cost across read operations. Hot documents (frequently read) are updated quickly. Cold documents (rarely read) stay stale but do not consume update resources.
Strategy 2: Background bulk propagation (hours of staleness tolerated).
Run a background job that propagates metadata changes in batches:
// FAST: Background propagation with rate limiting
public void propagateMetadataChange(String sensorId, Document newMetadata) {
int batchSize = 5000;
long sleepBetweenBatches = 100; // ms, to avoid saturating I/O
while (true) {
UpdateResult result = readings.updateMany(
Filters.and(
Filters.eq("sensorId", sensorId),
Filters.lt("sensor.metaVersion", newMetadata.getInteger("metaVersion"))
),
Updates.set("sensor", newMetadata),
new UpdateOptions().hint(new Document("sensorId", 1).append("ts", -1))
);
if (result.getModifiedCount() == 0) break;
// Rate limit to avoid overwhelming the server
try { Thread.sleep(sleepBetweenBatches); } catch (InterruptedException e) {
Thread.currentThread().interrupt();
break;
}
}
}
Strategy 3: Targeted denormalization (zero staleness required).
Only denormalize fields that never change. Sensor type (“temperature”) does not change. Building and floor might. Denormalize the immutable fields, use $lookup for the mutable ones:
// FAST: Partial denormalization of immutable fields
Document doc = new Document()
.append("sensorId", reading.getSensorId())
.append("ts", Date.from(reading.getTimestamp()))
.append("temperature", reading.getTemperature())
.append("sensorType", sensorMeta.getString("type")); // Immutable: safe to embed
// building and floor: NOT embedded, fetched via $lookup when needed
The Proof
| Strategy | Staleness | Write overhead | Read overhead | Complexity |
|---|---|---|---|---|
| No denormalization ($lookup) | Zero | None | 18.7x slower (CH9-S1) | Low |
| Full denormalization + updateMany | 82s during propagation | High during updates | None | Medium |
| Version-based lazy update | Seconds (on next read) | Amortized across reads | Version check per read | Medium |
| Background propagation | Minutes to hours | Spread over time | None | Medium |
| Partial denormalization | Zero for immutable fields | None for immutable fields | $lookup for mutable fields | Low |
The Trade-off
Sensor metadata changes are rare: a sensor might be relocated once per year. For rare changes, full denormalization with background propagation is the pragmatic choice. The 82-second propagation window is acceptable when it happens once a year.
For data that changes frequently (user profile names embedded in activity feeds, product prices embedded in order history), partial denormalization is safer. Embed the fields that never change (user ID, product SKU) and reference the fields that do (display name, current price).
The decision framework from the chapter introduction applies here. If reads outnumber writes by 100:1 or more, and the denormalized data changes less than once per day, full denormalization with background propagation is almost always the right choice for MongoDB.