Read Preference Selection for the Telemetry Platform
The Symptom
The telemetry platform runs on a 3-member replica set (1 primary, 2 secondaries). The primary’s CPU is at 85% during peak hours. The secondaries run at 15% CPU (replication only). Adding read capacity requires either scaling vertically (larger primary) or routing reads to secondaries.
The Cause
Every read goes to the primary. The platform has four query types:
- Dashboard latest reading: Displays the most recent reading per sensor. Runs 500 times/second.
- Historical chart: Fetches the last 24 hours of readings for a sensor. Runs 50 times/second.
- Anomaly detection: Scans for readings above threshold. Runs 5 times/second.
- Monthly report: Aggregates readings per sensor per day. Runs 2 times/minute.
Only the dashboard query requires the absolute latest data (the dashboard user expects to see the reading they just submitted). The other three query types tolerate seconds or minutes of staleness.
The Benchmark
| Query type | Frequency | Staleness tolerance | Current target | Optimal target |
|---|---|---|---|---|
| Dashboard latest | 500/s | None (read-after-write) | Primary | Primary |
| Historical chart | 50/s | 30 seconds | Primary | Secondary (90s, the minimum maxStalenessSeconds MongoDB accepts) |
| Anomaly detection | 5/s | 60 seconds | Primary | Secondary (90s, the minimum maxStalenessSeconds MongoDB accepts) |
| Monthly report | 0.03/s | 5 minutes | Primary | Secondary (300s) |
Load distribution if reads are routed optimally:
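- Primary: 500/s (dashboard only). By count, only about 55 of 555 reads/s (roughly 10%) move off the primary, but these are the range scans and aggregations, which likely touch far more data per query than the single-document dashboard lookup.
- Secondaries: about 55/s in total (chart + anomaly + report), or roughly 27.5/s each if split evenly across the 2 secondaries.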
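The arithmetic behind this split can be checked with a short sketch. The rates come from the table above; the class and method names are illustrative, and the even split across secondaries is an assumption, since the driver chooses among eligible secondaries and the real distribution may be uneven.

```java
final class ReadLoadSplit {
    // Rates in queries per second, taken from the benchmark table.
    static final double DASHBOARD = 500.0;
    static final double CHART = 50.0;
    static final double ANOMALY = 5.0;
    static final double REPORT = 2.0 / 60.0; // 2 per minute

    /** All reads, regardless of where they are routed. */
    static double totalRate() {
        return DASHBOARD + CHART + ANOMALY + REPORT;
    }

    /** Reads that tolerate staleness and can move to secondaries. */
    static double offloadedRate() {
        return CHART + ANOMALY + REPORT;
    }

    /** Share of read operations (by count) that leave the primary. */
    static double offloadedFraction() {
        return offloadedRate() / totalRate();
    }

    /** Per-secondary rate, assuming an even split (an assumption). */
    static double perSecondaryRate(int secondaries) {
        if (secondaries <= 0) {
            throw new IllegalArgumentException("secondaries must be positive");
        }
        return offloadedRate() / secondaries;
    }

    public static void main(String[] args) {
        System.out.printf("primary keeps:   %.2f/s%n", DASHBOARD);
        System.out.printf("offloaded:       %.2f/s (%.1f%% of reads)%n",
                offloadedRate(), 100.0 * offloadedFraction());
        System.out.printf("per secondary:   %.2f/s%n", perSecondaryRate(2));
    }
}
```

Counting operations understates the effect on CPU, because the offloaded queries are the expensive ones; the measured CPU figures are the better guide.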
The Fix
Create separate collection references with different read preferences:
```java
import com.mongodb.ReadPreference;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import java.util.concurrent.TimeUnit;
import org.bson.Document;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// MongoDB rejects maxStaleness values below 90 seconds, so the 30 s and 60 s
// tolerances are configured as 90 s, the tightest bound the driver accepts.
@Configuration
public class MongoCollectionConfig {
    private final MongoDatabase database;

    public MongoCollectionConfig(MongoClient client) {
        this.database = client.getDatabase("telemetry");
    }

    // Dashboard queries: primary (read-after-write consistency)
    @Bean("readingsLatest")
    public MongoCollection<Document> readingsLatest() {
        return database.getCollection("readings")
                .withReadPreference(ReadPreference.primary());
    }

    // Historical chart: prefer a secondary, fall back to the primary if none is eligible
    @Bean("readingsHistorical")
    public MongoCollection<Document> readingsHistorical() {
        return database.getCollection("readings")
                .withReadPreference(ReadPreference.secondaryPreferred(
                        90, TimeUnit.SECONDS));
    }

    // Anomaly detection: secondary only, at most 90 s stale
    @Bean("readingsAnomaly")
    public MongoCollection<Document> readingsAnomaly() {
        return database.getCollection("readings")
                .withReadPreference(ReadPreference.secondary(
                        90, TimeUnit.SECONDS));
    }

    // Reports: secondary only, at most 5 minutes stale
    @Bean("readingsReport")
    public MongoCollection<Document> readingsReport() {
        return database.getCollection("readings")
                .withReadPreference(ReadPreference.secondary(
                        300, TimeUnit.SECONDS));
    }
}
```
Use the appropriate collection reference in each query:
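```java
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Sorts;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import org.bson.Document;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Service;

@Service
public class TelemetryQueryService {
    private final MongoCollection<Document> readingsLatest;
    private final MongoCollection<Document> readingsHistorical;

    // Qualifiers pick the bean configured with the matching read preference.
    public TelemetryQueryService(
            @Qualifier("readingsLatest") MongoCollection<Document> readingsLatest,
            @Qualifier("readingsHistorical") MongoCollection<Document> readingsHistorical) {
        this.readingsLatest = readingsLatest;
        this.readingsHistorical = readingsHistorical;
    }

    // Dashboard: reads from the primary for read-after-write consistency
    public Document getLatestReading(String sensorId) {
        return readingsLatest.find(Filters.eq("sensorId", sensorId))
                .sort(Sorts.descending("ts"))
                .first();
    }

    // Historical chart: reads from a secondary to offload the primary
    public List<Document> getHistoricalReadings(String sensorId, Instant start, Instant end) {
        return readingsHistorical.find(Filters.and(
                Filters.eq("sensorId", sensorId),
                Filters.gte("ts", Date.from(start)),
                Filters.lt("ts", Date.from(end))
        )).sort(Sorts.ascending("ts")).into(new ArrayList<>());
    }
}
```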
The Proof
After routing historical, anomaly, and report queries to secondaries:
| Metric | Before (all primary) | After (mixed) |
|---|---|---|
| Primary CPU | 85% | 32% |
| Secondary CPU | 15% | 42% |
| Dashboard p99 | 45ms | 18ms (less primary contention) |
| Historical chart p99 | 120ms | 85ms (secondary has less load) |
| Monthly report time | 340s | 280s |
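The primary CPU drops from 85% to 32%. Assuming CPU scales roughly linearly with traffic, this headroom lets the primary absorb about 2.5x traffic growth (32% × 2.5 = 80%) before needing to scale. The secondaries, now at 42%, would reach the same level sooner, at roughly 2x, so they become the next component to watch.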
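The headroom estimate can be reproduced with a small calculation. This is a sketch under two assumptions that are not in the measurements: CPU scales linearly with traffic, and the ceiling is 85%, the peak observed before the change. The class name and method are illustrative.

```java
final class CpuHeadroom {
    /**
     * Traffic multiple at which a node currently at currentCpu percent
     * reaches ceilingCpu percent, assuming CPU scales linearly with load.
     */
    static double growthFactor(double currentCpu, double ceilingCpu) {
        if (currentCpu <= 0 || ceilingCpu <= 0) {
            throw new IllegalArgumentException("CPU percentages must be positive");
        }
        return ceilingCpu / currentCpu;
    }

    public static void main(String[] args) {
        double ceiling = 85.0; // assumed ceiling: the pre-change peak
        System.out.printf("primary:   %.2fx%n", growthFactor(32.0, ceiling));
        System.out.printf("secondary: %.2fx%n", growthFactor(42.0, ceiling));
    }
}
```

Under these assumptions the primary's figure lands in the same range as the 2.5x estimate, while the secondaries, which now carry the heavier queries, have less room.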
The Trade-off
Secondary reads introduce the risk of reading stale data. For the telemetry platform, a 30-second delay on historical charts is invisible to users (they are viewing 24 hours of data). But if a sensor’s readings are critical for safety (temperature alarm), reading a 30-second-old value could miss an alarm condition.
The mitigation: safety-critical queries (alarm evaluation) always use readPreference: primary. Only display and reporting queries use secondary reads. This classification must be explicit and documented. A future developer adding a new query type must decide its staleness tolerance before choosing the collection reference.
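One way to make the classification explicit is to keep it in code, so that adding a query type forces a decision about staleness. The sketch below is hypothetical: the `QueryClass` enum and its method names are not part of the platform, and the tolerances are the ones listed in the benchmark table, with alarm evaluation added at zero tolerance.

```java
/** Hypothetical registry of query types and their staleness tolerance. */
enum QueryClass {
    DASHBOARD_LATEST(0),
    ALARM_EVALUATION(0),      // safety-critical: never read stale data
    HISTORICAL_CHART(30),
    ANOMALY_DETECTION(60),
    MONTHLY_REPORT(300);

    private final long stalenessToleranceSeconds;

    QueryClass(long stalenessToleranceSeconds) {
        this.stalenessToleranceSeconds = stalenessToleranceSeconds;
    }

    long stalenessToleranceSeconds() {
        return stalenessToleranceSeconds;
    }

    /** Zero tolerance means the query must be served by the primary. */
    boolean requiresPrimary() {
        return stalenessToleranceSeconds == 0;
    }
}

final class QueryClassDemo {
    public static void main(String[] args) {
        for (QueryClass q : QueryClass.values()) {
            System.out.printf("%-18s tolerance=%ds primary=%b%n",
                    q.name(), q.stalenessToleranceSeconds(), q.requiresPrimary());
        }
    }
}
```

A new query type cannot be added to the enum without stating a tolerance, which turns the documentation requirement into a compile-time one.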
On sharded clusters, read preference interacts with shard targeting. A query with readPreference: secondary is still targeted to a single shard if its filter includes the shard key: the mongos routes it to a secondary of the correct shard. A scatter-gather query with secondary read preference hits a secondary on every shard, which can be even slower than hitting the primary on every shard because the secondaries may have less of the working set in cache.