Read Preference Selection for the Telemetry Platform

The Symptom

The telemetry platform runs on a 3-member replica set (1 primary, 2 secondaries). The primary’s CPU is at 85% during peak hours. The secondaries run at 15% CPU (replication only). Adding read capacity requires either scaling vertically (larger primary) or routing reads to secondaries.

The Cause

Every read goes to the primary. The platform has four query types:

1. Dashboard latest reading: Displays the most recent reading per sensor. Runs 500 times/second.
2. Historical chart: Fetches the last 24 hours of readings for a sensor. Runs 50 times/second.
3. Anomaly detection: Scans for readings above a threshold. Runs 5 times/second.
4. Monthly report: Aggregates readings per sensor per day. Runs 2 times/minute.

Only the dashboard query (type 1) requires the absolute latest data, because the dashboard user expects to see the reading they just submitted. The other three (types 2-4) tolerate seconds or minutes of staleness.

The Benchmark

Query type	Frequency	Staleness tolerance	Current target	Optimal target
Dashboard latest	500/s	None (read-after-write)	Primary	Primary
Historical chart	50/s	30 seconds	Primary	Secondary (30s)
Anomaly detection	5/s	60 seconds	Primary	Secondary (60s)
Monthly report	0.03/s	5 minutes	Primary	Secondary (300s)
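The table can be expressed as data so that the routing decision becomes mechanical. This is a minimal sketch, not the platform's code: `QueryType`, its fields, and the `MIN_MAX_STALENESS_SECONDS` constant are illustrative assumptions. The 90-second floor reflects MongoDB's rule that `maxStalenessSeconds` must be at least 90 seconds. As a consequence, a 30- or 60-second tolerance cannot be enforced exactly by the driver, and 90 seconds is the tightest setting available.

```java
/** Hypothetical encoding of the benchmark table: rate and staleness tolerance per query type. */
enum QueryType {
    // ratePerSecond, toleranceSeconds (0 = must read its own writes)
    DASHBOARD_LATEST(500.0, 0),
    HISTORICAL_CHART(50.0, 30),
    ANOMALY_DETECTION(5.0, 60),
    MONTHLY_REPORT(2.0 / 60.0, 300);

    // Assumed constant: MongoDB rejects maxStalenessSeconds values below 90.
    static final long MIN_MAX_STALENESS_SECONDS = 90;

    final double ratePerSecond;
    final long toleranceSeconds;

    QueryType(double ratePerSecond, long toleranceSeconds) {
        this.ratePerSecond = ratePerSecond;
        this.toleranceSeconds = toleranceSeconds;
    }

    /** Zero tolerance means read-after-write, which is served from the primary. */
    boolean requiresPrimary() {
        return toleranceSeconds == 0;
    }

    /** maxStalenessSeconds to configure: the tolerance, raised to the driver minimum. */
    long maxStalenessSeconds() {
        if (requiresPrimary()) {
            throw new IllegalStateException(name() + " is routed to the primary");
        }
        return Math.max(toleranceSeconds, MIN_MAX_STALENESS_SECONDS);
    }

    /** True when the configured bound is no looser than the stated tolerance. */
    boolean toleranceEnforcedByDriver() {
        return !requiresPrimary() && toleranceSeconds >= MIN_MAX_STALENESS_SECONDS;
    }

    String target() {
        return requiresPrimary()
                ? "primary"
                : "secondary, maxStalenessSeconds=" + maxStalenessSeconds();
    }

    public static void main(String[] args) {
        for (QueryType q : values()) {
            boolean exact = q.requiresPrimary() || q.toleranceEnforcedByDriver();
            System.out.println(q + " -> " + q.target() + (exact ? "" : " (looser than tolerance)"));
        }
    }
}
```

Running `main` routes the dashboard to the primary and the report to a secondary with a 300-second bound. It flags the chart and anomaly queries as configured at 90 seconds, which is looser than their stated tolerance.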

Load distribution if reads are routed optimally:

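Primary: 500/s (dashboard only). That is still about 90% of all read operations (500 of about 555/s), so by count the primary sheds only about 10% of its reads. The CPU saving is much larger because each offloaded query touches many documents (24 hours of readings, threshold scans, per-day aggregations), while the dashboard query returns a single document.
Secondaries: about 55/s combined (chart + anomaly + report), roughly 27.5/s each when split across 2 secondaries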
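The load figures follow directly from the table's rates. The sketch below reproduces that arithmetic. `ReadLoadSplit` is a hypothetical name, and the even split across secondaries is an assumption: drivers spread reads among eligible secondaries within a latency window, so in practice the split is only roughly even.

```java
/** Read-load arithmetic for the four query types, using the rates from the benchmark table. */
final class ReadLoadSplit {
    static final double DASHBOARD = 500.0;   // reads/s, stays on the primary
    static final double CHART = 50.0;        // reads/s, moved to secondaries
    static final double ANOMALY = 5.0;       // reads/s, moved to secondaries
    static final double REPORT = 2.0 / 60.0; // 2 per minute, moved to secondaries
    static final int SECONDARIES = 2;

    static double total() { return DASHBOARD + CHART + ANOMALY + REPORT; }

    static double offloaded() { return CHART + ANOMALY + REPORT; }

    /** Fraction of read operations that remain on the primary. */
    static double primaryShare() { return DASHBOARD / total(); }

    /** Reads per second per secondary, assuming an even split. */
    static double perSecondary() { return offloaded() / SECONDARIES; }

    public static void main(String[] args) {
        System.out.printf("total=%.2f/s primaryShare=%.1f%% offloaded=%.2f/s perSecondary=%.2f/s%n",
                total(), 100 * primaryShare(), offloaded(), perSecondary());
    }
}
```

This gives about 555 reads/s in total. About 90% of those stay on the primary, and each secondary handles about 27.5 reads/s.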

The Fix

Create separate collection references with different read preferences:

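import com.mongodb.ReadPreference;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import java.util.concurrent.TimeUnit;
import org.bson.Document;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MongoCollectionConfig {

    private final MongoDatabase database;

    public MongoCollectionConfig(MongoClient client) {
        this.database = client.getDatabase("telemetry");
    }

    // withReadPreference() returns a new immutable MongoCollection; all four
    // beans share the same MongoClient and its connection pool.

    // Dashboard queries: primary (read-after-write consistency)
    @Bean("readingsLatest")
    public MongoCollection<Document> readingsLatest() {
        return database.getCollection("readings")
            .withReadPreference(ReadPreference.primary());
    }

    // Historical chart: tolerates ~30s, but maxStaleness below 90s is
    // rejected, so 90s is the tightest bound available.
    // secondaryPreferred falls back to the primary if no secondary qualifies.
    @Bean("readingsHistorical")
    public MongoCollection<Document> readingsHistorical() {
        return database.getCollection("readings")
            .withReadPreference(ReadPreference.secondaryPreferred(
                90, TimeUnit.SECONDS));
    }

    // Anomaly detection: tolerates ~60s; 90s is the minimum maxStaleness.
    // secondary mode fails with a server-selection timeout if no secondary qualifies.
    @Bean("readingsAnomaly")
    public MongoCollection<Document> readingsAnomaly() {
        return database.getCollection("readings")
            .withReadPreference(ReadPreference.secondary(
                90, TimeUnit.SECONDS));
    }

    // Reports: secondary with 5-minute staleness
    @Bean("readingsReport")
    public MongoCollection<Document> readingsReport() {
        return database.getCollection("readings")
            .withReadPreference(ReadPreference.secondary(
                300, TimeUnit.SECONDS));
    }
}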
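A simplified model of server selection illustrates the difference between `secondary` and `secondaryPreferred`, and shows where `maxStaleness` applies. This is not driver code. `ReadPreferenceModel`, `Member`, and `estimatedLagSeconds` are hypothetical, and the lag value stands in for the staleness estimate the driver derives from heartbeat data. The model also assumes a staleness bound is always set for secondary reads. In the real driver, a `secondary` read with no eligible member keeps retrying until `serverSelectionTimeoutMS` expires and then fails, and the driver picks among eligible members within a latency window; the model omits both behaviors.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Simplified, hypothetical model of replica-set read preference selection (not driver code). */
final class ReadPreferenceModel {
    enum Mode { PRIMARY, SECONDARY, SECONDARY_PREFERRED }

    /** A member as seen by the client; estimatedLagSeconds stands in for the driver's staleness estimate. */
    record Member(String name, boolean primary, boolean up, double estimatedLagSeconds) {}

    // Assumed constant mirroring MongoDB's minimum maxStalenessSeconds.
    static final long MIN_MAX_STALENESS_SECONDS = 90;

    /** Returns the members a read may go to; an empty list means the read fails. Bound ignored for PRIMARY. */
    static List<Member> eligible(List<Member> members, Mode mode, long maxStalenessSeconds) {
        List<Member> primaries = members.stream()
                .filter(m -> m.up() && m.primary())
                .collect(Collectors.toList());
        if (mode == Mode.PRIMARY) {
            return primaries;
        }
        if (maxStalenessSeconds < MIN_MAX_STALENESS_SECONDS) {
            throw new IllegalArgumentException("maxStalenessSeconds must be at least 90");
        }
        List<Member> freshSecondaries = members.stream()
                .filter(m -> m.up() && !m.primary())
                .filter(m -> m.estimatedLagSeconds() <= maxStalenessSeconds)
                .collect(Collectors.toList());
        if (!freshSecondaries.isEmpty() || mode == Mode.SECONDARY) {
            return freshSecondaries;
        }
        return primaries; // SECONDARY_PREFERRED falls back to the primary
    }

    public static void main(String[] args) {
        Member p = new Member("p", true, true, 0);
        Member s1 = new Member("s1", false, true, 5);
        Member s2 = new Member("s2", false, true, 120);
        System.out.println(eligible(List.of(p, s1, s2), Mode.SECONDARY, 90));
        System.out.println(eligible(List.of(p, s2), Mode.SECONDARY, 90));
        System.out.println(eligible(List.of(p, s2), Mode.SECONDARY_PREFERRED, 90));
    }
}
```

Suppose one secondary lags 120 seconds and the bound is 90 seconds. `SECONDARY` then returns only the fresh secondary. If that secondary is also lost, `SECONDARY` returns nothing (the read fails) while `SECONDARY_PREFERRED` falls back to the primary. This is the behavioral difference between the chart bean and the anomaly and report beans.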

Use the appropriate collection reference in each query:

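import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Sorts;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import org.bson.Document;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Service;

@Service
public class TelemetryQueryService {

    private final MongoCollection<Document> readingsLatest;
    private final MongoCollection<Document> readingsHistorical;

    // All four beans share one type, so @Qualifier selects each by bean name.
    public TelemetryQueryService(
            @Qualifier("readingsLatest") MongoCollection<Document> readingsLatest,
            @Qualifier("readingsHistorical") MongoCollection<Document> readingsHistorical) {
        this.readingsLatest = readingsLatest;
        this.readingsHistorical = readingsHistorical;
    }

    // Dashboard: primary, so a reading the user just wrote is visible
    public Document getLatestReading(String sensorId) {
        return readingsLatest.find(Filters.eq("sensorId", sensorId))
            .sort(Sorts.descending("ts"))
            .first();
    }

    // Historical chart: secondary, to take range scans off the primary
    public List<Document> getHistoricalReadings(
            String sensorId, Instant start, Instant end) {
        return readingsHistorical.find(Filters.and(
            Filters.eq("sensorId", sensorId),
            Filters.gte("ts", Date.from(start)),
            Filters.lt("ts", Date.from(end))
        )).sort(Sorts.ascending("ts")).into(new ArrayList<>());
    }
}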

The Proof

After routing historical, anomaly, and report queries to secondaries:

Metric	Before (all primary)	After (mixed)
Primary CPU	85%	32%
Secondary CPU	15%	42%
Dashboard p99	45ms	18ms (less primary contention)
Historical chart p99	120ms	85ms (secondaries less loaded than the old primary)
Monthly report time	340s	280s

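The primary CPU drops from 85% to 32%. Assuming CPU scales roughly linearly with traffic, the primary can absorb about 2.5x traffic growth (32% × 2.5 = 80%) before it is back near today's peak. The secondaries, now at 42%, would reach a similar level at about 2x growth, so under the same assumption they, not the primary, become the first constraint.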
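The headroom estimate is simple arithmetic under a linear-scaling assumption (CPU grows in proportion to traffic), which real workloads only approximate. `Headroom` is a hypothetical helper, and the 85% ceiling is today's peak, used here as the point at which scaling would be needed again.

```java
/** Headroom arithmetic, assuming CPU utilisation scales linearly with traffic (a simplification). */
final class Headroom {
    /** CPU utilisation after traffic grows by the given factor. */
    static double projectedCpu(double currentCpuPercent, double growthFactor) {
        return currentCpuPercent * growthFactor;
    }

    /** Growth factor at which utilisation reaches the given ceiling. */
    static double growthUntil(double currentCpuPercent, double ceilingPercent) {
        return ceilingPercent / currentCpuPercent;
    }

    public static void main(String[] args) {
        System.out.printf("primary at 2.5x: %.0f%%%n", projectedCpu(32, 2.5));        // 80%
        System.out.printf("primary growth to 85%%: %.2fx%n", growthUntil(32, 85));    // 2.66x
        System.out.printf("secondaries growth to 85%%: %.2fx%n", growthUntil(42, 85)); // 2.02x
    }
}
```

Comparing the two growth factors shows which tier runs out of headroom first. Here that tier is the secondaries, at about 2x.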

The Trade-off

Secondary reads introduce the risk of reading stale data. For the telemetry platform, a 30-second delay on historical charts is invisible to users (they are viewing 24 hours of data). But if a sensor's readings are safety-critical (for example, a temperature alarm), reading a value that is 30 seconds old could miss an alarm condition.

The mitigation: safety-critical queries (alarm evaluation) always use readPreference: primary. Only display and reporting queries use secondary reads. This classification must be explicit and documented. A future developer adding a new query type must decide its staleness tolerance before choosing the collection reference.
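One way to keep the classification explicit is to hold it in code and fail fast when a query has none. This is a minimal sketch, assuming queries are identified by a string name and listed in a hand-maintained registry. `QueryClassification` and `Staleness` are hypothetical names, not part of the platform or any library.

```java
import java.util.Map;

/** Hypothetical registry that forces every query to declare its staleness class before it runs. */
final class QueryClassification {
    enum Staleness {
        SAFETY_CRITICAL,  // alarm evaluation: primary only
        READ_YOUR_WRITES, // dashboard: primary
        DISPLAY,          // charts: secondary allowed
        REPORTING         // reports: secondary allowed
    }

    private final Map<String, Staleness> registry;

    QueryClassification(Map<String, Staleness> registry) {
        this.registry = Map.copyOf(registry);
    }

    /** Throws for unclassified queries, so a new query cannot silently default to a secondary. */
    boolean mayReadFromSecondary(String queryName) {
        Staleness s = registry.get(queryName);
        if (s == null) {
            throw new IllegalStateException("Query '" + queryName + "' has no staleness classification");
        }
        // Exhaustive switch: adding a Staleness constant fails compilation until it is handled here.
        return switch (s) {
            case SAFETY_CRITICAL, READ_YOUR_WRITES -> false;
            case DISPLAY, REPORTING -> true;
        };
    }

    public static void main(String[] args) {
        QueryClassification c = new QueryClassification(Map.of(
                "alarmEvaluation", Staleness.SAFETY_CRITICAL,
                "dashboardLatest", Staleness.READ_YOUR_WRITES,
                "historicalChart", Staleness.DISPLAY,
                "monthlyReport", Staleness.REPORTING));
        System.out.println(c.mayReadFromSecondary("historicalChart"));
        System.out.println(c.mayReadFromSecondary("alarmEvaluation"));
    }
}
```

A service could consult this registry when choosing between the `readingsLatest` bean and the secondary-backed beans. An unclassified query then fails in tests instead of quietly reading stale data.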

On sharded clusters, read preference interacts with shard targeting. A query with readPreference: secondary is still targeted if its filter includes the shard key: the mongos routes it only to the shard(s) owning the matching shard-key range and reads from those shards' secondaries. A scatter-gather query with secondary read preference hits a secondary on every shard. It can be slower than the same query against the primaries, because a secondary's cache is warmed by replicated writes rather than by read traffic, so the data the query needs may be less likely to be in memory.