Shard Key Selection for the Bucket Pattern
The Symptom
The telemetry platform shards the 5-minute bucket collection with {sensorId: "hashed"}. Write distribution is even, but the upsert operation from CH7 is failing intermittently:
Command failed with error 11000: 'E11000 duplicate key error collection:
telemetry.buckets index: _id_ dup key: {...}'
Some upserts create duplicate bucket documents instead of appending to existing ones.
The Cause
The hashed shard key {sensorId: "hashed"} does not include bucketStart, while the upsert filter is {sensorId: X, bucketStart: Y}. For an upsert to be routed correctly on a sharded collection, the filter must contain an equality match on every field of the shard key so that the operation targets a single shard.
With hashed sensorId, the query targets the correct shard (the hash of sensorId determines the shard). But the upsert’s uniqueness is enforced by the _id index, which is local to each shard. If a routing error or a stale config causes the upsert to reach the wrong shard, a new document is created instead of updating the existing one.
The deeper problem: the shard key for a bucketed collection must include both the entity identifier and the bucket boundary to ensure that upsert operations are routed correctly and that bucket documents for the same entity are co-located.
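The routing condition can be illustrated with a small model. This is a sketch, not how mongos is implemented: `bucketStartFor`, `upsertFilter`, and `targetsSingleShard` are hypothetical helpers, the sensor ID and timestamp are made up, and the check assumes a filter can be routed to one shard only when it carries a plain equality value for every shard key field.

```javascript
// Hypothetical helpers for illustration; not part of any MongoDB driver.
const BUCKET_MS = 5 * 60 * 1000; // 5-minute buckets

// Round a timestamp down to the start of its 5-minute bucket.
function bucketStartFor(date) {
  return new Date(Math.floor(date.getTime() / BUCKET_MS) * BUCKET_MS);
}

// Build the upsert filter {sensorId: X, bucketStart: Y}.
function upsertFilter(sensorId, timestamp) {
  return { sensorId, bucketStart: bucketStartFor(timestamp) };
}

// Simplified model: the filter needs an equality value for every shard key field.
function targetsSingleShard(shardKey, filter) {
  return Object.keys(shardKey).every((field) => {
    const value = filter[field];
    if (value === undefined) return false;
    // Operator documents such as {$gt: ...} are range conditions, not equality.
    const isOperatorDoc =
      value !== null &&
      typeof value === "object" &&
      !(value instanceof Date) &&
      Object.keys(value).some((key) => key.startsWith("$"));
    return !isOperatorDoc;
  });
}

const filter = upsertFilter("sensor-00042", new Date("2024-01-01T10:07:30Z"));
console.log(filter.bucketStart.toISOString()); // 2024-01-01T10:05:00.000Z
console.log(targetsSingleShard({ sensorId: "hashed" }, filter)); // true
console.log(targetsSingleShard({ sensorId: 1, bucketStart: 1 }, filter)); // true
console.log(targetsSingleShard({ sensorId: 1, bucketStart: 1 }, { sensorId: "sensor-00042" })); // false
```

Under this model the bucket filter satisfies both candidate keys, while a filter that omits bucketStart could not be targeted once bucketStart is part of the shard key.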
The Benchmark
Compare shard key options for the bucket collection:
| Shard key | Upsert errors/hour | Write distribution | Query efficiency (single sensor, 24h) |
|---|---|---|---|
| {sensorId: "hashed"} | 15-30 | Even | Good (single shard) |
| {sensorId: 1, bucketStart: 1} | 0 | Uneven (CH16-S1 problem) | Good (single shard, contiguous) |
| {sensorId: "hashed", bucketStart: 1} | Not supported | N/A | N/A |
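MongoDB versions before 4.4 do not support compound shard keys that combine a hashed field with ranged fields. MongoDB 4.4 and later accept a single hashed field in a compound shard key, so on those versions this option would need its own benchmark run.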
The Fix
Use a compound range shard key {sensorId: 1, bucketStart: 1} with pre-splitting to prevent hot shards:
// Shard the collection on the compound range key
sh.shardCollection("telemetry.buckets_5min", { sensorId: 1, bucketStart: 1 })
// Pre-split on sensorId boundaries while the collection is still empty
const sensorRanges = [
  "sensor-00000", "sensor-02500", "sensor-05000", "sensor-07500"
];
for (const boundary of sensorRanges) {
  // MinKey() sorts before every bucketStart, so each split lands
  // ahead of the boundary sensor's first bucket
  sh.splitAt("telemetry.buckets_5min", {
    sensorId: boundary,
    bucketStart: MinKey()
  });
}
Pre-splitting creates chunk boundaries before data is inserted. The balancer distributes the pre-split chunks across shards, ensuring that each shard receives an equal share of sensorId ranges from the start.
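Split points like the ones above can be derived instead of typed by hand. A minimal sketch, assuming sensor IDs follow the zero-padded `sensor-NNNNN` pattern and are numbered contiguously from 0; `splitPoints` is a hypothetical helper, and the total of 10,000 sensors is an assumption inferred from the boundaries in the example, not a figure stated for the platform.

```javascript
// Hypothetical helper: evenly spaced sensorId boundaries for pre-splitting.
function splitPoints(totalSensors, chunkCount, width = 5) {
  const points = [];
  // chunkCount chunks need chunkCount - 1 interior boundaries.
  for (let i = 1; i < chunkCount; i++) {
    const boundary = Math.floor((i * totalSensors) / chunkCount);
    points.push("sensor-" + String(boundary).padStart(width, "0"));
  }
  return points;
}

console.log(JSON.stringify(splitPoints(10000, 4)));
// ["sensor-02500","sensor-05000","sensor-07500"]
```

Each returned value would be passed to sh.splitAt as in the loop above. The helper leaves out a boundary at the lowest ID: if every ID follows the pattern, a split at "sensor-00000" only produces an empty leading chunk.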
For the uneven frequency problem (some sensors produce more data than others), monitor shard balance periodically and alert on imbalance:
// FAST: Monitor shard distribution and alert on imbalance
// Requires: import org.bson.Document;
public void checkShardBalance() {
    // Per-shard document counts; collStats reports them under "shards"
    Document collStats = database.runCommand(new Document("collStats", "buckets_5min"));
    Document shards = collStats.get("shards", Document.class);
    if (shards == null || shards.isEmpty()) {
        return; // collection is not sharded or no per-shard stats were returned
    }
    long maxCount = 0, minCount = Long.MAX_VALUE;
    for (String shardName : shards.keySet()) {
        Document shardStats = shards.get(shardName, Document.class);
        // count can arrive as Int32 or Int64, so getLong() may throw ClassCastException
        long count = ((Number) shardStats.get("count")).longValue();
        maxCount = Math.max(maxCount, count);
        minCount = Math.min(minCount, count);
    }
    // Treat an empty shard as holding one document to avoid dividing by zero
    double ratio = (double) maxCount / Math.max(minCount, 1);
    if (ratio > 1.5) {
        alertImbalance(ratio, maxCount, minCount);
    }
}
The Proof
After switching to {sensorId: 1, bucketStart: 1} with pre-splitting:
| Metric | Hashed sensorId | Compound range + pre-split |
|---|---|---|
| Upsert errors/hour | 15-30 | 0 |
| Max/min shard ratio | 1.09x | 1.35x |
| Single-sensor 24h query | 1 shard, 4.5ms | 1 shard, 3.8ms |
| Cross-sensor analytics | Scatter (all shards) | Scatter (all shards) |
| Bucket upsert latency | 12ms | 10ms |
The Trade-off
The compound range key has slightly worse distribution (1.35x vs 1.09x) because high-frequency sensors create more data in their shard key range. This is acceptable for the telemetry platform because no single sensor dominates the workload. If a single sensor produced 30% of all data, its shard would become a hot spot.
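Whether any sensor dominates the workload can be checked from per-sensor write counts. The sketch below is illustrative only: `dominantSensors` is a hypothetical helper, the 0.3 default mirrors the 30% figure above, and the sample counts are made up.

```javascript
// Hypothetical helper: list sensors whose share of all writes meets a threshold.
function dominantSensors(writeCounts, threshold = 0.3) {
  const total = Object.values(writeCounts).reduce((sum, n) => sum + n, 0);
  if (total === 0) return []; // no writes recorded, nothing to flag
  return Object.entries(writeCounts)
    .map(([sensorId, count]) => ({ sensorId, share: count / total }))
    .filter((entry) => entry.share >= threshold)
    .sort((a, b) => b.share - a.share); // largest share first
}

// Made-up counts for illustration only.
const sample = { "sensor-00001": 700, "sensor-00002": 200, "sensor-00003": 100 };
console.log(dominantSensors(sample));
// one entry: sensor-00001 with share 0.7
```

An empty result is consistent with the assumption that no single sensor dominates; any entry it returns marks a shard key range worth watching.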
Pre-splitting requires knowledge of the shard key value distribution before insertion. For the telemetry platform with predictable sensorId ranges, this is straightforward. For workloads with unpredictable key distributions (user-generated content, random IDs), pre-splitting is less effective and the auto-balancer must handle distribution after the fact.
MongoDB 5.0+ supports resharding, which allows changing the shard key of an existing collection. This takes hours for large collections and doubles storage temporarily (MongoDB creates a new sharded collection and copies data). Plan the shard key correctly from the start.
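For reference, resharding is started with the `reshardCollection` admin command, or the `sh.reshardCollection()` helper in mongosh. The sketch below only builds the command document; `reshardCommand` is a hypothetical helper, and the call shown in the final comment assumes a mongosh session connected to a mongos.

```javascript
// Hypothetical helper: build the reshardCollection command document.
function reshardCommand(namespace, newKey) {
  if (!namespace.includes(".")) {
    throw new Error("namespace must be '<database>.<collection>'");
  }
  return { reshardCollection: namespace, key: newKey };
}

const cmd = reshardCommand("telemetry.buckets_5min", { sensorId: 1, bucketStart: 1 });
console.log(JSON.stringify(cmd));
// {"reshardCollection":"telemetry.buckets_5min","key":{"sensorId":1,"bucketStart":1}}

// In mongosh, against a mongos: db.adminCommand(cmd)
```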