Detecting and Fixing Hot Shards
Detecting and Fixing Hot Shards
The Symptom
After sharding the telemetry collection with {sensorId: 1, ts: 1}, the team notices that shard 3 has 40% of the data while shards 1, 2, and 4 have roughly 20% each. Shard 3’s CPU runs at 75% while other shards run at 35%. Write latency on shard 3 is 3x higher than on other shards.
The Cause
The sensorId distribution is not uniform. Sensors in Building HQ (IDs “sensor-00001” through “sensor-04000”) produce more data than other sensors because they report every 2 seconds instead of every 5 seconds. The range-based shard key {sensorId: 1, ts: 1} allocates contiguous ranges of sensorIds to each shard. If shard 3 owns the “sensor-02000” to “sensor-04000” range, it holds the highest-frequency sensors.
// Check chunk distribution
sh.status()
// Output shows uneven distribution:
// shard-1: 245 chunks
// shard-2: 248 chunks
// shard-3: 612 chunks <- 2.5x more chunks
// shard-4: 251 chunks
// Check for jumbo chunks
db.readings.getShardDistribution()
Jumbo chunks are chunks that exceed the maximum chunk size (default 128 MB) but cannot be split because they contain only one shard key value. If sensor-00001 has 500 MB of data, the chunk containing {sensorId: "sensor-00001"} is jumbo and cannot be migrated to another shard.
// Find jumbo chunks
use config
db.chunks.find({ jumbo: true, ns: "telemetry.readings" }).count()
The Benchmark
Compare write distribution across shards with different shard key strategies:
| Shard key | Shard 1 writes | Shard 2 writes | Shard 3 writes | Shard 4 writes | Max/Min ratio |
|---|---|---|---|---|---|
{sensorId: 1, ts: 1} (range) | 450/s | 480/s | 1,200/s | 470/s | 2.67x |
{sensorId: "hashed"} | 510/s | 495/s | 520/s | 475/s | 1.09x |
{sensorId: 1, ts: 1} (after rebalance) | 520/s | 530/s | 680/s | 470/s | 1.45x |
Hashed shard key on sensorId achieves near-perfect distribution because the hash function randomizes the mapping of sensorIds to shards.
The Fix
Option 1: Use a hashed shard key (for new collections or resharding).
// Hashed shard key for even distribution
database.runCommand(new Document("shardCollection", "telemetry.readings_v2")
.append("key", new Document("sensorId", "hashed")));
Hashed shard keys distribute writes evenly but sacrifice range query efficiency. A query for {sensorId: "sensor-00042", ts: {$gte: start}} can still target a single shard (the hash of “sensor-00042” maps to one shard). But a query for {sensorId: {$in: ["sensor-00042", "sensor-00043"]}} may scatter to multiple shards because the two sensorIds hash to different shards (even though they are adjacent).
Option 2: Manual chunk splitting and migration.
// Split a jumbo chunk manually
sh.splitAt("telemetry.readings", {
sensorId: "sensor-03000",
ts: ISODate("2024-06-01T00:00:00Z")
})
// Move a chunk to a less loaded shard
sh.moveChunk("telemetry.readings", {
sensorId: "sensor-03000",
ts: ISODate("2024-06-01T00:00:00Z")
}, "shard-1")
Option 3: Add a distribution prefix.
If resharding is not possible, add a computed field that distributes high-frequency sensors:
// FAST: Distribution prefix for hot sensorIds
String distributionKey = sensorId + "-" + (reading.hashCode() % 4);
Document doc = new Document()
.append("shardKey", distributionKey) // "sensor-00001-0", "sensor-00001-1", etc.
.append("sensorId", sensorId)
.append("ts", Date.from(timestamp))
.append("temperature", temperature);
This spreads sensor-00001’s data across 4 sub-keys, preventing jumbo chunks. Queries must adjust to include the prefix in the filter.
The Proof
After resharding with hashed shard key:
| Metric | Range key (before) | Hashed key (after) |
|---|---|---|
| Max/min shard data ratio | 2.67x | 1.09x |
| Shard 3 CPU | 75% | 38% |
| Write p99 (worst shard) | 45ms | 18ms |
| Scatter-gather queries | Rare | More frequent for range queries |
The Trade-off
Hashed shard keys achieve write distribution at the cost of range query efficiency. A range query {sensorId: {$gte: "sensor-00100", $lte: "sensor-00200"}} must scatter to all shards with a hashed key because adjacent sensorIds hash to different shards. With a range key, this query targets 1-2 shards.
The telemetry platform primarily queries by exact sensorId (equality), which targets a single shard regardless of whether the key is hashed or ranged. For this workload, hashed keys provide better distribution with no query performance penalty for the dominant access pattern. Range queries across sensors (the analytics use case) scatter in both cases.