Detecting and Fixing Hot Shards

The Symptom

After sharding the telemetry.readings collection on {sensorId: 1, ts: 1}, the team notices that shard 3 holds 40% of the data while shards 1, 2, and 4 hold roughly 20% each. Shard 3’s CPU runs at 75% while the other shards run at 35%, and write latency on shard 3 is 3x higher than on the others.

The Cause

The sensorId distribution is not uniform. Sensors in Building HQ (IDs “sensor-00001” through “sensor-04000”) report every 2 seconds instead of every 5 seconds, so each one produces 2.5 times as many writes as a sensor elsewhere. The range-based shard key {sensorId: 1, ts: 1} assigns contiguous ranges of sensorIds to each shard. If shard 3 owns the “sensor-02000” to “sensor-04000” range, it holds a large block of the highest-frequency sensors.

// Check chunk distribution
sh.status()

// Output shows uneven distribution:
// shard-1: 245 chunks
// shard-2: 248 chunks
// shard-3: 612 chunks    <- 2.5x more chunks
// shard-4: 251 chunks

// Check data size and document count per shard (run from the telemetry database)
db.readings.getShardDistribution()

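Jumbo chunks are chunks that exceed the maximum chunk size (128 MB by default since MongoDB 6.0, 64 MB in earlier versions) but cannot be split because every document in them has the same shard key value. With the compound key {sensorId: 1, ts: 1}, that means documents sharing both sensorId and ts, so a single sensor's data can still be split along ts. If the key were sensorId alone and sensor-00001 had 500 MB of data, the chunk containing {sensorId: "sensor-00001"} would be jumbo, and the balancer would not migrate it to another shard by default.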

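// Count jumbo chunks. MongoDB 5.0+ keys config.chunks by collection UUID;
// older versions filter on ns: "telemetry.readings" instead.
use config
const coll = db.collections.findOne({ _id: "telemetry.readings" })
db.chunks.countDocuments({ uuid: coll.uuid, jumbo: true })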

The Benchmark

Compare write distribution across shards with different shard key strategies:

Shard key	Shard 1 writes	Shard 2 writes	Shard 3 writes	Shard 4 writes	Max/Min ratio
`{sensorId: 1, ts: 1}` (range)	450/s	480/s	1,200/s	470/s	2.67x
`{sensorId: "hashed"}`	510/s	495/s	520/s	475/s	1.09x
`{sensorId: 1, ts: 1}` (after rebalance)	520/s	530/s	680/s	470/s	1.45x
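The Max/Min ratio column is the busiest shard's write rate divided by the quietest shard's. A minimal sketch of that arithmetic, using the per-shard rates from the table (the class and method names are illustrative, not part of any MongoDB API):

```java
import java.util.Arrays;

final class ShardSkew {
    /** Ratio of the busiest shard's write rate to the quietest shard's. */
    static double maxMinRatio(double... writesPerSecond) {
        double max = Arrays.stream(writesPerSecond).max().orElseThrow();
        double min = Arrays.stream(writesPerSecond).min().orElseThrow();
        return max / min;
    }

    public static void main(String[] args) {
        // Per-shard write rates copied from the benchmark table.
        System.out.printf("range:           %.2fx%n", maxMinRatio(450, 480, 1200, 470));
        System.out.printf("hashed:          %.2fx%n", maxMinRatio(510, 495, 520, 475));
        System.out.printf("after rebalance: %.2fx%n", maxMinRatio(520, 530, 680, 470));
    }
}
```

A ratio close to 1.0 means the shards share the write load evenly; the further above 1.0, the hotter the busiest shard is relative to the rest.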

A hashed shard key on sensorId achieves a near-even distribution because the hash function scatters adjacent sensorIds, including the high-frequency HQ block, across all shards.
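To see why hashing evens out the load, the sketch below places a hypothetical fleet on four shards under both strategies and sums the write rate per shard. Everything here is an assumption made for illustration: the fleet size of 8,000 sensors, the equal-width range split, and the toy hash (the top two bits of an MD5 digest), which only stands in for MongoDB's real hashed-key routing. It does not reproduce the article's measured numbers.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Locale;

final class PlacementSim {
    static final int SHARDS = 4;
    static final int SENSORS = 8000;    // hypothetical fleet size, not from the article
    static final int HQ_SENSORS = 4000; // sensor-00001..sensor-04000 report every 2 s

    /** Writes per second produced by sensor number n. */
    static double rate(int n) {
        return n <= HQ_SENSORS ? 1.0 / 2 : 1.0 / 5;
    }

    /** Range key: each shard owns one contiguous quarter of the sensorIds. */
    static int rangeShard(int n) {
        return (n - 1) * SHARDS / SENSORS;
    }

    /** Hashed key (toy model): each shard owns one quarter of a 64-bit hash space. */
    static int hashedShard(int n) {
        String id = String.format(Locale.ROOT, "sensor-%05d", n);
        try {
            byte[] md5 = MessageDigest.getInstance("MD5")
                    .digest(id.getBytes(StandardCharsets.UTF_8));
            return (int) (ByteBuffer.wrap(md5).getLong() >>> 62); // top two bits: 0..3
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    /** Total writes per second landing on each shard. */
    static double[] load(boolean hashed) {
        double[] perShard = new double[SHARDS];
        for (int n = 1; n <= SENSORS; n++) {
            perShard[hashed ? hashedShard(n) : rangeShard(n)] += rate(n);
        }
        return perShard;
    }

    /** Busiest shard's load divided by the quietest shard's. */
    static double skew(double[] perShard) {
        return Arrays.stream(perShard).max().orElseThrow()
                / Arrays.stream(perShard).min().orElseThrow();
    }

    public static void main(String[] args) {
        System.out.println("range:  " + Arrays.toString(load(false)) + " skew " + skew(load(false)));
        System.out.println("hashed: " + Arrays.toString(load(true)) + " skew " + skew(load(true)));
    }
}
```

Under the range split, the shards that own the HQ block carry 2.5 times the load of the others, which is exactly the per-sensor rate difference. Under the toy hash, every shard receives a mix of fast and slow sensors, so the skew stays close to 1.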

The Fix

Option 1: Use a hashed shard key (for new collections or resharding).

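// Hashed shard key for even distribution.
// shardCollection must run against the admin database; mongoClient is the application's MongoClient.
mongoClient.getDatabase("admin").runCommand(new Document("shardCollection", "telemetry.readings_v2")
    .append("key", new Document("sensorId", "hashed")));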

Hashed shard keys distribute writes evenly but sacrifice range query efficiency. A query for {sensorId: "sensor-00042", ts: {$gte: start}} can still target a single shard, because the hash of “sensor-00042” maps to exactly one shard. A query for {sensorId: {$in: ["sensor-00042", "sensor-00043"]}} may be routed to two shards, because the two sensorIds can hash to different shards even though they are adjacent.
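The targeting behavior for equality and $in filters can be sketched as follows. This is a simplified model, not the router's actual implementation: the shardFor function is a hypothetical stand-in (top two bits of an MD5 digest) for the hashed-key lookup, and the four-shard layout is assumed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class HashedTargeting {
    /** Toy stand-in for hashed routing: maps a sensorId to one of 4 shards. */
    static int shardFor(String sensorId) {
        try {
            byte[] md5 = MessageDigest.getInstance("MD5")
                    .digest(sensorId.getBytes(StandardCharsets.UTF_8));
            return (int) (ByteBuffer.wrap(md5).getLong() >>> 62); // 0..3
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    /** Shards that must be contacted for an equality or $in filter on sensorId. */
    static Set<Integer> targets(List<String> sensorIds) {
        Set<Integer> shards = new TreeSet<>();
        for (String id : sensorIds) {
            shards.add(shardFor(id));
        }
        return shards;
    }

    public static void main(String[] args) {
        // One value always resolves to one shard.
        System.out.println(targets(List.of("sensor-00042")));
        // Two values resolve to one or two shards, regardless of how close the IDs are.
        System.out.println(targets(List.of("sensor-00042", "sensor-00043")));
    }
}
```

The number of shards contacted is bounded by the number of values in the $in list, so short lists remain cheap even with a hashed key.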

Option 2: Manual chunk splitting and migration.

// Split a jumbo chunk manually
sh.splitAt("telemetry.readings", {
  sensorId: "sensor-03000",
  ts: ISODate("2024-06-01T00:00:00Z")
})

// Move a chunk to a less loaded shard
sh.moveChunk("telemetry.readings", {
  sensorId: "sensor-03000",
  ts: ISODate("2024-06-01T00:00:00Z")
}, "shard-1")

Option 3: Add a distribution prefix.

If resharding is not possible, add a computed field that distributes high-frequency sensors:

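// Distribution key for hot sensorIds: append a bucket number 0-3.
// Math.floorMod keeps the bucket non-negative even when hashCode() is negative.
String distributionKey = sensorId + "-" + Math.floorMod(reading.hashCode(), 4);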

Document doc = new Document()
    .append("shardKey", distributionKey)  // "sensor-00001-0", "sensor-00001-1", etc.
    .append("sensorId", sensorId)
    .append("ts", Date.from(timestamp))
    .append("temperature", temperature);

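This spreads sensor-00001’s data across 4 sub-keys, so no single shard key value accumulates all of a hot sensor’s data and its chunks stay splittable. Queries for one sensor must then match all four shardKey values (for example with $in) instead of filtering on sensorId alone.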
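A minimal sketch of both sides of this scheme, assuming four buckets and a bucket chosen from an integer hash of the reading. The class and method names are hypothetical helpers, not driver APIs.

```java
import java.util.ArrayList;
import java.util.List;

final class DistributionKeys {
    static final int BUCKETS = 4; // assumed bucket count, matching the example above

    /** Key stored in the shardKey field when writing a reading. */
    static String writeKey(String sensorId, int readingHash) {
        // floorMod maps negative hashes into 0..BUCKETS-1; the % operator would not.
        return sensorId + "-" + Math.floorMod(readingHash, BUCKETS);
    }

    /** Every shardKey value a query for one sensor has to match. */
    static List<String> readKeys(String sensorId) {
        List<String> keys = new ArrayList<>();
        for (int bucket = 0; bucket < BUCKETS; bucket++) {
            keys.add(sensorId + "-" + bucket);
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(writeKey("sensor-00001", -7));
        System.out.println(readKeys("sensor-00001"));
    }
}
```

With the MongoDB Java driver, the list from readKeys could be passed to Filters.in("shardKey", keys) so that a per-sensor query reaches every bucket. The cost is that each such query may touch up to four shards instead of one.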

The Proof

After resharding with a hashed shard key:

Metric	Range key (before)	Hashed key (after)
Max/min shard write ratio	2.67x	1.09x
Shard 3 CPU	75%	38%
Write p99 (worst shard)	45ms	18ms
Scatter-gather queries	Rare	More frequent for range queries

The Trade-off

Hashed shard keys achieve write distribution at the cost of range query efficiency. A range query {sensorId: {$gte: "sensor-00100", $lte: "sensor-00200"}} must scatter to all shards with a hashed key because adjacent sensorIds hash to different shards. With a range key, this query targets 1-2 shards.

The telemetry platform primarily queries by exact sensorId (equality), which targets a single shard whether the key is hashed or ranged. For this workload, hashed keys provide better distribution with no query performance penalty for the dominant access pattern. Range queries across sensors (the analytics use case) are the ones that pay: a narrow range that a ranged key serves from 1-2 shards is broadcast to every shard under the hashed key, while ranges spanning most sensors touch most shards in both cases.