Diagnosing Migration-Induced Latency Spikes
Diagnosing Migration-Induced Latency Spikes
The Symptom
The telemetry platform’s read latency shows periodic spikes: p99 jumps from 15ms to 120ms every 15-20 minutes, then returns to baseline. The spikes last 30-90 seconds. They are not correlated with traffic patterns or deployment events.
The Cause
Each spike coincides with a chunk migration. When the balancer moves a chunk from shard 2 to shard 4:
- Shard 2 reads the chunk’s documents from WiredTiger, evicting cached query data.
- Shard 4 writes the incoming documents, consuming write I/O bandwidth.
- After the migration completes, shard 2 performs range deletion (cleanup), which holds locks and generates I/O.
The range deletion phase (step 3) is the worst offender. MongoDB deletes migrated documents in batches, but each batch acquires a lock and generates write I/O that competes with normal operations.
// Identify active migrations
db.adminCommand({ currentOp: true, desc: /moveChunk|migrat/ })
// Check range deletion queue
db.adminCommand({ currentOp: true, desc: "RangeDeleter" })
// Output shows:
// {
// "desc": "RangeDeleter",
// "active": true,
// "ns": "telemetry.readings",
// "range": { "min": {"sensorId": "sensor-02000"}, "max": {"sensorId": "sensor-03000"} },
// "numDocsDeleted": 45000,
// "totalDocs": 180000
// }
Correlate migration timestamps with latency spikes:
// Migration history with timing
use config
db.changelog.find({
what: { $in: ["moveChunk.start", "moveChunk.commit"] },
time: { $gte: new Date(Date.now() - 3600000) } // last hour
}).sort({ time: 1 }).forEach(function(entry) {
print(entry.time.toISOString() + " " + entry.what + " " +
entry.details.from + " -> " + entry.details.to);
});
The Benchmark
Measure the read latency impact during migrations:
// SLOW: Default migration settings, no balancer window
// Migrations happen during peak traffic hours
// k6 results during migration:
// p50: 8ms -> 25ms (3.1x increase)
// p99: 15ms -> 120ms (8x increase)
// Duration: 30-90 seconds per migration
// Monitor per-shard latency during migration
MongoClient client = MongoClients.create(connectionString);
ServerDescription sourceDesc = client.getClusterDescription()
.getServerDescriptions().stream()
.filter(s -> s.getAddress().equals(sourceShardAddress))
.findFirst().orElseThrow();
| Metric | No migration | During migration (source) | During migration (target) |
|---|---|---|---|
| Read p50 | 5ms | 18ms | 7ms |
| Read p99 | 15ms | 120ms | 22ms |
| Write p50 | 8ms | 12ms | 35ms |
| WiredTiger cache evictions/s | 200 | 1,400 | 850 |
| Disk read IOPS | 3,000 | 8,500 | 3,200 |
The source shard suffers most on reads (cache eviction), and the target shard suffers on writes (incoming documents and index builds).
The Fix
Step 1: Set a balancer window to avoid peak traffic.
// Run migrations only during off-peak hours (2 AM - 5 AM UTC)
db.settings.updateOne(
{ _id: "balancer" },
{ $set: {
activeWindow: { start: "02:00", stop: "05:00" }
}},
{ upsert: true }
)
Step 2: Tune range deletion to reduce lock contention.
// Increase delay between range deletion batches
// Default: 20ms. Increase to 100ms to reduce I/O pressure.
db.adminCommand({
setParameter: 1,
rangeDeleterBatchDelayMS: 100
})
// Reduce batch size for range deletions
// Default: Unlimited. Set to 1000 to limit per-batch I/O.
db.adminCommand({
setParameter: 1,
rangeDeleterBatchSize: 1000
})
Step 3: Monitor migrations in the Java application.
// FAST: Connection pool listener that logs migration-related events
public class MigrationAwareCommandListener implements CommandListener {
private final MeterRegistry registry;
@Override
public void commandSucceeded(CommandSucceededEvent event) {
if (event.getElapsedTime(TimeUnit.MILLISECONDS) > 50) {
// Correlate slow commands with migration windows
registry.counter("mongo.slow_commands",
"command", event.getCommandName(),
"server", event.getConnectionDescription()
.getServerAddress().toString()
).increment();
}
}
}
The Proof
After applying balancer window and range deletion tuning:
| Metric | Before (default) | After (tuned) |
|---|---|---|
| Migration-induced p99 spike | 120ms | 45ms |
| Spike frequency during peak | Every 15-20 min | Zero (migrations only 2-5 AM) |
| Range deletion duration | 30s | 90s (slower but less impactful) |
| Cache evictions during migration | 1,400/s | 800/s |
The Trade-off
Restricting the balancer window means that chunk imbalance accumulates during peak hours. If the cluster receives 500 GB of new data during the 21-hour peak window, the balancer has only 3 hours to redistribute. For rapidly growing collections, this may not be enough time to rebalance.
Increasing range deletion batch delay reduces the I/O impact but extends the time that orphaned documents exist on the source shard. During this window, a query that targets the source shard may see stale data if it reads from a secondary that has not yet replicated the deletions. For the telemetry platform, where readings are append-only and queries use readPreference: primary, this is acceptable.