Chunk Migrations: The Hidden I/O Tax
Chunk Migrations
When the MongoDB balancer detects that one shard has significantly more chunks than another, it migrates chunks from the heavy shard to the light shard. Each migration involves reading all documents in the chunk from the source shard, transferring them over the network, writing them to the target shard, and updating the config server.
A chunk migration is an I/O-intensive operation on both the source and target shards. During migration, the source shard reads documents from disk (consuming read I/O and WiredTiger cache), and the target shard writes documents and builds indexes (consuming write I/O and cache). Both operations compete with normal query traffic.
Migration I/O Impact
A single 128 MB chunk migration involves:
- Source shard: Reading 128 MB of documents from storage. If the documents are in the WiredTiger cache, this is fast but evicts other data. If they are on disk, this adds disk I/O.
- Network: Transferring 128 MB between shards. On a 10 Gbps network, this takes approximately 0.1 seconds. On a 1 Gbps network: 1 second.
- Target shard: Writing 128 MB of documents plus rebuilding all indexes for those documents. Index builds are I/O-intensive.
- Cleanup: After migration, the source shard deletes the migrated documents (range deletion). This delete operation holds write locks.
With the default balancer settings, the balancer can run multiple concurrent migrations (one per shard pair). On a 4-shard cluster, up to 2 concurrent migrations can occur.
// Check balancer state
sh.getBalancerState() // true = enabled
sh.isBalancerRunning() // true = currently migrating
sh.getBalancerLock() // who holds the balancer lock
// Check recent migration history
use config
db.changelog.find({what: "moveChunk.commit"}).sort({time: -1}).limit(5)