Ticket Count Tuning and Write-Heavy Workload Balancing
The Symptom
The telemetry platform runs a mixed workload: 70% reads (dashboard queries) and 30% writes (sensor ingestion). During peak ingestion (batch imports from offline sensors), the write share jumps to 80%. During these imports, write ticket availability drops to nearly zero and ingestion stalls, even though the read ticket pool stays mostly idle.
Checking ticket utilization during batch import:
db.serverStatus().wiredTiger.concurrentTransactions
{
  "write": { "out": 125, "available": 3, "totalTickets": 128 },
  "read": { "out": 40, "available": 88, "totalTickets": 128 }
}
Write tickets are nearly exhausted (125 of 128 in use), while read tickets have ample headroom (40 of 128 in use). With only 3 write tickets available, new write operations queue as soon as those are taken, so the batch import's write throughput is capped by the write ticket pool.
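The same check can be scripted. The following is a minimal sketch that assumes the document shape shown in the output above; the helper names `ticketUtilization` and `saturatedPools` and the 0.9 threshold are illustrative choices, not part of MongoDB.

```javascript
// Fraction of a ticket pool currently in use (0 to 1).
function ticketUtilization(pool) {
  return pool.out / pool.totalTickets;
}

// Names of the pools ("read" / "write") whose utilization is at or above
// the threshold. The 0.9 default is an assumed alerting level.
function saturatedPools(concurrentTransactions, threshold = 0.9) {
  return Object.keys(concurrentTransactions).filter(
    (name) => ticketUtilization(concurrentTransactions[name]) >= threshold
  );
}

// Sample copied from the batch-import output above.
const sample = {
  write: { out: 125, available: 3, totalTickets: 128 },
  read: { out: 40, available: 88, totalTickets: 128 },
};

console.log(saturatedPools(sample)); // [ 'write' ]
```

In mongosh, the same functions could be called with `db.serverStatus().wiredTiger.concurrentTransactions` in place of `sample`.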
The Cause
The default 128/128 split assumes a balanced workload. During batch imports, 128 write tickets are insufficient. Each bulk write operation holds a write ticket while WiredTiger processes the write, updates indexes, and journals the change. At the ingestion rate of 5,000 writes/sec during batch import, with each write taking 3ms, the required concurrent write capacity is:
$$\text{requiredWriteTickets} = 5000 \times 0.003 = 15$$
By that estimate, fifteen tickets would be sufficient. However, the batch import uses bulkWrite with batches of 10,000 documents, and each bulkWrite holds a single write ticket for the entire batch duration (200ms). With 20 concurrent batch writers:
$$\text{ticketsInUse} = 20 \times 1 = 20$$
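Twenty tickets is still far below the 125 observed in use. The simple estimate leaves out two things: each bulkWrite call is processed internally in sub-batches, and WiredTiger checkpoint and eviction activity slows the operations that hold write tickets, so each ticket is held longer than the nominal write time. Effective ticket consumption during batch import therefore includes the cost of this internal work.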
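The two estimates above follow Little's law (average concurrency equals arrival rate times holding time). A minimal sketch of the arithmetic, using only the figures quoted in the text; the function and variable names are illustrative:

```javascript
// Little's law: tickets in use = operations per second x seconds each one holds a ticket.
// Hold time is passed in milliseconds to keep the arithmetic in integers.
function requiredTickets(opsPerSecond, holdTimeMs) {
  return Math.ceil((opsPerSecond * holdTimeMs) / 1000);
}

// Single-document estimate: 5,000 writes/sec, 3 ms per write.
const perDocumentEstimate = requiredTickets(5000, 3);

// Batch estimate: 20 concurrent writers, one ticket per in-flight bulkWrite.
const batchWriters = 20;
const ticketsPerBulkWrite = 1;
const perBatchEstimate = batchWriters * ticketsPerBulkWrite;

console.log(perDocumentEstimate, perBatchEstimate); // 15 20
```

Neither figure accounts for tickets being held longer under load, which is why observed usage should be measured rather than derived.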
The Benchmark
Test different ticket configurations under mixed workload:
// k6 test: mixed workload with varying ticket configurations
// Run separately against MongoDB instances configured with different ticket counts
// Note: 'dashboardRead' and 'batchIngest' must be exported functions defined
// elsewhere in this script; they are not shown here.
export const options = {
  scenarios: {
    reads: {
      executor: 'constant-arrival-rate',
      rate: 500,
      timeUnit: '1s',
      duration: '3m',
      preAllocatedVUs: 50,
      exec: 'dashboardRead',
    },
    writes: {
      executor: 'constant-arrival-rate',
      rate: 5000,
      timeUnit: '1s',
      duration: '3m',
      preAllocatedVUs: 200,
      exec: 'batchIngest',
    },
  },
};
Results at different ticket configurations:
| Configuration | Read p99 | Write p99 | Write throughput |
|---|---|---|---|
| R:128 / W:128 (default) | 25ms | 180ms | 3,200 ops/s |
| R:128 / W:256 | 28ms | 45ms | 4,800 ops/s |
| R:64 / W:256 | 35ms | 38ms | 5,000 ops/s |
| R:256 / W:256 | 22ms | 42ms | 4,900 ops/s |
The Fix
For the write-heavy batch import scenario, increase write tickets:
# mongod.conf - asymmetric ticket configuration for write-heavy workload
setParameter:
  wiredTigerConcurrentReadTransactions: 128
  wiredTigerConcurrentWriteTransactions: 256
This can be changed at runtime without restart:
db.adminCommand({ setParameter: 1, wiredTigerConcurrentWriteTransactions: 256 })
For mixed workloads, consider a dynamic approach: increase write tickets during batch import windows and revert after:
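// FAST: Dynamic ticket adjustment for batch import
// setParameter must be run against the admin database, so `database`
// is assumed to be the handle returned by client.getDatabase("admin").
public void startBatchImport() {
    database.runCommand(new Document("setParameter", 1)
        .append("wiredTigerConcurrentWriteTransactions", 256));
}

// Call this from a finally block so the default is restored even if the import fails.
public void endBatchImport() {
    database.runCommand(new Document("setParameter", 1)
        .append("wiredTigerConcurrentWriteTransactions", 128));
}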
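One risk with a start/end pair is that a failed import leaves the raised ticket count in place. The sketch below wraps the adjustment in try/finally so the baseline is always restored. It is an illustration only: `withWriteTickets` is a hypothetical helper, and `runAdminCommand` stands for whatever function issues an admin command in your environment (for example `(cmd) => db.adminCommand(cmd)` in mongosh).

```javascript
// Raise the write ticket count for the duration of `work`, then restore the
// baseline whether `work` returns or throws.
// `runAdminCommand` is an assumed callback that sends a command to the admin database.
function withWriteTickets(runAdminCommand, boosted, baseline, work) {
  const setWriteTickets = (count) =>
    runAdminCommand({
      setParameter: 1,
      wiredTigerConcurrentWriteTransactions: count,
    });

  setWriteTickets(boosted);
  try {
    return work();
  } finally {
    setWriteTickets(baseline);
  }
}

// Example (mongosh), with a hypothetical runBatchImport function:
// withWriteTickets((cmd) => db.adminCommand(cmd), 256, 128, () => runBatchImport());
```

Passing the baseline explicitly keeps the helper simple; a variant could instead read the current value before raising it.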
The Proof
After increasing write tickets to 256 during batch import:
| Metric | W:128 | W:256 |
|---|---|---|
| Batch import duration | 45 min | 28 min |
| Write p99 during import | 180ms | 45ms |
| Read p99 during import | 25ms | 28ms |
| Write ticket exhaustion events | 2,300/min | 0/min |
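To put the table in relative terms, the changes can be expressed as percentages. A short sketch of that arithmetic, using only the values from the table; `percentChange` is an illustrative helper name:

```javascript
// Percentage change from `before` to `after`, rounded to one decimal place.
// Negative values are reductions.
function percentChange(before, after) {
  const raw = ((after - before) * 100) / before;
  return Math.round(raw * 10) / 10;
}

const importDuration = percentChange(45, 28); // minutes
const writeP99 = percentChange(180, 45);      // milliseconds
const readP99 = percentChange(25, 28);        // milliseconds

console.log(importDuration, writeP99, readP99); // -37.8 -75 12
```

Read this way, the import finishes about 38% sooner and write p99 falls by 75%, at the cost of a 12% increase in read p99.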
The Trade-off
More write tickets mean more concurrent writes accessing WiredTiger simultaneously. This increases contention on internal WiredTiger structures: the B-tree page split lock, the eviction lock, and the cache management structures. On a 4-core server, 256 concurrent write threads cause excessive context switching. On a 16-core server with fast NVMe storage, 256 tickets are sustainable.
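As a rule of thumb, total tickets (read + write) should not exceed 10x the CPU core count: at most 160 on a 16-core server and 320 on a 32-core server. Beyond this, the overhead of context switching and lock contention tends to outweigh the benefit of additional concurrency. The R:128 / W:256 configuration above totals 384 tickets, which exceeds this limit even on a 32-core server, so treat the rule as a starting point and let benchmark results on the target hardware decide.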
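The 10x guideline is easy to encode as a sanity check before applying a configuration. A minimal sketch, in which the function names and the returned object shape are illustrative:

```javascript
// Guideline from the text: total tickets should not exceed 10x the CPU core count.
const TICKETS_PER_CORE = 10;

function ticketBudget(cpuCores) {
  return cpuCores * TICKETS_PER_CORE;
}

// Compare a proposed read/write split against the budget for a given core count.
function checkTicketConfig(readTickets, writeTickets, cpuCores) {
  const total = readTickets + writeTickets;
  const budget = ticketBudget(cpuCores);
  return { total, budget, withinBudget: total <= budget };
}

console.log(ticketBudget(16), ticketBudget(32)); // 160 320
console.log(checkTicketConfig(128, 256, 16).withinBudget); // false
```

A failed check is a prompt to benchmark on the target hardware, not proof that the configuration is wrong.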
MongoDB 7.0 introduced an adaptive ticket mechanism (throughputProbing) that automatically adjusts ticket counts based on observed throughput. It monitors throughput and moves the ticket counts up or down to maximize it, which suits workloads with variable read/write ratios. In 7.0 and later this algorithm is the default, the ticket parameters are named storageEngineConcurrentReadTransactions and storageEngineConcurrentWriteTransactions, and setting either one to a fixed value turns the adaptive behavior off. To select the adaptive algorithm explicitly:
# mongod.conf - enable adaptive ticket control
setParameter:
  storageEngineConcurrencyAdjustmentAlgorithm: "throughputProbing"