Cursor Batch Sizing: Round Trips vs Memory
The Symptom
The telemetry analytics job reads 500,000 documents per run to compute hourly aggregates. Each run takes 45 seconds. Network monitoring shows 5,000 round trips between the application and MongoDB during the run. At an average round-trip time of 1.2 ms (same datacenter), those 5,000 round trips account for 6 seconds of pure network latency, about 13% of the total run time.
The Cause
By default, the first batch returns up to 101 documents, and each subsequent batch fills up to the 16 MiB message limit. For the telemetry collection with 340-byte documents (after BSON type optimization from CH6-S1), each subsequent batch holds approximately 49,000 documents (16,777,216 / 340). The small first batch is the outlier.
But the analytics job uses a batchSize(100) setting that a developer added “to reduce memory usage.” With 100-document batches:
$$\text{roundTrips} = \lceil 500{,}000 / 100 \rceil = 5{,}000$$
At 1.2ms per round trip: $5{,}000 \times 1.2\text{ms} = 6\text{s}$ of network latency.
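The arithmetic above can be sketched as a small estimator. This is a simplified model, not driver code: the class and method names are illustrative, and it assumes every batch is full and counts only time on the wire, not server or driver work per batch.

```java
import java.time.Duration;

/** Back-of-the-envelope model of cursor round trips; all names are illustrative. */
final class RoundTripEstimate {

    /** Number of batches needed to drain the cursor: ceil(totalDocs / batchSize). */
    static long roundTrips(long totalDocs, int batchSize) {
        if (totalDocs < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("totalDocs must be >= 0 and batchSize > 0");
        }
        return (totalDocs + batchSize - 1) / batchSize; // integer ceiling division
    }

    /** Pure network wait, ignoring server and driver work per batch. */
    static Duration networkLatency(long roundTrips, Duration roundTripTime) {
        return roundTripTime.multipliedBy(roundTrips);
    }

    public static void main(String[] args) {
        Duration rtt = Duration.ofNanos(1_200_000); // 1.2 ms, the measured average
        long trips = roundTrips(500_000, 100);
        System.out.println(trips + " round trips, "
                + networkLatency(trips, rtt).toMillis() + " ms of network latency");
    }
}
```

Plugging in other batch sizes shows how quickly the latency term shrinks once batches reach the low thousands.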
// SLOW: Tiny batch size maximizes round trips
try (MongoCursor<Document> cursor = collection.find(
        Filters.and(
            Filters.gte("timestamp", hourStart),
            Filters.lt("timestamp", hourEnd)
        )
    ).batchSize(100) // 5,000 round trips for 500K docs
    .iterator()) {
    while (cursor.hasNext()) {
        aggregate(cursor.next());
    }
}
The Benchmark
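import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Date;
import java.util.concurrent.TimeUnit;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import com.mongodb.client.model.Filters;
import org.bson.Document;
import org.bson.conversions.Bson;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 2, time = 10)
@Measurement(iterations = 3, time = 30)
@Fork(1)
@State(Scope.Benchmark)
public class BatchSizeBenchmark {
    private MongoClient client;
    private MongoCollection<Document> collection;
    private Bson lastHour;

    @Param({"100", "500", "1000", "5000", "10000"})
    private int batchSize;

    @Setup
    public void setup() {
        client = MongoClients.create("mongodb://localhost:27017");
        collection = client.getDatabase("telemetry").getCollection("readings");
        // Fix the time window once so every invocation scans the same documents.
        Instant now = Instant.now();
        lastHour = Filters.and(
            Filters.gte("timestamp", Date.from(now.minus(1, ChronoUnit.HOURS))),
            Filters.lt("timestamp", Date.from(now)));
    }

    @TearDown
    public void tearDown() {
        client.close(); // release the connection pool when the trial ends
    }

    @Benchmark
    public long iterateWithBatchSize() {
        long count = 0;
        try (MongoCursor<Document> cursor =
                 collection.find(lastHour).batchSize(batchSize).iterator()) {
            while (cursor.hasNext()) {
                cursor.next();
                count++;
            }
        }
        return count; // returned so JMH consumes the result
    }
}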
Results for 500,000 documents:
Benchmark (batchSize) Mode Cnt Score Error Units
BatchSizeBenchmark.iterateWithBatchSize 100 avgt 3 42000.000 ± 1200.000 ms/op
BatchSizeBenchmark.iterateWithBatchSize 500 avgt 3 12000.000 ± 800.000 ms/op
BatchSizeBenchmark.iterateWithBatchSize 1000 avgt 3 8500.000 ± 600.000 ms/op
BatchSizeBenchmark.iterateWithBatchSize 5000 avgt 3 6200.000 ± 400.000 ms/op
BatchSizeBenchmark.iterateWithBatchSize 10000 avgt 3 5800.000 ± 350.000 ms/op
The gains diminish after batch size 5,000. Going from 100 to 5,000 reduces total time by 85%. Going from 5,000 to 10,000 reduces it by only 6%.
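The percentages follow directly from the benchmark scores. A minimal sketch of that calculation, using a hypothetical helper class and the ms/op values copied from the results above:

```java
import java.util.Locale;

/** Relative improvement between two benchmark scores; the class name is illustrative. */
final class RelativeGain {

    /** Percentage by which afterMs is lower than beforeMs. */
    static double reductionPercent(double beforeMs, double afterMs) {
        if (beforeMs <= 0) {
            throw new IllegalArgumentException("beforeMs must be positive");
        }
        return (beforeMs - afterMs) / beforeMs * 100.0;
    }

    public static void main(String[] args) {
        // Scores taken from the JMH output (ms/op).
        System.out.println(String.format(Locale.ROOT,
                "100 -> 5,000: %.1f%%", reductionPercent(42_000, 6_200)));
        System.out.println(String.format(Locale.ROOT,
                "5,000 -> 10,000: %.1f%%", reductionPercent(6_200, 5_800)));
    }
}
```

With error bars of several hundred milliseconds on only three measurement iterations, the second figure is close to the noise floor, which is another reason to stop at 5,000.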
Round trip analysis:
| Batch size | Round trips | Network latency (1.2ms/trip) | Client memory per batch |
|---|---|---|---|
| 100 | 5,000 | 6,000ms | 34 KB |
| 500 | 1,000 | 1,200ms | 170 KB |
| 1,000 | 500 | 600ms | 340 KB |
| 5,000 | 100 | 120ms | 1.7 MB |
| 10,000 | 50 | 60ms | 3.4 MB |
The Fix
// FAST: Batch size tuned for analytics workload
try (MongoCursor<Document> cursor = collection.find(
        Filters.and(
            Filters.gte("timestamp", hourStart),
            Filters.lt("timestamp", hourEnd)
        )
    ).batchSize(5000) // 100 round trips for 500K docs
    .iterator()) {
    while (cursor.hasNext()) {
        aggregate(cursor.next());
    }
}
For the dashboard endpoint that returns 50-100 documents, set batchSize equal to the limit to fetch everything in a single round trip:
// FAST: Single round trip for bounded queries
List<Document> results = collection.find(Filters.eq("sensorId", sensorId))
    .sort(Sorts.descending("timestamp"))
    .limit(100)
    .batchSize(100) // Match limit: one round trip
    .into(new ArrayList<>());
The Proof
After changing the analytics job from batchSize(100) to batchSize(5000):
| Metric | batchSize=100 | batchSize=5000 |
|---|---|---|
| Total time | 42s | 6.2s |
| Network round trips | 5,000 | 100 |
| Network latency contribution | 6,000ms | 120ms |
| Client memory per batch | 34 KB | 1.7 MB |
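The analytics job runs 6.8x faster. The 35.8 seconds saved is well beyond the 5.9 seconds of network latency removed, which suggests that most of the cost at small batch sizes is per-batch overhead on the server and in the driver rather than time on the wire. The memory cost is 1.7 MB per cursor, which is negligible for a batch job.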
The Trade-off
Large batch sizes cost more server memory per cursor. Each batch is assembled in MongoDB’s memory before sending. A batch of 5,000 documents at 340 bytes each is 1.7 MB of server memory per cursor. If 100 concurrent cursors each hold 5,000-document batches, that is 170 MB of server memory dedicated to cursor buffers. That memory competes with the WiredTiger cache and the operating system page cache, leaving less room for the working set.
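The server-side cost can be estimated the same way. The sketch below is a worst-case model with hypothetical names: it assumes every open cursor has a full batch buffered at the same moment and counts only raw document bytes.

```java
/** Worst-case estimate of memory tied up in cursor batches; names are illustrative. */
final class CursorBufferEstimate {

    /** Bytes held by one batch: batchSize * average document size. */
    static long bytesPerBatch(int batchSize, int avgDocBytes) {
        return (long) batchSize * avgDocBytes;
    }

    /** Total if every open cursor has a full batch buffered at the same time. */
    static long totalBufferBytes(int concurrentCursors, int batchSize, int avgDocBytes) {
        return concurrentCursors * bytesPerBatch(batchSize, avgDocBytes);
    }

    public static void main(String[] args) {
        long perCursor = bytesPerBatch(5_000, 340);
        long total = totalBufferBytes(100, 5_000, 340);
        // Decimal units (1 MB = 1,000,000 bytes), matching the tables in this section.
        System.out.println(perCursor / 1_000_000.0 + " MB per cursor, "
                + total / 1_000_000.0 + " MB across 100 cursors");
    }
}
```

Running the same numbers with a batch size of 1,000 gives 34 MB across 100 cursors, which is why moderate batch sizes suit highly concurrent workloads.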
For streaming workloads (continuous data processing), keep batch sizes moderate (500-1,000). For batch analytics jobs that run sequentially, use large batch sizes (5,000-10,000). For API endpoints serving user requests, set batch size equal to the limit so each request completes in a single round trip.
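One way to keep these rules in a single place is a small policy helper. The enum, the class, and the specific values chosen within each range are assumptions for illustration. Treat them as starting points to validate with a benchmark, not fixed recommendations.

```java
/** Maps a workload category to a starting batch size; entirely hypothetical. */
final class BatchSizePolicy {

    /** Workload categories from the guidance above. */
    enum Workload { STREAMING, BATCH_ANALYTICS, API_ENDPOINT }

    /**
     * Suggests a starting batch size. queryLimit is only used for
     * API_ENDPOINT and must be positive there.
     */
    static int suggestBatchSize(Workload workload, int queryLimit) {
        switch (workload) {
            case STREAMING:
                return 1_000;      // moderate: upper end of the 500-1,000 range
            case BATCH_ANALYTICS:
                return 5_000;      // large: lower end of the 5,000-10,000 range
            case API_ENDPOINT:
                if (queryLimit <= 0) {
                    throw new IllegalArgumentException("API endpoints need a positive limit");
                }
                return queryLimit; // match the limit: one round trip
            default:
                throw new AssertionError("unhandled workload: " + workload);
        }
    }

    public static void main(String[] args) {
        System.out.println(suggestBatchSize(Workload.BATCH_ANALYTICS, 0));
        System.out.println(suggestBatchSize(Workload.API_ENDPOINT, 100));
    }
}
```

The returned value would then be passed to `batchSize(...)` on the query, as in the fixes shown earlier.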
Cross-region deployments amplify the batch size impact. If the application and MongoDB are in different regions with 30ms round trip time, the same 5,000 round trips cost 150 seconds of network latency instead of 6 seconds. In cross-region scenarios, maximize batch size aggressively.
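The relationship can also be inverted: given a latency budget, work out the smallest batch size that stays within it. The following is a rough sketch with hypothetical names. It considers only network wait and ignores the 16 MiB cap on a single batch.

```java
import java.time.Duration;

/** Derives a minimum batch size from a network latency budget; names are illustrative. */
final class LatencyBudget {

    /**
     * Smallest batch size that keeps pure network wait within the budget:
     * maxTrips = floor(budget / rtt), batchSize = ceil(totalDocs / maxTrips).
     */
    static long minBatchSize(long totalDocs, Duration roundTripTime, Duration budget) {
        if (roundTripTime.isZero() || roundTripTime.isNegative()) {
            throw new IllegalArgumentException("roundTripTime must be positive");
        }
        long maxTrips = budget.toNanos() / roundTripTime.toNanos();
        if (maxTrips < 1) {
            throw new IllegalArgumentException("budget is shorter than one round trip");
        }
        return (totalDocs + maxTrips - 1) / maxTrips; // integer ceiling division
    }

    public static void main(String[] args) {
        // Keep cross-region latency at the 6 s the same-datacenter job paid.
        Duration budget = Duration.ofSeconds(6);
        System.out.println(minBatchSize(500_000, Duration.ofMillis(30), budget));
    }
}
```

For 500,000 documents at 30 ms per round trip, a 6-second budget allows 200 round trips, so the batch size must be at least 2,500.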