Cursor Batch Sizing: Round Trips vs Memory

The Symptom

The telemetry analytics job reads 500,000 documents per run to compute hourly aggregates. Each run takes 45 seconds. Network monitoring shows 5,000 round trips between the application and MongoDB during the run. The average round trip time is 1.2ms (same datacenter). 5,000 round trips contribute 6 seconds of pure network latency, 13% of the total run time.

The Cause

By default, the first batch returns up to 101 documents, and each subsequent batch fills up to the 16 MB limit. For the telemetry collection's 340-byte documents (after the BSON type optimization from CH6-S1), each subsequent batch holds approximately 47,000 documents. Only the first batch is small: reading all 500,000 documents with default settings takes about 12 round trips (one initial batch of 101 documents, then 11 full batches).
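The default-sizing arithmetic can be checked with a small model. This is a sketch under stated assumptions: `DefaultBatchModel` is a hypothetical helper, 16 MB is read as 16,000,000 bytes to match the ~47,000 figure above, and per-message overhead is ignored. Using the 16 MiB (16,777,216-byte) reading gives about 49,000 documents per batch, but the round trip total stays at 12.

```java
/** Hypothetical helper: round trips for a cursor using the server's default batch sizing. */
final class DefaultBatchModel {
    // Assumption: 16 MB read as 16,000,000 bytes; per-message overhead ignored.
    static final long BATCH_LIMIT_BYTES = 16_000_000L;
    // The server's default first batch size.
    static final long DEFAULT_FIRST_BATCH = 101;

    /** Documents that fit in one full 16 MB batch. */
    static long docsPerLaterBatch(long docSizeBytes) {
        return BATCH_LIMIT_BYTES / docSizeBytes;
    }

    /** One initial batch plus as many full batches as the remaining documents need. */
    static long roundTrips(long totalDocs, long docSizeBytes) {
        long first = Math.min(totalDocs, DEFAULT_FIRST_BATCH);
        long remaining = totalDocs - first;
        long perBatch = docsPerLaterBatch(docSizeBytes);
        return 1 + (remaining + perBatch - 1) / perBatch; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(docsPerLaterBatch(340));   // 47058
        System.out.println(roundTrips(500_000, 340)); // 12
    }
}
```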

But the analytics job uses a batchSize(100) setting that a developer added “to reduce memory usage.” With 100-document batches:

$$\text{roundTrips} = \lceil 500{,}000 / 100 \rceil = 5{,}000$$

At 1.2ms per round trip: $5{,}000 \times 1.2\text{ms} = 6\text{s}$ of network latency.

// SLOW: Tiny batch size maximizes round trips
try (MongoCursor<Document> cursor = collection.find(
    Filters.and(
        Filters.gte("timestamp", hourStart),
        Filters.lt("timestamp", hourEnd)
    )
).batchSize(100)   // 5,000 round trips for 500K docs
 .iterator()) {

    while (cursor.hasNext()) {
        aggregate(cursor.next());
    }
}

The Benchmark

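import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import com.mongodb.client.model.Filters;
import org.bson.Document;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Date;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 2, time = 10)
@Measurement(iterations = 3, time = 30)
@Fork(1)
@State(Scope.Benchmark)
public class BatchSizeBenchmark {

    private MongoClient client;
    private MongoCollection<Document> collection;
    private Date hourStart;
    private Date hourEnd;

    @Param({"100", "500", "1000", "5000", "10000"})
    private int batchSize;

    @Setup
    public void setup() {
        // A local mongod captures per-batch overhead but not the 1.2ms datacenter RTT
        client = MongoClients.create("mongodb://localhost:27017");
        collection = client.getDatabase("telemetry").getCollection("readings");
        // Pin the one-hour window once per trial so every invocation reads the same documents
        Instant end = Instant.now();
        hourEnd = Date.from(end);
        hourStart = Date.from(end.minus(1, ChronoUnit.HOURS));
    }

    @TearDown
    public void tearDown() {
        client.close();
    }

    @Benchmark
    public long iterateWithBatchSize(Blackhole bh) {
        long count = 0;
        try (MongoCursor<Document> cursor = collection.find(
            Filters.and(
                Filters.gte("timestamp", hourStart),
                Filters.lt("timestamp", hourEnd)
            )
        ).batchSize(batchSize).iterator()) {
            while (cursor.hasNext()) {
                bh.consume(cursor.next()); // consume each document so it is not dead code
                count++;
            }
        }
        return count;
    }
}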

Results for 500,000 documents:

Benchmark                           (batchSize)  Mode  Cnt      Score      Error  Units
BatchSizeBenchmark.iterateWithBatchSize      100  avgt    3  42000.000 ± 1200.000  ms/op
BatchSizeBenchmark.iterateWithBatchSize      500  avgt    3  12000.000 ±  800.000  ms/op
BatchSizeBenchmark.iterateWithBatchSize     1000  avgt    3   8500.000 ±  600.000  ms/op
BatchSizeBenchmark.iterateWithBatchSize     5000  avgt    3   6200.000 ±  400.000  ms/op
BatchSizeBenchmark.iterateWithBatchSize    10000  avgt    3   5800.000 ±  350.000  ms/op

The gains diminish after batch size 5,000. Going from 100 to 5,000 reduces total time by 85%. Going from 5,000 to 10,000 reduces it by only 6%.
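One way to read these results is to model total time as a fixed cost plus a cost per round trip, and fit that line to the five scores. The sketch below does an ordinary least-squares fit; `BatchCostFit` is a hypothetical name, and the linear model is an assumption about how the job behaves, not something the benchmark proves.

```java
/** Hypothetical helper: fits totalMs = fixedMs + perTripMs * roundTrips by least squares. */
final class BatchCostFit {
    /** Returns {slope (ms per round trip), intercept (ms)}. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        return new double[] {slope, meanY - slope * meanX};
    }

    public static void main(String[] args) {
        long totalDocs = 500_000;
        int[] batchSizes = {100, 500, 1_000, 5_000, 10_000};
        double[] meanMs = {42_000, 12_000, 8_500, 6_200, 5_800}; // scores from the results above
        double[] trips = new double[batchSizes.length];
        for (int i = 0; i < batchSizes.length; i++) {
            trips[i] = Math.ceil((double) totalDocs / batchSizes[i]);
        }
        double[] f = fit(trips, meanMs);
        System.out.printf("per round trip: %.2f ms, fixed: %.0f ms%n", f[0], f[1]);
    }
}
```

With the reported scores, the fitted slope is roughly 7.35 ms per round trip, about six times the 1.2 ms network RTT, which suggests most of each batch's cost is processing overhead rather than wire time.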

Round trip analysis:

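| Batch size | Round trips | Network latency (1.2 ms/trip) | Batch payload (raw BSON, 340 B/doc) |
|---:|---:|---:|---:|
| 100 | 5,000 | 6,000 ms | 34 KB |
| 500 | 1,000 | 1,200 ms | 170 KB |
| 1,000 | 500 | 600 ms | 340 KB |
| 5,000 | 100 | 120 ms | 1.7 MB |
| 10,000 | 50 | 60 ms | 3.4 MB |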
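The table follows directly from the round trip formula. This sketch reproduces it; `RoundTripTable` is a hypothetical helper, and the payload column is raw BSON size only, since decoded `Document` objects take more heap than their encoded bytes.

```java
/** Hypothetical helper: round trips, wire latency, and BSON payload per batch size. */
final class RoundTripTable {
    /** Ceiling division: a partial final batch still costs a full round trip. */
    static long roundTrips(long totalDocs, long batchSize) {
        return (totalDocs + batchSize - 1) / batchSize;
    }

    static double latencyMs(long totalDocs, long batchSize, double rttMs) {
        return roundTrips(totalDocs, batchSize) * rttMs;
    }

    /** Raw BSON bytes in one batch, not the decoded heap footprint. */
    static long payloadBytes(long batchSize, long docSizeBytes) {
        return batchSize * docSizeBytes;
    }

    public static void main(String[] args) {
        long totalDocs = 500_000;
        long docSize = 340; // bytes per document, from the text
        double rttMs = 1.2; // same-datacenter round trip
        for (long b : new long[] {100, 500, 1_000, 5_000, 10_000}) {
            System.out.printf("%,6d  %,5d trips  %,7.0f ms  %,9d bytes%n",
                    b, roundTrips(totalDocs, b), latencyMs(totalDocs, b, rttMs),
                    payloadBytes(b, docSize));
        }
    }
}
```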

The Fix

// FAST: Batch size tuned for analytics workload
try (MongoCursor<Document> cursor = collection.find(
    Filters.and(
        Filters.gte("timestamp", hourStart),
        Filters.lt("timestamp", hourEnd)
    )
).batchSize(5000)   // 100 round trips for 500K docs
 .iterator()) {

    while (cursor.hasNext()) {
        aggregate(cursor.next());
    }
}

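For the dashboard endpoint that returns 50-100 documents, set batchSize equal to the limit so the whole result arrives in a single round trip. With a limit of 101 or less, the default first batch already covers the result; the explicit batchSize documents that intent and preserves the single round trip if the limit later grows past 101 (the 16 MB per-batch cap still applies):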

// FAST: Single round trip for bounded queries
List<Document> results = collection.find(Filters.eq("sensorId", sensorId))
    .sort(Sorts.descending("timestamp"))
    .limit(100)
    .batchSize(100)    // Match limit: one round trip
    .into(new ArrayList<>());

The Proof

After changing the analytics job from batchSize(100) to batchSize(5000):

| Metric | batchSize=100 | batchSize=5000 |
|---|---:|---:|
| Total time | 42 s | 6.2 s |
| Network round trips | 5,000 | 100 |
| Network latency contribution (1.2 ms/trip) | 6,000 ms | 120 ms |
| Batch payload per cursor (raw BSON) | 34 KB | 1.7 MB |

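The analytics job runs 6.8x faster. The run is 35.8 seconds shorter while pure network latency drops by only 5.9 seconds, so most of the gain comes from per-batch costs other than wire time, which shrink along with the number of batches. The memory cost is 1.7 MB of BSON per batch (more once decoded into Document objects), which is negligible for a batch job.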

The Trade-off

Large batch sizes shift memory pressure onto the server. MongoDB assembles each batch in memory before sending it, so a batch of 5,000 documents at 340 bytes each needs about 1.7 MB per cursor while it is built and sent. If 100 concurrent cursors each fetch a 5,000-document batch at the same moment, that peaks at 170 MB of server memory for reply buffers. These buffers come from the mongod process heap, not the WiredTiger cache, so they compete with the operating system's filesystem cache and other mongod allocations, leaving less memory for caching the working set.
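The worst-case figure is concurrency times batch payload, bounded by the per-batch cap. A minimal sketch, assuming every open cursor is assembling a batch simultaneously and taking the cap as roughly 16,000,000 bytes; `CursorBufferEstimate` is a hypothetical name.

```java
/** Hypothetical helper: peak server reply-buffer memory across concurrent cursors. */
final class CursorBufferEstimate {
    // Assumption: the server caps each batch at roughly 16 MB regardless of batchSize.
    static final long MAX_BATCH_BYTES = 16_000_000L;

    /** Bytes for one batch, never more than the cap. */
    static long bytesPerBatch(long batchSize, long docSizeBytes) {
        return Math.min(batchSize * docSizeBytes, MAX_BATCH_BYTES);
    }

    /** Worst case: every cursor builds a batch at the same moment. */
    static long peakBytes(int concurrentCursors, long batchSize, long docSizeBytes) {
        return concurrentCursors * bytesPerBatch(batchSize, docSizeBytes);
    }

    public static void main(String[] args) {
        System.out.println(peakBytes(100, 5_000, 340)); // 170000000 (170 MB)
    }
}
```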

For streaming workloads (continuous data processing), keep batch sizes moderate (500-1,000). For batch analytics jobs that run sequentially, use large batch sizes (5,000-10,000). For API endpoints serving user requests, set batch size equal to the limit so each request completes in a single round trip.
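The guidance above can be encoded as a small policy so call sites do not hard-code numbers. This is a sketch, not a driver feature: `BatchSizePolicy` and `Workload` are hypothetical names, and picking 1,000 and 5,000 from within the recommended ranges is an assumption.

```java
/** Hypothetical helper encoding the batch size guidance for each workload type. */
final class BatchSizePolicy {
    enum Workload { STREAMING, BATCH_ANALYTICS, API_ENDPOINT }

    /**
     * @param limit query limit, used only for API endpoints
     */
    static int batchSizeFor(Workload workload, int limit) {
        switch (workload) {
            case STREAMING:
                return 1_000; // moderate: upper end of the 500-1,000 range
            case BATCH_ANALYTICS:
                return 5_000; // large: lower end of the 5,000-10,000 range
            case API_ENDPOINT:
                if (limit <= 0) {
                    throw new IllegalArgumentException("API queries need a positive limit");
                }
                return limit; // match the limit: one round trip for the bounded result
            default:
                throw new IllegalArgumentException("unknown workload: " + workload);
        }
    }
}
```

A call site would pass the result to `FindIterable.batchSize(int)`, for example `collection.find(filter).limit(100).batchSize(BatchSizePolicy.batchSizeFor(Workload.API_ENDPOINT, 100))`.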

Cross-region deployments amplify the impact of batch size. If the application and MongoDB are in different regions with a 30ms round trip time, the same 5,000 round trips cost 150 seconds of network latency instead of 6 seconds, while batchSize(5000) cuts that to 3 seconds (100 × 30ms). In cross-region scenarios, push batch size as high as client and server memory allow; the server still caps each batch at 16 MB.
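Working backward from a latency budget gives the smallest batch size that meets it. A minimal sketch, assuming an explicit batchSize applies to every batch including the first, a cap of roughly 16,000,000 bytes per batch, and a hypothetical `LatencyBudget` helper; it returns -1 when even maximal batches cannot fit the budget.

```java
/** Hypothetical helper: smallest batchSize whose wire latency fits a budget. */
final class LatencyBudget {
    static final long MAX_BATCH_BYTES = 16_000_000L; // assumption: ~16 MB server cap per batch

    /** Returns the minimum batch size, or -1 if the budget cannot be met under the cap. */
    static long minBatchSize(long totalDocs, double rttMs, double budgetMs, long docSizeBytes) {
        long allowedTrips = (long) Math.floor(budgetMs / rttMs);
        if (allowedTrips < 1) {
            return -1; // budget shorter than a single round trip
        }
        long batch = (totalDocs + allowedTrips - 1) / allowedTrips; // ceiling division
        return batch * docSizeBytes <= MAX_BATCH_BYTES ? batch : -1;
    }

    public static void main(String[] args) {
        // 500,000 documents, 30 ms cross-region RTT, 1 s latency budget, 340-byte documents
        System.out.println(minBatchSize(500_000, 30.0, 1_000.0, 340)); // 15152
    }
}
```

At 15,152 documents per batch the job needs 33 round trips (990 ms at 30 ms each) and each batch is about 5.2 MB of BSON, well under the cap.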