Serialization Boundaries: BSON Types, Network Payloads, and Cursor Batch Sizing
Serialization Boundaries
Every document stored in MongoDB is serialized as BSON, and every document transferred between the driver and the server crosses the wire as BSON. The BSON type chosen for each field determines storage size, index size, network payload, and deserialization cost. A UUID stored as a String takes 41 bytes for the value (4-byte length prefix, 36 characters, 1-byte terminator). The same UUID stored as BinData subtype 4 takes 21 bytes (4-byte length prefix, 1-byte subtype, 16 bytes of data). Across 100 million documents, that 20-byte difference is 2 GB of uncompressed storage, with proportional savings in index size and network transfer.
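The size arithmetic can be checked with a small sketch. BsonValueSizes below is a hypothetical helper written for illustration, not part of the MongoDB driver. It counts only the value portion of a BSON element (the type byte and field name are the same for both encodings) and ignores storage-engine compression.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the MongoDB driver.
final class BsonValueSizes {
    // BSON string value: int32 length prefix + UTF-8 bytes + trailing NUL.
    static int stringValueBytes(String s) {
        return 4 + s.getBytes(StandardCharsets.UTF_8).length + 1;
    }

    // BSON binary value: int32 length prefix + 1 subtype byte + payload.
    static int binaryValueBytes(int payloadLength) {
        return 4 + 1 + payloadLength;
    }

    public static void main(String[] args) {
        int asString = stringValueBytes("123e4567-e89b-12d3-a456-426614174000");
        int asBinary = binaryValueBytes(16); // a UUID is 16 raw bytes
        long documents = 100_000_000L;
        long savedBytes = (long) (asString - asBinary) * documents;
        System.out.println("string value: " + asString + " bytes");
        System.out.println("binary value: " + asBinary + " bytes");
        System.out.println("saved across collection: " + savedBytes + " bytes");
    }
}
```

By this count the string value is 41 bytes and the binary value is 21 bytes, so the per-document difference is 20 bytes before any compression.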
BSON Type Selection Rules
Identifiers: Use ObjectId (12 bytes) or, for UUIDs, BinData subtype 4 (21 bytes). Never store UUIDs as strings (41 bytes). The MongoDB Java driver’s UUID codec handles the conversion transparently when configured with the standard representation:
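import com.mongodb.MongoClientSettings;
import org.bson.UuidRepresentation;
import org.bson.codecs.UuidCodec;
import org.bson.codecs.configuration.CodecRegistries;
import org.bson.codecs.configuration.CodecRegistry;
// FAST: UUID stored as BinData subtype 4
CodecRegistry codecRegistry = CodecRegistries.fromRegistries(
    CodecRegistries.fromCodecs(new UuidCodec(UuidRepresentation.STANDARD)),
    MongoClientSettings.getDefaultCodecRegistry()
);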
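To make the 16-byte payload concrete, the following sketch packs a java.util.UUID into the big-endian layout used by the standard representation (most significant 64 bits first, then least significant). UuidBytes is a hypothetical class for illustration only. With the codec configured, the driver performs this conversion itself, so application code would not normally need it.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper for illustration; the driver's UUID codec does this internally.
final class UuidBytes {
    // Pack the UUID as 16 bytes, most significant bits first (ByteBuffer is big-endian by default).
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Rebuild the UUID from the same 16-byte layout.
    static UUID fromBytes(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        long mostSignificant = buffer.getLong();
        long leastSignificant = buffer.getLong();
        return new UUID(mostSignificant, leastSignificant);
    }

    public static void main(String[] args) {
        UUID original = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
        byte[] packed = toBytes(original);
        System.out.println("payload length: " + packed.length);
        System.out.println("round trip ok: " + original.equals(fromBytes(packed)));
    }
}
```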
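Timestamps: Use BSON DateTime (8 bytes) instead of ISO 8601 strings (29 bytes for a 24-character millisecond-precision value such as 2024-01-15T10:30:00.123Z). DateTime is a signed 64-bit count of milliseconds since the Unix epoch and supports range queries with index scans. String timestamps are compared lexicographically, which yields chronological order only when every value uses the same fixed-width UTC format, and string comparison is slower than numeric comparison.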
// SLOW: Timestamp as string
document.append("timestamp", Instant.now().toString()); // 25 to 35 bytes depending on fractional digits, string comparison
// FAST: Timestamp as BSON DateTime
document.append("timestamp", new Date()); // 8 bytes, numeric comparison
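One reason to be cautious with string timestamps: Instant.toString() prints a variable number of fractional digits, so the resulting strings are not fixed-width and can sort out of chronological order. The sketch below demonstrates this with JDK classes only. TimestampOrdering is a hypothetical class name, and the sample instants are arbitrary.

```java
import java.time.Instant;
import java.util.Date;

// Hypothetical demo class; uses only the JDK.
final class TimestampOrdering {
    // True when comparing the ISO 8601 strings agrees with comparing the instants.
    static boolean stringOrderMatchesTime(Instant a, Instant b) {
        int byString = Integer.signum(a.toString().compareTo(b.toString()));
        int byTime = Integer.signum(a.compareTo(b));
        return byString == byTime;
    }

    // Epoch milliseconds, the value a BSON DateTime carries.
    static long epochMillis(Instant instant) {
        return Date.from(instant).getTime();
    }

    public static void main(String[] args) {
        Instant wholeSecond = Instant.parse("2024-01-15T10:30:00Z");
        Instant withMillis = Instant.parse("2024-01-15T10:30:00.123Z");
        // 'Z' sorts after '.', so the earlier instant compares as the larger string.
        System.out.println("string order matches time: "
                + stringOrderMatchesTime(wholeSecond, withMillis));
        System.out.println("numeric order is correct: "
                + (epochMillis(wholeSecond) < epochMillis(withMillis)));
    }
}
```

Storing a Date avoids the issue because the comparison is always on the 64-bit millisecond value.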
Numeric fields: Use the smallest type that fits. BSON Int32 is 4 bytes, Int64 is 8 bytes, Double is 8 bytes, Decimal128 is 16 bytes. A temperature reading between -50 and 150 with one decimal place fits in an Int32 when stored as tenths of a degree (multiply by 10, so 21.5 becomes 215), which takes half the space of a Double. If more fractional precision matters, use Double.
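The tenths encoding might look like the sketch below. TemperatureEncoding is a hypothetical helper, and rounding to the nearest tenth is an assumption about what the application wants. In the Java driver, a Java Integer value is written as BSON Int32 and a Long as Int64, so returning an int keeps the field at 4 bytes.

```java
// Hypothetical helper for illustration; assumes one decimal place of precision is enough.
final class TemperatureEncoding {
    // Scale to tenths of a degree and round to the nearest integer.
    static int toTenths(double celsius) {
        return (int) Math.round(celsius * 10.0);
    }

    // Convert the stored integer back to degrees.
    static double fromTenths(int tenths) {
        return tenths / 10.0;
    }

    public static void main(String[] args) {
        int stored = toTenths(21.5);
        System.out.println("stored value: " + stored);
        System.out.println("decoded value: " + fromTenths(stored));
    }
}
```

The trade-off is that every reader of the collection has to know the scale factor, so it is worth recording in the schema documentation or the field name.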
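Boolean flags: BSON Boolean is 1 byte. Storing "true" as a string takes 9 bytes (4-byte length prefix, 4 characters, 1-byte terminator), and "false" takes 10. For a collection with 10 boolean fields across 100 million documents, that is at least 8 GB of wasted uncompressed storage.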
Cursor Batch Sizing
The MongoDB server sends query results in batches. By default, the first batch holds up to 101 documents, and no batch can exceed 16 MB. Subsequent batches have no default document count and are limited only by the 16 MB cap. The batchSize parameter on the cursor sets the maximum number of documents the server sends per batch.
Small batches increase round trips. Large batches increase memory consumption and latency to first result. The optimal batch size depends on the use case.
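A rough way to reason about the trade-off is to estimate the number of round trips for a given result size. The sketch below is a simplification under stated assumptions: an explicit batchSize applies to every batch, no batch hits the 16 MB cap, and any final empty batch the server may return is not counted. BatchMath is a hypothetical helper, not a driver API.

```java
// Hypothetical estimator for illustration; ignores the 16 MB cap and any trailing empty batch.
final class BatchMath {
    // Estimated round trips: ceil(documents / batchSize), with at least one for the initial find.
    static long estimatedRoundTrips(long documents, int batchSize) {
        if (documents < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("documents must be >= 0 and batchSize must be > 0");
        }
        long batches = (documents + batchSize - 1) / batchSize;
        return Math.max(1, batches);
    }

    public static void main(String[] args) {
        System.out.println("1,000,000 docs at batchSize 100: "
                + estimatedRoundTrips(1_000_000, 100) + " round trips");
        System.out.println("500 docs at batchSize 500: "
                + estimatedRoundTrips(500, 500) + " round trip");
    }
}
```

Under these assumptions, one million documents at a batch size of 100 cost about 10,000 round trips, while raising the batch size to 1,000 cuts that to about 1,000 at the price of holding ten times as many documents in memory per batch.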
// Streaming large result sets: small batches for low memory
collection.find(Filters.gte("timestamp", cutoff))
    .batchSize(100)
    .forEach(doc -> processAndDiscard(doc));
// Dashboard query: large batch to minimize round trips
List<Document> results = collection.find(Filters.eq("sensorId", sensorId))
    .sort(Sorts.descending("timestamp"))
    .limit(500)
    .batchSize(500) // Single round trip, provided the 500 documents fit within 16 MB
    .into(new ArrayList<>());