unbound mongodb at scale

Serialization Boundaries: BSON Types, Network Payloads, and Cursor Batch Sizing

3 min read Chapter 16 of 72

Serialization Boundaries

Every byte stored in MongoDB is serialized as BSON. Every byte transferred between the driver and the server crosses the wire as BSON. The choice of BSON type for each field determines storage size, index size, network payload, and deserialization cost. A UUID stored as a String takes 41 bytes: a 4-byte length prefix, 36 characters, and a null terminator. The same UUID stored as BinData subtype 4 takes 21 bytes: a 4-byte length prefix, a subtype byte, and 16 bytes of data. Across 100 million documents, that 20-byte difference is 2 GB of uncompressed storage, with proportional savings in index size and network transfer.
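The arithmetic behind those figures can be checked with a few lines of plain Java. This is a minimal sketch based on the value layouts in the BSON specification; it counts only the value bytes and leaves out the type byte and field name that every element carries regardless of type. The class and method names are illustrative and not part of any driver.

```java
import java.nio.charset.StandardCharsets;

/** Sizes of BSON values, excluding the per-element type byte and field name. */
final class BsonValueSizes {

    /** BSON string: int32 length prefix + UTF-8 bytes + trailing NUL. */
    static int stringValueSize(String value) {
        return 4 + value.getBytes(StandardCharsets.UTF_8).length + 1;
    }

    /** BSON binary: int32 length prefix + one subtype byte + payload. */
    static int binaryValueSize(int payloadBytes) {
        return 4 + 1 + payloadBytes;
    }

    /** Uncompressed bytes saved across a whole collection. */
    static long collectionSavings(long documentCount, int bytesSavedPerDocument) {
        return documentCount * bytesSavedPerDocument;
    }

    public static void main(String[] args) {
        int asString = stringValueSize("123e4567-e89b-12d3-a456-426614174000");
        int asBinary = binaryValueSize(16); // a UUID is 16 raw bytes
        System.out.println(asString + " vs " + asBinary + " bytes per UUID");
        System.out.println(collectionSavings(100_000_000L, asString - asBinary) + " bytes saved");
    }
}
```

The same two helpers apply to any field where a string stands in for a compact binary or numeric type.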

Figure: BSON type sizes comparison showing storage overhead for common types: ObjectId (12B), UUID as BinData (20B), UUID as String (41B), Date (8B), ISO date as String (27B), Int32 (4B), Int64 (8B), Double (8B), number as String (variable). Network payload impact at 100K documents.

BSON Type Selection Rules

Identifiers: Use ObjectId (12 bytes) or BinData subtype 4 (21 bytes, including the length prefix and subtype byte) for UUIDs. Never store UUIDs as strings (41 bytes). The MongoDB Java driver’s UUID codec handles this transparently when configured:

// FAST: UUID stored as BinData subtype 4
CodecRegistry codecRegistry = CodecRegistries.fromRegistries(
    CodecRegistries.fromCodecs(new UuidCodec(UuidRepresentation.STANDARD)),
    MongoClientSettings.getDefaultCodecRegistry()
);
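To make the 16-byte payload concrete, the sketch below packs a UUID into raw bytes using only the JDK. It assumes the standard representation writes the most significant 64 bits first, each in big-endian order, which is my understanding of the RFC 4122 layout; the driver's codec does this work for you, and the UuidBytes class here is purely illustrative.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/** Illustrative round trip between a UUID and its 16-byte big-endian form. */
final class UuidBytes {

    /** Packs the UUID as most significant long, then least significant long. */
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16) // ByteBuffer is big-endian by default
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    /** Rebuilds the UUID from exactly 16 bytes in the same order. */
    static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        long mostSignificant = buffer.getLong();
        long leastSignificant = buffer.getLong();
        return new UUID(mostSignificant, leastSignificant);
    }
}
```

The byte array is less than half the length of the 36-character text form, and the conversion is lossless in both directions.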

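Timestamps: Use BSON DateTime (8 bytes) instead of ISO 8601 strings, which typically take 25 to 35 bytes depending on fractional-second precision (a 27-character string costs 32 bytes once the length prefix and null terminator are counted). DateTime is a signed 64-bit count of milliseconds since the Unix epoch, so range queries compare integers. String timestamps compare lexicographically, which matches chronological order only when every value uses the same fixed-width UTC format, and is slower than numeric comparison.

// SLOW: Timestamp as string
document.append("timestamp", Instant.now().toString());   // 25 to 35 bytes, variable width, string comparison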

// FAST: Timestamp as BSON DateTime
document.append("timestamp", new Date());                 // 8 bytes, numeric comparison
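The ordering caveat for string timestamps is easy to reproduce. The sketch below, using only java.time, compares two instants as strings: Instant.toString() omits the fractional part when it is zero, so its output is not fixed-width and can sort out of chronological order. The fixedWidth helper and the class name are illustrative assumptions, not something the driver provides.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Shows when string order does and does not agree with chronological order. */
final class TimestampOrdering {

    /** Fixed-width UTC format: always 24 characters, millisecond precision. */
    private static final DateTimeFormatter FIXED_WIDTH =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'").withZone(ZoneOffset.UTC);

    static String fixedWidth(Instant instant) {
        return FIXED_WIDTH.format(instant);
    }

    /** True when the first string sorts strictly before the second. */
    static boolean sortsBefore(String first, String second) {
        return first.compareTo(second) < 0;
    }

    public static void main(String[] args) {
        Instant earlier = Instant.parse("2024-01-15T10:30:00Z");
        Instant later = Instant.parse("2024-01-15T10:30:00.500Z");

        // Variable-width strings: 'Z' sorts after '.', so the earlier instant sorts last.
        System.out.println(sortsBefore(earlier.toString(), later.toString()));     // false
        // Fixed-width strings restore the expected order.
        System.out.println(sortsBefore(fixedWidth(earlier), fixedWidth(later)));   // true
        // Epoch milliseconds, which is what BSON DateTime stores, always order correctly.
        System.out.println(earlier.toEpochMilli() < later.toEpochMilli());         // true
    }
}
```

A fixed-width formatter repairs the ordering but still costs roughly three times the storage of a DateTime value.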

Numeric fields: Use the smallest type that fits. BSON Int32 is 4 bytes, Int64 is 8 bytes, Double is 8 bytes, Decimal128 is 16 bytes. A temperature reading between -50 and 150 with one decimal place fits in an Int32 when stored as tenths of a degree (multiply by 10, so 21.7 becomes 217). If the values need more fractional precision than a fixed scale provides, use Double.
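The scaled-integer approach for temperature readings might look like the following. This is a sketch under the assumption that one decimal place is enough; the class name, the SCALE constant, and the rounding policy (round half up via Math.round) are choices made for illustration.

```java
/** Stores a temperature as an int count of tenths of a degree. */
final class TemperatureScaling {

    /** Tenths of a degree per degree; a hypothetical fixed scale. */
    static final int SCALE = 10;

    /** Converts degrees to tenths, failing loudly if the result overflows an int. */
    static int toTenths(double degrees) {
        return Math.toIntExact(Math.round(degrees * SCALE));
    }

    /** Converts stored tenths back to degrees for display or computation. */
    static double fromTenths(int tenths) {
        return tenths / (double) SCALE;
    }
}
```

The stored value occupies 4 bytes as an Int32 instead of 8 as a Double, and equality and range comparisons on it are exact.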

Boolean flags: BSON Boolean is 1 byte. Storing "true" as a string is 9 bytes (a 4-byte length prefix, 4 characters, and a null terminator), and "false" is 10. For a collection with 10 boolean fields across 100 million documents, that is at least 8 GB of wasted storage.

Cursor Batch Sizing

The MongoDB server sends query results in batches. By default the first batch holds up to 101 documents, subject to the 16 MB message size limit. Subsequent batches have no default document count and are limited only by that 16 MB cap. The batchSize parameter on the cursor sets the maximum number of documents the server returns per batch.

Small batches increase round trips. Large batches increase memory consumption and latency to first result. The optimal batch size depends on the use case.
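The trade-off can be estimated before touching a cluster. The sketch below is a simplified model, not driver code: it treats every batch as one round trip, ignores any extra getMore the server may need to confirm the cursor is exhausted, and takes the average document size as a hypothetical input you would measure yourself.

```java
/** Back-of-envelope model of batch size versus round trips and buffered bytes. */
final class BatchPlanning {

    /** Server message size cap that bounds any single batch. */
    static final long MAX_BATCH_BYTES = 16L * 1024 * 1024;

    /** Lower bound on round trips: one per batch, and at least one for the initial find. */
    static long minRoundTrips(long resultCount, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        if (resultCount <= 0) {
            return 1;
        }
        return (resultCount + batchSize - 1) / batchSize; // ceiling division
    }

    /** Rough bytes buffered per batch, capped at the message size limit. */
    static long approxBatchBytes(int batchSize, int avgDocumentBytes) {
        return Math.min(MAX_BATCH_BYTES, (long) batchSize * avgDocumentBytes);
    }
}
```

Under this model, a million-document scan with a batch size of 100 needs at least 10,000 round trips while buffering about 200 KB at a time for 2 KB documents; raising the batch size trades that memory headroom for fewer trips.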

// Streaming large result sets: small batches for low memory
collection.find(Filters.gte("timestamp", cutoff))
    .batchSize(100)
    .forEach(doc -> processAndDiscard(doc));

// Dashboard query: large batch to minimize round trips
List<Document> results = collection.find(Filters.eq("sensorId", sensorId))
    .sort(Sorts.descending("timestamp"))
    .limit(500)
    .batchSize(500)    // One batch can hold the full result if it stays under 16 MB
    .into(new ArrayList<>());