BSON Allocation Profiling and Custom Codecs for Hot Paths

The Symptom

The telemetry ingestion service exhibits 150ms GC pauses every 30 seconds under a sustained load of 10,000 reads/sec. The pauses coincide with young-generation collections. Flame graphs from async-profiler show 62% of allocations originating from org.bson.codecs.DocumentCodec.decode().

The Cause

The default DocumentCodec creates a Document (backed by a LinkedHashMap) for every result. Each field costs a map entry node, a String key, and a heap-allocated value object. Numeric fields that could stay double primitives become Double objects, and dates stored as epoch milliseconds become Date objects. None of this is necessary if you control the deserialization.
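To make the boxing concrete, here is a minimal sketch that uses only JDK types (no driver classes) to mimic what a map-backed document holds for one reading. The class name BoxingSketch and the field values are illustrative assumptions, not code from the service.

```java
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: mimics the shape of a map-backed document using JDK types.
final class BoxingSketch {

    // Builds one "reading" the way a generic map-based decoder would hold it.
    static Map<String, Object> asMap() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("sensorId", "sensor-00042");
        doc.put("ts", new Date(1_700_000_000_000L)); // epoch millis wrapped in a Date
        doc.put("temp", 21.5);                        // double autoboxed to Double
        doc.put("humidity", 40.0);
        doc.put("pressure", 1013.25);
        return doc;
    }

    // Counts the values that exist only as object wrappers around a primitive.
    static long wrapperCount(Map<String, Object> doc) {
        return doc.values().stream()
                .filter(v -> v instanceof Number || v instanceof Date)
                .count();
    }

    public static void main(String[] args) {
        Map<String, Object> doc = asMap();
        System.out.println(doc.get("temp").getClass().getSimpleName()); // prints "Double"
        System.out.println(wrapperCount(doc));                          // prints "4"
    }
}
```

Four of the five values are wrappers around a primitive, and every put also allocates an entry node inside the map. A type that declares those fields as double stores them inline instead.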

The Benchmark

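import com.mongodb.MongoClientSettings;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.Filters;
import org.bson.Document;
import org.bson.codecs.configuration.CodecRegistries;
import org.bson.codecs.configuration.CodecRegistry;
import org.bson.conversions.Bson;
import org.openjdk.jmh.annotations.*;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)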
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 3, time = 5)
@Measurement(iterations = 5, time = 10)
@Fork(value = 1, jvmArgs = {"-Xmx2g", "-XX:+UseG1GC"})
@State(Scope.Benchmark)
public class BsonDeserializationBenchmark {

    private MongoClient client;
    private MongoCollection<Document> documentCollection;
    private MongoCollection<TelemetryReading> codecCollection;
    private Bson filter;

    @Setup
    public void setup() {
        client = MongoClients.create("mongodb://localhost:27017");
        MongoDatabase db = client.getDatabase("telemetry");

        // Baseline: the same collection read through the default DocumentCodec
        documentCollection = db.getCollection("readings");

        // Custom codec first, so it takes precedence over the default registry
        CodecRegistry codecRegistry = CodecRegistries.fromRegistries(
            CodecRegistries.fromCodecs(new TelemetryReadingCodec()),
            MongoClientSettings.getDefaultCodecRegistry()
        );
        codecCollection = db.getCollection("readings", TelemetryReading.class)
            .withCodecRegistry(codecRegistry);

        filter = Filters.eq("sensorId", "sensor-00042");
    }

    // Close the client so its connection pool does not outlive the trial
    @TearDown
    public void tearDown() {
        client.close();
    }

    @Benchmark
    public List<Document> defaultDocumentCodec() {
        return documentCollection.find(filter).limit(100).into(new ArrayList<>());
    }

    @Benchmark
    public List<TelemetryReading> customCodec() {
        return codecCollection.find(filter).limit(100).into(new ArrayList<>());
    }
}

The custom codec avoids the LinkedHashMap entirely:

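// FAST: Custom Codec that deserializes directly to a record
import org.bson.BsonReader;
import org.bson.BsonType;
import org.bson.BsonWriter;
import org.bson.codecs.Codec;
import org.bson.codecs.DecoderContext;
import org.bson.codecs.EncoderContext;

import java.time.Instant;
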
public class TelemetryReadingCodec implements Codec<TelemetryReading> {

    @Override
    public TelemetryReading decode(BsonReader reader, DecoderContext context) {
        reader.readStartDocument();

        // Initial values; a field missing from the stored document keeps these
        String sensorId = null;
        long timestamp = 0;
        double temperature = 0;
        double humidity = 0;
        double pressure = 0;

        while (reader.readBsonType() != BsonType.END_OF_DOCUMENT) {
            String fieldName = reader.readName();
            switch (fieldName) {
                case "sensorId" -> sensorId = reader.readString();
                case "ts" -> timestamp = reader.readDateTime();
                case "temp" -> temperature = reader.readDouble();
                case "humidity" -> humidity = reader.readDouble();
                case "pressure" -> pressure = reader.readDouble();
                default -> reader.skipValue();
            }
        }
        reader.readEndDocument();

        return new TelemetryReading(
            sensorId,
            Instant.ofEpochMilli(timestamp),
            temperature,
            humidity,
            pressure
        );
    }

    @Override
    public void encode(BsonWriter writer, TelemetryReading value, EncoderContext context) {
        writer.writeStartDocument();
        writer.writeString("sensorId", value.sensorId());
        writer.writeDateTime("ts", value.timestamp().toEpochMilli());
        writer.writeDouble("temp", value.temperature());
        writer.writeDouble("humidity", value.humidity());
        writer.writeDouble("pressure", value.pressure());
        writer.writeEndDocument();
    }

    @Override
    public Class<TelemetryReading> getEncoderClass() {
        return TelemetryReading.class;
    }
}

// The target type: a Java record whose numeric fields stay primitive
public record TelemetryReading(
    String sensorId,
    Instant timestamp,
    double temperature,
    double humidity,
    double pressure
) {}

JMH results:

Benchmark                                        Mode  Cnt    Score    Error  Units
BsonDeserializationBenchmark.defaultDocumentCodec avgt    5  427.000 ± 18.000  us/op
BsonDeserializationBenchmark.customCodec          avgt    5  112.000 ±  5.000  us/op

The Fix

// FAST: Register custom codec for the telemetry collection
CodecRegistry registry = CodecRegistries.fromRegistries(
    CodecRegistries.fromCodecs(new TelemetryReadingCodec()),
    MongoClientSettings.getDefaultCodecRegistry()
);

MongoCollection<TelemetryReading> readings = database
    .getCollection("readings", TelemetryReading.class)
    .withCodecRegistry(registry);

// Reads now deserialize directly to TelemetryReading records
List<TelemetryReading> results = readings.find(
    Filters.eq("sensorId", "sensor-00042")
).limit(100).into(new ArrayList<>());

The Proof

Metric	Document codec	Custom codec
Deserialization time (100 docs)	427 μs	112 μs
Heap allocation per doc	736 bytes	184 bytes
Allocation rate (10K docs/sec)	7.2 MB/sec	1.8 MB/sec
GC pause frequency	Every 30s	Every 120s
GC pause duration (p99)	150ms	35ms

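On this benchmark the custom codec is 3.8x faster (112 μs versus 427 μs per 100-document query, a figure that includes the round trip to the local MongoDB instance) and allocates a quarter of the memory per document (184 versus 736 bytes). GC pauses dropped from 150ms every 30 seconds to 35ms every 2 minutes.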
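The ratios can be checked directly from the table. The sketch below is plain arithmetic on the reported figures; the class and method names are made up for illustration, and MB is taken to mean decimal megabytes.

```java
// Illustrative arithmetic only; the inputs are the figures from the table above.
final class ProofArithmetic {

    // How many times larger `before` is than `after`.
    static double ratio(double before, double after) {
        return before / after;
    }

    // Bytes per document times documents per second, in decimal MB per second.
    static double allocationRateMb(long bytesPerDoc, long docsPerSecond) {
        return bytesPerDoc * docsPerSecond / 1_000_000.0;
    }

    public static void main(String[] args) {
        System.out.println("speedup:          " + ratio(427, 112));
        System.out.println("allocation ratio: " + ratio(736, 184));
        System.out.println("document codec:   " + allocationRateMb(736, 10_000) + " MB/sec");
        System.out.println("custom codec:     " + allocationRateMb(184, 10_000) + " MB/sec");
    }
}
```

The speedup works out to about 3.81 and the allocation ratio to 4.0. The per-document sizes imply roughly 7.36 and 1.84 MB/sec at 10K docs/sec, close to the measured 7.2 and 1.8 MB/sec in the table. A 4x lower allocation rate is also consistent with young-generation collections arriving 4x less often (every 120s instead of every 30s).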

The Trade-off

Custom codecs require manual maintenance. When the document schema changes (a new field is added, a field type changes), the codec must be updated. The default -> reader.skipValue() case in the switch statement provides forward compatibility for unknown fields, but it cannot help with the other cases: a removed or renamed field silently decodes to its initial value (null or 0), and a field whose BSON type changes makes the typed read (for example readDouble()) throw. This is engineering effort that the default DocumentCodec spares you. Use custom codecs only on hot paths where the allocation reduction is measurable. For low-throughput endpoints, the default codec is sufficient.
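One way to check whether the reduction is measurable on a given path is to read the per-thread allocation counter before and after the call. The sketch below assumes a HotSpot-based JVM, where the platform ThreadMXBean can be cast to com.sun.management.ThreadMXBean. AllocationProbe and its method names are hypothetical helpers, not part of the driver or of JMH.

```java
import java.lang.management.ManagementFactory;
import java.util.function.Supplier;

// Sketch: per-thread allocation counter, assuming a HotSpot-based JVM.
final class AllocationProbe {

    // Assumption: the platform bean implements the com.sun.management extension.
    private static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    // Keeps the result reachable so the JIT cannot optimize the allocation away.
    static volatile Object sink;

    // Returns the bytes allocated by the current thread while running `work`.
    static long allocatedBytes(Supplier<?> work) {
        long id = Thread.currentThread().getId();
        long before = THREADS.getThreadAllocatedBytes(id);
        sink = work.get();
        return THREADS.getThreadAllocatedBytes(id) - before;
    }

    public static void main(String[] args) {
        // Stand-in workload: a single 1 MB array.
        long bytes = allocatedBytes(() -> new byte[1_000_000]);
        System.out.println("allocated bytes: " + bytes);
    }
}
```

To compare codecs, the stand-in workload could be replaced by the query itself, for example () -> readings.find(filter).limit(100).into(new ArrayList<>()), and the result divided by the number of documents. The counter covers only the calling thread and also includes driver and network buffers allocated there, so treat the number as an upper bound on decode cost rather than an exact per-document figure.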