The Bucket Pattern

Storing one document per sensor reading is the natural design. It is also the wrong design for the telemetry platform at scale.

At 10,000 sensors reporting every 5 seconds, the system ingests 2,000 readings per second. That is 172.8 million documents per day and 5.18 billion per 30-day month. Each document carries BSON overhead: the _id field (a 12-byte ObjectId), field names repeated in every document (approximately 80 bytes), and structural bytes (type markers, null terminators, length prefixes). For a reading with 4 numeric values, the payload is 32 bytes (four 8-byte doubles) but the document is about 200 bytes. That is 168 bytes of overhead for 32 bytes of payload, an overhead-to-payload ratio of 5.25x.
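
The arithmetic above can be reproduced with a few lines of Java. This is a minimal sketch: the class and method names are hypothetical, a month is assumed to be 30 days, and the 200-byte and 32-byte sizes are the estimates from the text rather than measured values.

```java
/** Back-of-the-envelope ingest figures for the per-event model (illustrative names). */
final class IngestEstimate {
    static final long SECONDS_PER_DAY = 86_400;

    /** Readings per second for a fleet where every sensor reports at a fixed interval. */
    static long readingsPerSecond(long sensors, long intervalSeconds) {
        return sensors / intervalSeconds;
    }

    /** Readings (and therefore per-event documents) written per day. */
    static long readingsPerDay(long sensors, long intervalSeconds) {
        return readingsPerSecond(sensors, intervalSeconds) * SECONDS_PER_DAY;
    }

    /** Bytes of overhead carried for each byte of payload. */
    static double overheadRatio(int documentBytes, int payloadBytes) {
        return (double) (documentBytes - payloadBytes) / payloadBytes;
    }

    public static void main(String[] args) {
        long perDay = readingsPerDay(10_000, 5);
        System.out.println(readingsPerSecond(10_000, 5)); // 2000
        System.out.println(perDay);                       // 172800000
        System.out.println(perDay * 30);                  // 5184000000
        System.out.println(overheadRatio(200, 32));       // 5.25
    }
}
```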

The bucket pattern groups multiple readings into a single document, amortizing the per-document overhead across many measurements.

Figure: Bucket pattern comparison. Per-event model (1 document = 1 reading, 200 bytes each) versus hourly bucket (1 document = 720 readings, 28 KB, about 39 bytes per reading). The hourly bucket gives a 5.1x storage reduction and 720x fewer documents.

Per-Event Model

// Per-event: one document per reading
{
  _id: ObjectId("..."),
  sensorId: "sensor-00042",
  ts: ISODate("2024-01-15T10:30:05Z"),
  temperature: 23.5,
  humidity: 65.2,
  pressure: 1013.25,
  voltage: 3.28
}

At 200 bytes per document and 172.8M documents per day, the per-event model writes 34.56 GB per day.

Bucket Model

// Bucketed: one document per sensor per hour
{
  _id: ObjectId("..."),
  sensorId: "sensor-00042",
  bucketStart: ISODate("2024-01-15T10:00:00Z"),
  bucketEnd: ISODate("2024-01-15T10:59:55Z"),
  count: 720,
  measurements: [
    { ts: ISODate("2024-01-15T10:00:00Z"), t: 23.1, h: 64.8, p: 1013.10, v: 3.30 },
    { ts: ISODate("2024-01-15T10:00:05Z"), t: 23.2, h: 64.9, p: 1013.12, v: 3.29 },
    // ... 718 more entries
  ],
  summary: {
    temperature: { min: 22.8, max: 24.1, avg: 23.45 },
    humidity: { min: 63.2, max: 67.1, avg: 65.15 },
    pressure: { min: 1012.80, max: 1013.60, avg: 1013.20 },
    voltage: { min: 3.25, max: 3.32, avg: 3.28 }
  }
}

At 28 KB per bucket document and 240,000 documents per day (10,000 sensors x 24 hours), the bucket model writes 6.72 GB per day.

Storage drops from 34.56 GB to 6.72 GB per day, a 5.1x reduction. The document count drops from 172.8M to 240K per day, a 720x reduction. The index on {sensorId: 1, bucketStart: 1} therefore has 720x fewer entries than the per-event index on {sensorId: 1, ts: 1}.
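
The comparison can be checked with a short calculation. The sketch below uses hypothetical names and assumes decimal units (1 KB = 1,000 bytes, 1 GB = 10^9 bytes), which is what the 6.72 GB figure implies. The document sizes are the estimates from the text.

```java
import java.util.Locale;

/** Daily storage and document counts for both models (illustrative names). */
final class StorageComparison {
    /** Daily volume in decimal gigabytes. */
    static double gbPerDay(long documentsPerDay, long bytesPerDocument) {
        return documentsPerDay * bytesPerDocument / 1e9;
    }

    public static void main(String[] args) {
        long perEventDocs = 172_800_000L;   // one document per reading
        long bucketDocs = 10_000L * 24;     // one document per sensor per hour

        double perEventGb = gbPerDay(perEventDocs, 200);
        double bucketGb = gbPerDay(bucketDocs, 28_000);

        // Prints: 34.56 GB vs 6.72 GB, 5.1x less storage, 720x fewer documents
        System.out.println(String.format(Locale.ROOT,
            "%.2f GB vs %.2f GB, %.1fx less storage, %dx fewer documents",
            perEventGb, bucketGb, perEventGb / bucketGb, perEventDocs / bucketDocs));
    }
}
```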

Write Mechanics

Inserting into a bucket uses updateOne with upsert: true:

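// FAST: Bucket insert with upsert
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.UpdateOptions;
import com.mongodb.client.model.Updates;
import org.bson.Document;

import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Date;

// Truncate to the top of the hour (UTC) to find the bucket for this reading.
Instant timestamp = reading.getTimestamp();
Instant bucketStart = timestamp.truncatedTo(ChronoUnit.HOURS);

collection.updateOne(
    // Match the bucket for this sensor and hour.
    Filters.and(
        Filters.eq("sensorId", reading.getSensorId()),
        Filters.eq("bucketStart", Date.from(bucketStart))
    ),
    Updates.combine(
        // Append the reading using the short field names from the bucket schema.
        Updates.push("measurements", new Document()
            .append("ts", Date.from(timestamp))
            .append("t", reading.getTemperature())
            .append("h", reading.getHumidity())
            .append("p", reading.getPressure())
            .append("v", reading.getVoltage())
        ),
        Updates.inc("count", 1),
        Updates.max("bucketEnd", Date.from(timestamp)),
        // Applied only when the upsert creates a new bucket.
        Updates.setOnInsert("bucketStart", Date.from(bucketStart)),
        Updates.setOnInsert("sensorId", reading.getSensorId()),
        // Running min/max for each metric in the summary.
        Updates.min("summary.temperature.min", reading.getTemperature()),
        Updates.max("summary.temperature.max", reading.getTemperature()),
        Updates.min("summary.humidity.min", reading.getHumidity()),
        Updates.max("summary.humidity.max", reading.getHumidity()),
        Updates.min("summary.pressure.min", reading.getPressure()),
        Updates.max("summary.pressure.max", reading.getPressure()),
        Updates.min("summary.voltage.min", reading.getVoltage()),
        Updates.max("summary.voltage.max", reading.getVoltage())
    ),
    new UpdateOptions().upsert(true)
);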

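The update is atomic because it modifies a single document. If the bucket does not exist, upsert: true creates it, applying the $setOnInsert fields together with the other operators, so the first reading is stored in the new bucket. If it exists, $setOnInsert is ignored, $push appends the measurement, $inc advances count, and the $min/$max operators maintain the running extremes in summary. The avg values cannot be maintained by $min/$max and are not updated by this call, so they have to be derived separately, for example from a running sum and count. Two concurrent upserts for the same new bucket can both insert unless a unique index on {sensorId: 1, bucketStart: 1} exists.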
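
To make the create-or-append behavior concrete, here is a minimal in-memory model of the same logic in plain Java. It is not MongoDB driver code: BucketUpsertModel, Bucket, and record are hypothetical names, only temperature is tracked, and the tempSum field is an assumption added to show one way an average could be derived.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for the bucket collection (illustration only). */
final class BucketUpsertModel {
    static final class Bucket {
        final String sensorId;
        final Instant bucketStart;
        Instant bucketEnd;
        int count;
        double tempMin = Double.POSITIVE_INFINITY;
        double tempMax = Double.NEGATIVE_INFINITY;
        double tempSum; // assumption: running sum, not part of the schema in the text
        final List<Double> temperatures = new ArrayList<>();

        Bucket(String sensorId, Instant bucketStart) {
            this.sensorId = sensorId;
            this.bucketStart = bucketStart;
        }

        double tempAvg() {
            return tempSum / count;
        }
    }

    private final Map<String, Bucket> buckets = new HashMap<>();

    /** Mirrors the upsert: create the hourly bucket if missing, then append. */
    Bucket record(String sensorId, Instant ts, double temperature) {
        Instant bucketStart = ts.truncatedTo(ChronoUnit.HOURS);
        String key = sensorId + "|" + bucketStart;
        // The "bucket does not exist" case, analogous to $setOnInsert.
        Bucket b = buckets.computeIfAbsent(key, k -> new Bucket(sensorId, bucketStart));
        b.temperatures.add(temperature);                 // like $push
        b.count++;                                       // like $inc
        if (b.bucketEnd == null || ts.isAfter(b.bucketEnd)) {
            b.bucketEnd = ts;                            // like $max on bucketEnd
        }
        b.tempMin = Math.min(b.tempMin, temperature);    // like $min
        b.tempMax = Math.max(b.tempMax, temperature);    // like $max
        b.tempSum += temperature;
        return b;
    }

    int bucketCount() {
        return buckets.size();
    }

    public static void main(String[] args) {
        BucketUpsertModel model = new BucketUpsertModel();
        model.record("sensor-00042", Instant.parse("2024-01-15T10:00:00Z"), 23.1);
        Bucket b = model.record("sensor-00042", Instant.parse("2024-01-15T10:00:05Z"), 23.2);
        model.record("sensor-00042", Instant.parse("2024-01-15T11:00:00Z"), 23.0);
        System.out.println(b.count);             // 2
        System.out.println(model.bucketCount()); // 2
    }
}
```

The first two readings fall in the 10:00 bucket and the third opens the 11:00 bucket, which is the behavior the upsert provides on the server. Unlike the single updateOne call, this model is not thread-safe.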