unbound mongodb at scale

The Bucket Pattern: Time-Series Data at Scale

Chapter 19 of 72

The Bucket Pattern

Storing one document per sensor reading is the natural design. It is also the wrong design for the telemetry platform at scale.

At 10,000 sensors reporting every 5 seconds, the system ingests 2,000 readings per second. Per day: 172.8 million documents. Per 30-day month: 5.18 billion documents. Each document carries BSON overhead: the _id field (a 12-byte ObjectId), field names repeated in every document (approximately 80 bytes), and structural bytes (type markers, length prefixes, null terminators). For a reading with four 8-byte numeric values, the payload is 32 bytes, but the document is roughly 200 bytes. That is 168 bytes of overhead for 32 bytes of data, an overhead ratio of 5.25x.
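The ingest arithmetic can be checked with a few lines of Java. This is a sketch: the 200-byte document size and 32-byte payload are the chapter's estimates, fed in as inputs rather than measured, a 30-day month is assumed, and the class name IngestVolume is hypothetical.

```java
// Back-of-envelope ingest math for the one-document-per-reading model.
// The 200-byte document size is the chapter's estimate, not a measured value.
final class IngestVolume {
    static final long SECONDS_PER_DAY = 86_400L;

    // Readings per second when every sensor reports once per interval
    // (integer division; assumes the sensor count divides evenly).
    static long readingsPerSecond(long sensors, long intervalSeconds) {
        return sensors / intervalSeconds;
    }

    // One document per reading, so documents per day = readings per day.
    static long docsPerDay(long readingsPerSecond) {
        return readingsPerSecond * SECONDS_PER_DAY;
    }

    // Overhead bytes per payload byte: (document size - payload) / payload.
    static double overheadRatio(int docBytes, int payloadBytes) {
        return (double) (docBytes - payloadBytes) / payloadBytes;
    }

    public static void main(String[] args) {
        long rps = readingsPerSecond(10_000, 5);
        long perDay = docsPerDay(rps);
        long perMonth = perDay * 30;                          // 30-day month assumed
        double ratio = overheadRatio(200, 4 * Double.BYTES);  // four 8-byte doubles
        System.out.printf("%d/s, %d/day, %d/month, overhead %.2fx%n",
                rps, perDay, perMonth, ratio);
    }
}
```

The overhead ratio is defined as overhead bytes divided by payload bytes, which is how 168 / 32 yields 5.25x.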

The bucket pattern groups multiple readings into a single document, amortizing the per-document overhead across many measurements.
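Stripped of the database, the grouping rule is a key: the sensor plus the hour its timestamp falls in. A minimal in-memory sketch, assuming a hypothetical Reading class with a single value field and a hypothetical string key of the form sensorId|hourStart; it shows only the bucketing rule, not how MongoDB stores the result.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Groups readings by (sensorId, hour) so many measurements share one container.
// Reading and the key format are hypothetical, for illustration only.
final class HourlyBucketer {
    static final class Reading {
        final String sensorId;
        final Instant ts;
        final double temperature;

        Reading(String sensorId, Instant ts, double temperature) {
            this.sensorId = sensorId;
            this.ts = ts;
            this.temperature = temperature;
        }
    }

    // Bucket key: sensor plus the reading's timestamp truncated to the hour.
    static String bucketKey(Reading r) {
        return r.sensorId + "|" + r.ts.truncatedTo(ChronoUnit.HOURS);
    }

    // Appends each reading to its bucket; TreeMap keeps keys sorted.
    static Map<String, List<Reading>> group(List<Reading> readings) {
        Map<String, List<Reading>> buckets = new TreeMap<>();
        for (Reading r : readings) {
            buckets.computeIfAbsent(bucketKey(r), k -> new ArrayList<>()).add(r);
        }
        return buckets;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2024-01-15T10:00:00Z");
        List<Reading> readings = new ArrayList<>();
        // One sensor reporting every 5 seconds for two hours: 1,440 readings.
        for (int i = 0; i < 1_440; i++) {
            readings.add(new Reading("sensor-00042", start.plusSeconds(5L * i), 23.0));
        }
        group(readings).forEach((key, list) -> System.out.println(key + " -> " + list.size()));
    }
}
```

Two hours of 5-second readings from one sensor land in two buckets of 720 readings each, which is the per-bucket count used throughout this chapter.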

Figure: Bucket pattern comparison. Per-event model: 1 document = 1 reading, 200 bytes each. Hourly bucket: 1 document = 720 readings, 28 KB, about 39 bytes per reading effective. Result: 5.1x storage reduction and 720x fewer documents.

Per-Event Model

// Per-event: one document per reading
{
  _id: ObjectId("..."),
  sensorId: "sensor-00042",
  ts: ISODate("2024-01-15T10:30:05Z"),
  temperature: 23.5,
  humidity: 65.2,
  pressure: 1013.25,
  voltage: 3.28
}

200 bytes per document. 172.8M documents per day. 34.5 GB per day.

Bucket Model

// Bucketed: one document per sensor per hour
{
  _id: ObjectId("..."),
  sensorId: "sensor-00042",
  bucketStart: ISODate("2024-01-15T10:00:00Z"),
  bucketEnd: ISODate("2024-01-15T10:59:55Z"),
  count: 720,
  measurements: [
    { ts: ISODate("2024-01-15T10:00:00Z"), t: 23.1, h: 64.8, p: 1013.10, v: 3.30 },
    { ts: ISODate("2024-01-15T10:00:05Z"), t: 23.2, h: 64.9, p: 1013.12, v: 3.29 },
    // ... 718 more entries
  ],
  summary: {
    temperature: { min: 22.8, max: 24.1, avg: 23.45 },
    humidity: { min: 63.2, max: 67.1, avg: 65.15 },
    pressure: { min: 1012.80, max: 1013.60, avg: 1013.20 },
    voltage: { min: 3.25, max: 3.32, avg: 3.28 }
  }
}

28 KB per bucket document. 240,000 documents per day (10,000 sensors x 24 hours). 6.72 GB per day.

The storage reduction: 34.5 GB vs 6.72 GB per day. That is 5.1x less storage. The document count reduction: 172.8M vs 240K per day. That is 720x fewer documents. The index on {sensorId: 1, bucketStart: 1} has 720x fewer entries than the per-event index on {sensorId: 1, ts: 1}.
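The daily comparison reduces to a handful of multiplications. This sketch takes the chapter's 200-byte and 28 KB document sizes as given inputs (decimal units) and reproduces the ratios; the exact per-event figure it prints is 34.56 GB, which the text gives as 34.5 GB. BucketStorageMath is a hypothetical name.

```java
// Reproduces the per-day storage and document-count comparison.
// Document sizes are the chapter's estimates, treated here as inputs.
final class BucketStorageMath {
    static final long SENSORS = 10_000L;
    static final long READINGS_PER_BUCKET = 720L;  // one hour at 5-second intervals
    static final long EVENT_DOC_BYTES = 200L;
    static final long BUCKET_DOC_BYTES = 28_000L;  // "28 KB" in decimal units

    // One document per reading: 720 readings per sensor per hour.
    static long eventDocsPerDay() {
        return SENSORS * READINGS_PER_BUCKET * 24;
    }

    // One document per sensor per hour.
    static long bucketDocsPerDay() {
        return SENSORS * 24;
    }

    public static void main(String[] args) {
        long eventBytes = eventDocsPerDay() * EVENT_DOC_BYTES;
        long bucketBytes = bucketDocsPerDay() * BUCKET_DOC_BYTES;
        System.out.printf("per-event: %.2f GB/day%n", eventBytes / 1e9);
        System.out.printf("bucketed:  %.2f GB/day%n", bucketBytes / 1e9);
        System.out.printf("storage ratio: %.2fx%n", (double) eventBytes / bucketBytes);
        System.out.printf("document ratio: %dx%n", eventDocsPerDay() / bucketDocsPerDay());
        System.out.printf("bytes per reading in a bucket: %.1f%n",
                (double) BUCKET_DOC_BYTES / READINGS_PER_BUCKET);
    }
}
```

The document ratio is exactly the bucket size, 720, because each bucket replaces one document per reading; the index on {sensorId: 1, bucketStart: 1} shrinks by the same factor.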

Write Mechanics

Inserting into a bucket uses updateOne with upsert: true:

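// FAST: Bucket insert with upsert
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.UpdateOptions;
import com.mongodb.client.model.Updates;
import org.bson.Document;

import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Date;

// Assumes `collection` is a MongoCollection<Document> and `reading`
// exposes the getters used below.
// Bucket key: the reading's timestamp truncated to the start of its hour.
Instant timestamp = reading.getTimestamp();
Instant bucketStart = timestamp.truncatedTo(ChronoUnit.HOURS);

collection.updateOne(
    // Match the single bucket for this sensor and hour.
    Filters.and(
        Filters.eq("sensorId", reading.getSensorId()),
        Filters.eq("bucketStart", Date.from(bucketStart))
    ),
    Updates.combine(
        // Append the reading; short field names save bytes in every entry.
        Updates.push("measurements", new Document()
            .append("ts", Date.from(timestamp))
            .append("t", reading.getTemperature())
            .append("h", reading.getHumidity())
            .append("p", reading.getPressure())
            .append("v", reading.getVoltage())
        ),
        Updates.inc("count", 1),
        // bucketEnd tracks the latest timestamp seen in this bucket.
        Updates.max("bucketEnd", Date.from(timestamp)),
        // Applied only when the upsert inserts a new bucket.
        Updates.setOnInsert("bucketStart", Date.from(bucketStart)),
        Updates.setOnInsert("sensorId", reading.getSensorId()),
        // Running summary for temperature; humidity, pressure and voltage
        // follow the same $min/$max pattern.
        Updates.min("summary.temperature.min", reading.getTemperature()),
        Updates.max("summary.temperature.max", reading.getTemperature())
    ),
    new UpdateOptions().upsert(true)
);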

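This is a single atomic update on one document. If no bucket matches the filter, upsert: true creates one, and the $setOnInsert fields are written only in that case. If the bucket exists, $push appends the measurement, $inc increments count, and $max advances bucketEnd. The $min/$max operators keep the running minimum and maximum current; the snippet shows temperature only, and the other three fields follow the same pattern. An average cannot be maintained with $min/$max: one common approach is to $inc a running sum and compute avg as sum / count at read time. Because two concurrent upserts for a bucket that does not yet exist could otherwise both insert, the {sensorId: 1, bucketStart: 1} index should be declared unique so that at most one bucket exists per sensor and hour.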
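To see what each operator contributes, the update can be modeled in plain Java against in-memory buckets. This is an illustration rather than driver code: BucketUpsertModel and its fields are hypothetical, only temperature is tracked, and the running tempSum used to derive the average is an addition that the chapter's update does not include.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory model of the bucket upsert. Each step mirrors one update operator
// so the semantics can be checked without a running MongoDB instance.
final class BucketUpsertModel {
    static final class Bucket {
        String sensorId;
        Instant bucketStart;
        Instant bucketEnd;
        int count;
        final List<Double> temperatures = new ArrayList<>();
        double tempMin = Double.POSITIVE_INFINITY;  // $min on a missing field just sets it
        double tempMax = Double.NEGATIVE_INFINITY;  // same for $max
        double tempSum;                             // hypothetical running sum

        double tempAvg() {
            return tempSum / count;
        }
    }

    // Keyed like the upsert filter: sensorId plus hour-truncated timestamp.
    final Map<String, Bucket> buckets = new HashMap<>();

    void upsert(String sensorId, Instant ts, double temperature) {
        Instant start = ts.truncatedTo(ChronoUnit.HOURS);
        Bucket b = buckets.computeIfAbsent(sensorId + "|" + start, k -> {
            Bucket nb = new Bucket();
            nb.sensorId = sensorId;   // $setOnInsert
            nb.bucketStart = start;   // $setOnInsert
            return nb;
        });
        b.temperatures.add(temperature);                  // $push
        b.count += 1;                                     // $inc count
        if (b.bucketEnd == null || ts.isAfter(b.bucketEnd)) {
            b.bucketEnd = ts;                             // $max bucketEnd
        }
        b.tempMin = Math.min(b.tempMin, temperature);     // $min
        b.tempMax = Math.max(b.tempMax, temperature);     // $max
        b.tempSum += temperature;                         // hypothetical $inc on a sum field
    }

    public static void main(String[] args) {
        BucketUpsertModel model = new BucketUpsertModel();
        Instant t0 = Instant.parse("2024-01-15T10:00:00Z");
        model.upsert("sensor-00042", t0, 23.0);
        model.upsert("sensor-00042", t0.plusSeconds(5), 24.0);
        model.upsert("sensor-00042", t0.plusSeconds(10), 22.0);
        Bucket b = model.buckets.get("sensor-00042|" + t0);
        System.out.println(b.count + " " + b.tempMin + " " + b.tempMax + " "
                + b.tempAvg() + " " + b.bucketEnd);
    }
}
```

Three readings in the same hour produce one bucket with count 3, minimum 22.0, maximum 24.0 and an average of 23.0; a reading an hour later would open a second bucket, just as the real upsert would.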