The Bucket Pattern: Time-Series Data at Scale
Storing one document per sensor reading is the natural design. It is also the wrong design for the telemetry platform at scale.
At 10,000 sensors reporting every 5 seconds, the system ingests 2,000 readings per second. Per day: 172.8 million documents. Per 30-day month: 5.18 billion documents. Each document carries BSON overhead: the _id field (a 12-byte ObjectId), field names repeated in every document (approximately 80 bytes), and structural bytes (type markers, null terminators). For a reading with 4 numeric values stored as 8-byte doubles, the payload is 32 bytes but the document is 200 bytes. That is 168 bytes of overhead for 32 bytes of payload. Overhead ratio: 5.25x.
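The arithmetic above can be reproduced with a few lines of plain Java. This is only a sketch for checking the figures quoted in the text; the class and method names are hypothetical, and the month is assumed to have 30 days.

```java
// Back-of-the-envelope ingest math for the per-event model.
// Inputs are the figures quoted in the text; all names here are hypothetical.
final class IngestMath {

    /** Readings per second when every sensor reports once per interval. */
    static long readingsPerSecond(long sensors, long intervalSeconds) {
        return sensors / intervalSeconds;
    }

    /** Documents written per day at one document per reading. */
    static long documentsPerDay(long readingsPerSecond) {
        return readingsPerSecond * 86_400L; // seconds in a day
    }

    /** Bytes of overhead carried for each byte of payload. */
    static double overheadRatio(int documentBytes, int payloadBytes) {
        return (documentBytes - payloadBytes) / (double) payloadBytes;
    }

    public static void main(String[] args) {
        long rps = readingsPerSecond(10_000, 5);
        long perDay = documentsPerDay(rps);
        System.out.println(rps + " readings per second");
        System.out.println(perDay + " documents per day");
        System.out.println(perDay * 30 + " documents per 30-day month");
        System.out.println(overheadRatio(200, 32) + "x overhead");
    }
}
```

Changing the sensor count or reporting interval in `main` shows how quickly the per-event document count grows.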
The bucket pattern groups multiple readings into a single document, amortizing the per-document overhead across many measurements.
Per-Event Model
// Per-event: one document per reading
{
  _id: ObjectId("..."),
  sensorId: "sensor-00042",
  ts: ISODate("2024-01-15T10:30:05Z"),
  temperature: 23.5,
  humidity: 65.2,
  pressure: 1013.25,
  voltage: 3.28
}
200 bytes per document. 172.8M documents per day. 34.5 GB per day.
Bucket Model
// Bucketed: one document per sensor per hour
{
  _id: ObjectId("..."),
  sensorId: "sensor-00042",
  bucketStart: ISODate("2024-01-15T10:00:00Z"),
  bucketEnd: ISODate("2024-01-15T10:59:55Z"),
  count: 720,
  measurements: [
    { ts: ISODate("2024-01-15T10:00:00Z"), t: 23.1, h: 64.8, p: 1013.10, v: 3.30 },
    { ts: ISODate("2024-01-15T10:00:05Z"), t: 23.2, h: 64.9, p: 1013.12, v: 3.29 },
    // ... 718 more entries
  ],
  summary: {
    temperature: { min: 22.8, max: 24.1, avg: 23.45 },
    humidity: { min: 63.2, max: 67.1, avg: 65.15 },
    pressure: { min: 1012.80, max: 1013.60, avg: 1013.20 },
    voltage: { min: 3.25, max: 3.32, avg: 3.28 }
  }
}
28 KB per bucket document. 240,000 documents per day (10,000 sensors x 24 hours). 6.72 GB per day.
The storage reduction: 34.5 GB vs 6.72 GB per day. That is 5.1x less storage. The document count reduction: 172.8M vs 240K per day. That is 720x fewer documents. The index on {sensorId: 1, bucketStart: 1} has 720x fewer entries than the per-event index on {sensorId: 1, ts: 1}.
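The comparison above is simple multiplication, sketched below so the figures can be re-derived for other bucket sizes. The class and method names are hypothetical, and the sketch assumes decimal units (1 KB = 1,000 bytes, 1 GB = 10^9 bytes), which is what the quoted totals imply.

```java
// Daily storage and document counts for the two models.
// Sizes are the figures quoted in the text; all names here are hypothetical.
final class StorageComparison {

    /** Raw bytes written per day for a given document count and size. */
    static long bytesPerDay(long documentsPerDay, long bytesPerDocument) {
        return documentsPerDay * bytesPerDocument;
    }

    /** Bucket documents per day: one per sensor per bucket window. */
    static long bucketsPerDay(long sensors, int bucketsPerSensorPerDay) {
        return sensors * bucketsPerSensorPerDay;
    }

    /** Readings that land in one bucket window. */
    static long readingsPerBucket(int bucketSeconds, int intervalSeconds) {
        return bucketSeconds / intervalSeconds;
    }

    public static void main(String[] args) {
        long perEventDocs = 172_800_000L;
        long bucketDocs = bucketsPerDay(10_000, 24);

        long perEventBytes = bytesPerDay(perEventDocs, 200);   // 200 bytes per reading
        long bucketBytes = bytesPerDay(bucketDocs, 28_000);    // 28 KB per bucket

        System.out.printf("per-event: %.2f GB/day%n", perEventBytes / 1e9);
        System.out.printf("bucketed:  %.2f GB/day%n", bucketBytes / 1e9);
        System.out.printf("storage reduction: %.1fx%n", (double) perEventBytes / bucketBytes);
        System.out.println("document reduction: " + perEventDocs / bucketDocs + "x");
        System.out.println("readings per hourly bucket: " + readingsPerBucket(3600, 5));
    }
}
```

The document reduction equals the number of readings per bucket, so a shorter or longer bucket window changes the index size in direct proportion.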
Write Mechanics
Inserting into a bucket uses updateOne with upsert: true:
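import com.mongodb.client.model.Filters;
import com.mongodb.client.model.UpdateOptions;
import com.mongodb.client.model.Updates;
import org.bson.Document;

import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Date;

// FAST: Bucket insert with upsert
// Assumes `collection` is a MongoCollection<Document> and `reading` is the
// application's sensor-reading object.
Instant timestamp = reading.getTimestamp();
// Round down to the top of the hour to find the bucket this reading belongs to.
Instant bucketStart = timestamp.truncatedTo(ChronoUnit.HOURS);

collection.updateOne(
    // Match the bucket for this sensor and hour.
    Filters.and(
        Filters.eq("sensorId", reading.getSensorId()),
        Filters.eq("bucketStart", Date.from(bucketStart))
    ),
    Updates.combine(
        // Append the reading using the short field names of the bucket model.
        Updates.push("measurements", new Document()
            .append("ts", Date.from(timestamp))
            .append("t", reading.getTemperature())
            .append("h", reading.getHumidity())
            .append("p", reading.getPressure())
            .append("v", reading.getVoltage())
        ),
        Updates.inc("count", 1),
        // bucketEnd tracks the latest timestamp seen in this bucket.
        Updates.max("bucketEnd", Date.from(timestamp)),
        // Applied only when the upsert creates a new bucket.
        Updates.setOnInsert("bucketStart", Date.from(bucketStart)),
        Updates.setOnInsert("sensorId", reading.getSensorId()),
        // Running summary; only temperature is shown here.
        Updates.min("summary.temperature.min", reading.getTemperature()),
        Updates.max("summary.temperature.max", reading.getTemperature())
    ),
    new UpdateOptions().upsert(true)
);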
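This is an atomic operation, because it modifies a single document. If the bucket does not exist, upsert: true creates it with the $setOnInsert fields and then applies the remaining operators. If it exists, $push appends the measurement, $inc advances count, and the $min/$max operators maintain the running minimum and maximum. The avg values in the bucket document cannot be maintained by $min/$max. One option is to keep a running sum with $inc and derive the average as sum / count.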
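To make the upsert semantics concrete, here is a minimal in-memory model of the same logic in plain Java. It is not the MongoDB driver and makes no claim about how the server implements updates: a HashMap stands in for the collection, only temperature is tracked, and the running sum field is an assumption added to show one way an average could be derived. All class, field, and method names are hypothetical.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for the bucket collection (hypothetical, for illustration only).
final class BucketModel {

    /** One hourly bucket for one sensor, tracking temperature only. */
    static final class Bucket {
        final List<Double> temperatures = new ArrayList<>();
        int count;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum; // assumption: running sum so that avg = sum / count

        double avg() {
            return sum / count;
        }
    }

    private final Map<String, Bucket> buckets = new HashMap<>();

    /** Bucket identity: sensor id plus the timestamp truncated to the hour. */
    static String bucketKey(String sensorId, Instant ts) {
        return sensorId + "|" + ts.truncatedTo(ChronoUnit.HOURS);
    }

    /** Creates the bucket if it is missing, then folds the reading into it. */
    Bucket record(String sensorId, Instant ts, double temperature) {
        Bucket b = buckets.computeIfAbsent(bucketKey(sensorId, ts), k -> new Bucket()); // like upsert
        b.temperatures.add(temperature);      // like $push
        b.count++;                            // like $inc
        b.min = Math.min(b.min, temperature); // like $min
        b.max = Math.max(b.max, temperature); // like $max
        b.sum += temperature;                 // like $inc on a sum field
        return b;
    }

    int bucketCount() {
        return buckets.size();
    }
}
```

Two readings at 10:00:00 and 10:59:55 land in the same bucket, while a reading at 11:00:00 opens a new one. The model is single-threaded; in the database, the single-document update provides the atomicity.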