The Collection Scan Trap: Diagnosing COLLSCAN in Production
The Collection Scan Trap
The Symptom
The telemetry platform’s aggregation query for hourly sensor statistics starts timing out after the data set grows past 50 million documents. The query worked at 5 million documents. Nothing in the code changed. The db.currentOp() output shows a single query consuming 98% of the WiredTiger read tickets, blocking all other reads on the replica set member.
// MongoDB slow query log entry
{
"msg": "Slow query",
"attr": {
"type": "command",
"ns": "telemetry.readings",
"command": {
"find": "readings",
"filter": {
"sensorId": "sensor-00042",
"ts": {"$gte": ISODate("2026-05-29T13:00:00Z")}
},
"sort": {"ts": -1},
"limit": 50
},
"planSummary": "COLLSCAN",
"keysExamined": 0,
"docsExamined": 52847391,
"nreturned": 50,
"millis": 14230
}
}
52 million documents examined. 50 returned. 14 seconds. The planSummary says COLLSCAN. No index was used.
The Cause
The collection has no index on {sensorId, ts}. MongoDB has no way to locate documents for sensor-00042 without reading every document in the collection. At 52 million documents, each approximately 400 bytes of BSON, that is roughly 20 GB of data that must be read from the WiredTiger cache or, worse, from disk.
The query also includes a sort on ts descending. Without an index that supports this sort, MongoDB must load all matching documents into memory and sort them. If more than 100MB of data matches the filter before the sort, the query fails entirely.
// SLOW: Query without supporting index
MongoCollection<Document> readings = database.getCollection("readings");
FindIterable<Document> results = readings.find(
Filters.and(
Filters.eq("sensorId", sensorId),
Filters.gte("ts", oneHourAgo)
))
.sort(Sorts.descending("ts"))
.limit(50);
// This triggers a COLLSCAN on 52M documents
for (Document doc : results) {
processReading(doc);
}
The full explain("executionStats") output:
{
"executionStats": {
"executionSuccess": true,
"nReturned": 50,
"executionTimeMillis": 14230,
"totalKeysExamined": 0,
"totalDocsExamined": 52847391,
"executionStages": {
"stage": "SORT",
"nReturned": 50,
"sortPattern": {"ts": -1},
"memLimit": 104857600,
"memUsage": 87432160,
"inputStage": {
"stage": "COLLSCAN",
"nReturned": 86400,
"totalDocsExamined": 52847391,
"direction": "forward"
}
}
}
}
Two problems are visible in this plan. First, COLLSCAN examined 52.8 million documents to find 86,400 matches for this sensor in the last hour. Second, the SORT stage loaded all 86,400 matching documents (87 MB) into memory to sort them by timestamp. It is using 87 MB of the 100 MB limit. With slightly more data, this query will fail.
The Benchmark
The impact of a single COLLSCAN extends beyond the slow query itself. While MongoDB scans 52 million documents, it holds WiredTiger read tickets. Other queries queue behind it.
// k6 test: measure impact of COLLSCAN on concurrent operations
import http from 'k6/http';
import { Trend } from 'k6/metrics';
const normalQueryLatency = new Trend('normal_query_latency', true);
export const options = {
scenarios: {
normal_reads: {
executor: 'constant-arrival-rate',
rate: 100,
timeUnit: '1s',
duration: '2m',
preAllocatedVUs: 50,
maxVUs: 100,
exec: 'normalRead',
},
heavy_collscan: {
executor: 'constant-arrival-rate',
rate: 2,
timeUnit: '1s',
duration: '2m',
preAllocatedVUs: 5,
maxVUs: 10,
exec: 'heavyQuery',
startTime: '30s', // Start after baseline period
},
},
};
export function normalRead() {
const res = http.get(`${__ENV.BASE_URL}/api/telemetry/latest/sensor-00001`);
normalQueryLatency.add(res.timings.duration);
}
export function heavyQuery() {
http.get(`${__ENV.BASE_URL}/api/telemetry/stats/sensor-00042?window=24h`);
}
Results before and after the COLLSCAN queries begin:
| Metric | First 30s (no COLLSCAN) | After COLLSCAN starts |
|---|---|---|
| Normal read p50 | 4ms | 4ms |
| Normal read p95 | 12ms | 280ms |
| Normal read p99 | 45ms | 1,400ms |
| Available read tickets | 126/128 | 12/128 |
The COLLSCAN queries consume read tickets while scanning millions of documents. Normal reads that previously completed in 4ms now wait hundreds of milliseconds for a read ticket to become available.
The Fix
Create the index that the query needs:
// FAST: Create compound index for the query pattern
collection.createIndex(
Indexes.compoundIndex(
Indexes.ascending("sensorId"),
Indexes.descending("ts")
),
new IndexOptions().name("sensorId_ts_desc")
);
The query now uses IXSCAN:
{
"executionStats": {
"nReturned": 50,
"executionTimeMillis": 3,
"totalKeysExamined": 50,
"totalDocsExamined": 50,
"executionStages": {
"stage": "LIMIT",
"nReturned": 50,
"inputStage": {
"stage": "FETCH",
"nReturned": 50,
"inputStage": {
"stage": "IXSCAN",
"nReturned": 50,
"keyPattern": {"sensorId": 1, "ts": -1},
"indexName": "sensorId_ts_desc",
"totalKeysExamined": 50
}
}
}
}
}
Ratio: 1:1. Time: 3ms. No in-memory SORT stage because the index supports both the filter and the sort order. The SORT stage disappeared entirely.
The Proof
| Metric | Before (COLLSCAN) | After (IXSCAN) |
|---|---|---|
| Query time | 14,230ms | 3ms |
| Docs examined | 52,847,391 | 50 |
| Ratio | 1,056,948:1 | 1:1 |
| Memory sort | 87 MB | 0 MB |
| Impact on concurrent queries | p99 degraded to 1,400ms | No measurable impact |
The Trade-off
The {sensorId: 1, ts: -1} index consumes approximately 1.8 GB of storage for 52 million documents. It must be maintained on every insert: each new telemetry reading adds one index entry. At 1,000 inserts per second, the index maintenance adds approximately 0.5ms of overhead per insert. This is a 4,743,266x improvement in read time for a 0.5ms increase in write time. The trade-off is not close.