The Collection Scan Trap

The Symptom

The telemetry platform’s aggregation query for hourly sensor statistics starts timing out after the data set grows past 50 million documents. The query worked at 5 million documents. Nothing in the code changed. The db.currentOp() output shows a single query consuming 98% of the WiredTiger read tickets, blocking all other reads on the replica set member.

// MongoDB slow query log entry
{
  "msg": "Slow query",
  "attr": {
    "type": "command",
    "ns": "telemetry.readings",
    "command": {
      "find": "readings",
      "filter": {
        "sensorId": "sensor-00042",
        "ts": {"$gte": ISODate("2026-05-29T13:00:00Z")}
      },
      "sort": {"ts": -1},
      "limit": 50
    },
    "planSummary": "COLLSCAN",
    "keysExamined": 0,
    "docsExamined": 52847391,
    "nreturned": 50,
    "millis": 14230
  }
}

52 million documents examined. 50 returned. 14 seconds. The planSummary says COLLSCAN. No index was used.

The Cause

The collection has no index on {sensorId, ts}. MongoDB has no way to locate documents for sensor-00042 without reading every document in the collection. At 52 million documents, each approximately 400 bytes of BSON, that is roughly 20 GB of data that must be read from the WiredTiger cache or, worse, from disk.

The query also includes a sort on ts descending. Without an index that supports this sort, MongoDB must load all matching documents into memory and sort them. If more than 100MB of data matches the filter before the sort, the query fails entirely.

// SLOW: Query without supporting index
MongoCollection<Document> readings = database.getCollection("readings");

FindIterable<Document> results = readings.find(
    Filters.and(
        Filters.eq("sensorId", sensorId),
        Filters.gte("ts", oneHourAgo)
    ))
    .sort(Sorts.descending("ts"))
    .limit(50);

// This triggers a COLLSCAN on 52M documents
for (Document doc : results) {
    processReading(doc);
}

The full explain("executionStats") output:

{
  "executionStats": {
    "executionSuccess": true,
    "nReturned": 50,
    "executionTimeMillis": 14230,
    "totalKeysExamined": 0,
    "totalDocsExamined": 52847391,
    "executionStages": {
      "stage": "SORT",
      "nReturned": 50,
      "sortPattern": {"ts": -1},
      "memLimit": 104857600,
      "memUsage": 87432160,
      "inputStage": {
        "stage": "COLLSCAN",
        "nReturned": 86400,
        "totalDocsExamined": 52847391,
        "direction": "forward"
      }
    }
  }
}

Two problems are visible in this plan. First, COLLSCAN examined 52.8 million documents to find 86,400 matches for this sensor in the last hour. Second, the SORT stage loaded all 86,400 matching documents (87 MB) into memory to sort them by timestamp. It is using 87 MB of the 100 MB limit. With slightly more data, this query will fail.

The Benchmark

The impact of a single COLLSCAN extends beyond the slow query itself. While MongoDB scans 52 million documents, it holds WiredTiger read tickets. Other queries queue behind it.

// k6 test: measure impact of COLLSCAN on concurrent operations
import http from 'k6/http';
import { Trend } from 'k6/metrics';

const normalQueryLatency = new Trend('normal_query_latency', true);

export const options = {
  scenarios: {
    normal_reads: {
      executor: 'constant-arrival-rate',
      rate: 100,
      timeUnit: '1s',
      duration: '2m',
      preAllocatedVUs: 50,
      maxVUs: 100,
      exec: 'normalRead',
    },
    heavy_collscan: {
      executor: 'constant-arrival-rate',
      rate: 2,
      timeUnit: '1s',
      duration: '2m',
      preAllocatedVUs: 5,
      maxVUs: 10,
      exec: 'heavyQuery',
      startTime: '30s',  // Start after baseline period
    },
  },
};

export function normalRead() {
  const res = http.get(`${__ENV.BASE_URL}/api/telemetry/latest/sensor-00001`);
  normalQueryLatency.add(res.timings.duration);
}

export function heavyQuery() {
  http.get(`${__ENV.BASE_URL}/api/telemetry/stats/sensor-00042?window=24h`);
}

Results before and after the COLLSCAN queries begin:

Metric	First 30s (no COLLSCAN)	After COLLSCAN starts
Normal read p50	4ms	4ms
Normal read p95	12ms	280ms
Normal read p99	45ms	1,400ms
Available read tickets	126/128	12/128

The COLLSCAN queries consume read tickets while scanning millions of documents. Normal reads that previously completed in 4ms now wait hundreds of milliseconds for a read ticket to become available.

The Fix

Create the index that the query needs:

// FAST: Create compound index for the query pattern
collection.createIndex(
    Indexes.compoundIndex(
        Indexes.ascending("sensorId"),
        Indexes.descending("ts")
    ),
    new IndexOptions().name("sensorId_ts_desc")
);

The query now uses IXSCAN:

{
  "executionStats": {
    "nReturned": 50,
    "executionTimeMillis": 3,
    "totalKeysExamined": 50,
    "totalDocsExamined": 50,
    "executionStages": {
      "stage": "LIMIT",
      "nReturned": 50,
      "inputStage": {
        "stage": "FETCH",
        "nReturned": 50,
        "inputStage": {
          "stage": "IXSCAN",
          "nReturned": 50,
          "keyPattern": {"sensorId": 1, "ts": -1},
          "indexName": "sensorId_ts_desc",
          "totalKeysExamined": 50
        }
      }
    }
  }
}

Ratio: 1:1. Time: 3ms. No in-memory SORT stage because the index supports both the filter and the sort order. The SORT stage disappeared entirely.

The Proof

Metric	Before (COLLSCAN)	After (IXSCAN)
Query time	14,230ms	3ms
Docs examined	52,847,391	50
Ratio	1,056,948:1	1:1
Memory sort	87 MB	0 MB
Impact on concurrent queries	p99 degraded to 1,400ms	No measurable impact

The Trade-off

The {sensorId: 1, ts: -1} index consumes approximately 1.8 GB of storage for 52 million documents. It must be maintained on every insert: each new telemetry reading adds one index entry. At 1,000 inserts per second, the index maintenance adds approximately 0.5ms of overhead per insert. This is a 4,743,266x improvement in read time for a 0.5ms increase in write time. The trade-off is not close.