unbound mongodb at scale

Diagnosing Connection Pool Exhaustion Under Load

Chapter 11 of 72


The Symptom

The telemetry ingestion service starts returning HTTP 500 errors when load exceeds 800 requests per second. The MongoDB server is not under stress: CPU is at 40% and WiredTiger cache usage is at 60%. The application log shows:

```text
com.mongodb.MongoWaitQueueFullException: Timeout waiting for a pooled item after 5000 MILLISECONDS
```

The database is healthy. The application is starving for connections.
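The exception comes from the client-side pool, not from the server: a request asks the pool for a connection, waits up to a deadline, and fails if none frees up in time. The model below is a minimal stdlib-only illustration of that behavior. It assumes a pool is nothing more than `maxSize` permits plus a bounded wait. `BoundedPoolModel` and its methods are hypothetical names, and this is not the MongoDB driver's implementation.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a client-side connection pool: maxSize permits and a
// checkout that waits at most maxWaitMillis. Illustrates the failure mode only;
// it is NOT the MongoDB driver's internal code.
final class BoundedPoolModel {
    private final Semaphore permits;
    private final long maxWaitMillis;

    BoundedPoolModel(int maxSize, long maxWaitMillis) {
        this.permits = new Semaphore(maxSize, true); // fair: waiters served in arrival order
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Returns true if a connection was checked out, false if the wait timed out. */
    boolean checkOut() {
        try {
            return permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    /** Returns a previously checked-out connection to the pool. */
    void checkIn() {
        permits.release();
    }

    /** Number of connections currently free. */
    int available() {
        return permits.availablePermits();
    }
}
```

With `new BoundedPoolModel(2, 50)`, the first two `checkOut()` calls succeed and a third returns `false` after about 50 ms. That is the same shape as the log above: the server was never asked, and the caller simply ran out of time waiting for a free connection.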

The Cause

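The default maxPoolSize is 100. By Little's law, the number of connections in use equals the arrival rate multiplied by the operation duration. At 800 req/sec with an average operation duration of 15ms, the pool needs only 800 × 0.015 = 12 connections. But during a WiredTiger checkpoint, operation latency spikes to 200ms for 2-3 seconds. During that window: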

$$\text{requiredConnections} = 800 \times 0.200 = 160$$

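The pool has 100 connections, 60 short of what the stall requires. Its throughput ceiling during the stall is therefore 100 / 0.200 s = 500 operations per second. The other 300 requests arriving each second go into the wait queue, which grows by roughly 300 requests for every second the stall lasts. With maxWaitTime set to 5 seconds (the value in the exception above; the Java driver's default is 2 minutes), those requests accumulate rapidly, and the ones that wait too long fail.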
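The sizing arithmetic is easy to check in code. The helper below is a hypothetical illustration of Little's law applied to a pool, not part of any driver. Latencies are taken as whole milliseconds so that the integer arithmetic stays exact.

```java
// Little's law applied to a connection pool (hypothetical helper, illustration only).
// Latencies are whole milliseconds so the integer arithmetic stays exact.
final class PoolSizing {
    private PoolSizing() {}

    /** Concurrent connections needed = arrival rate x latency, rounded up. */
    static long requiredConnections(long requestsPerSecond, long latencyMillis) {
        return ceilDiv(requestsPerSecond * latencyMillis, 1000);
    }

    /** Max operations per second a pool can complete = pool size / latency (rounded down). */
    static long throughputCeiling(long poolSize, long latencyMillis) {
        return poolSize * 1000 / latencyMillis;
    }

    /** Requests per second that exceed the ceiling and pile up in the wait queue. */
    static long queueGrowthPerSecond(long requestsPerSecond, long poolSize, long latencyMillis) {
        return Math.max(0, requestsPerSecond - throughputCeiling(poolSize, latencyMillis));
    }

    private static long ceilDiv(long numerator, long denominator) {
        return (numerator + denominator - 1) / denominator;
    }
}
```

`requiredConnections(800, 15)` returns 12 and `requiredConnections(800, 200)` returns 160. `throughputCeiling(100, 200)` returns 500, and `queueGrowthPerSecond(800, 100, 200)` returns 300, which matches the figures in the text.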

The Benchmark

```javascript
// k6 test: connection pool exhaustion detection
import http from 'k6/http';
import { Trend, Rate } from 'k6/metrics';

const latency = new Trend('req_latency', true); // true = values are durations (ms)
const errors = new Rate('error_rate');           // fraction of responses that are not 201

export const options = {
  scenarios: {
    ramp_up: {
      // Open model: k6 starts iterations at the target rate regardless of response
      // time, so slow responses do not throttle the offered load.
      executor: 'ramping-arrival-rate',
      startRate: 100,
      timeUnit: '1s',
      preAllocatedVUs: 200,
      maxVUs: 1000, // if every VU is busy, k6 skips iterations and reports dropped_iterations
      stages: [
        // Each stage ramps linearly from the previous rate to its target.
        { duration: '1m', target: 100 },
        { duration: '1m', target: 300 },
        { duration: '1m', target: 500 },
        { duration: '1m', target: 800 },
        { duration: '1m', target: 1000 },
      ],
    },
  },
};

export default function () {
  const sensorId = `sensor-${String(Math.floor(Math.random() * 10000)).padStart(5, '0')}`;
  const res = http.post(`${__ENV.BASE_URL}/api/telemetry/ingest`, JSON.stringify({
    sensorId: sensorId,
    timestamp: new Date().toISOString(),
    temperature: 20 + Math.random() * 15,
    humidity: 40 + Math.random() * 30,
  }), { headers: { 'Content-Type': 'application/json' } });

  latency.add(res.timings.duration);
  errors.add(res.status !== 201);
}
```

Results with maxPoolSize=100:

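| Load (req/sec) | p50 | p95 | p99 | Error rate |
|---|---|---|---|---|
| 100 | 8ms | 15ms | 42ms | 0% |
| 300 | 9ms | 18ms | 55ms | 0% |
| 500 | 12ms | 45ms | 280ms | 0.1% |
| 800 | 15ms | 320ms | 2,800ms | 4.2% |
| 1000 | 18ms | 1,200ms | 5,000ms | 12.8% |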

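The inflection point is 500 req/sec, which is exactly the pool's throughput ceiling during a checkpoint stall (100 connections / 0.200 s = 500 operations per second). Below that, the pool absorbs the stalls. Above it, wait-queue time dominates the latency.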

The Fix

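```java
import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;
import com.mongodb.event.ConnectionCheckOutFailedEvent;
import com.mongodb.event.ConnectionCheckedOutEvent;
import com.mongodb.event.ConnectionPoolListener;
import io.micrometer.core.instrument.Metrics;
import java.util.concurrent.TimeUnit;

// FAST: Pool sized for peak throughput
MongoClientSettings settings = MongoClientSettings.builder()
    .applyConnectionString(new ConnectionString("mongodb://mongo-primary:27017"))
    .applyToConnectionPoolSettings(builder -> builder
        .maxSize(200)                                // headroom above the ~160 needed during a checkpoint stall
        .minSize(30)                                 // keep warm connections so bursts skip connection setup
        .maxWaitTime(2, TimeUnit.SECONDS)            // fail fast: give up after 2 s instead of 5 s
        .maxConnectionIdleTime(5, TimeUnit.MINUTES)  // close connections idle for longer than 5 minutes
        .addConnectionPoolListener(new ConnectionPoolListener() {
            @Override
            public void connectionCheckedOut(ConnectionCheckedOutEvent event) {
                // Counts every successful checkout from the pool
                Metrics.counter("mongodb.pool.checkout").increment();
            }

            @Override
            public void connectionCheckOutFailed(ConnectionCheckOutFailedEvent event) {
                // Tags failed checkouts by reason (e.g. TIMEOUT when maxWaitTime elapses)
                Metrics.counter("mongodb.pool.checkout.failed",
                    "reason", event.getReason().name()).increment();
            }
        }))
    .build();
```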
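The listener in the fix counts checkouts and failures, but not how many connections are held at the same moment, and that is the number to compare against maxSize. The tracker below is a hedged sketch of how that number could be captured. It is meant to be fed from the same listener's `connectionCheckedOut`, `connectionCheckedIn`, and `connectionCheckOutFailed` callbacks. `PoolUsageTracker` and its methods are hypothetical. It uses only `java.util.concurrent`, so it runs without the driver on the classpath.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: tracks connections in use right now and the highest value
// seen, so maxSize can be compared with real peaks. Intended to be called from the
// ConnectionPoolListener callbacks connectionCheckedOut / connectionCheckedIn /
// connectionCheckOutFailed.
final class PoolUsageTracker {
    private final AtomicInteger inUse = new AtomicInteger();
    private final AtomicInteger peakInUse = new AtomicInteger();
    private final AtomicInteger failedCheckouts = new AtomicInteger();

    void onCheckedOut() {
        int now = inUse.incrementAndGet();
        peakInUse.accumulateAndGet(now, Math::max); // lock-free running maximum
    }

    void onCheckedIn() {
        inUse.decrementAndGet();
    }

    void onCheckOutFailed() {
        failedCheckouts.incrementAndGet();
    }

    int inUse() { return inUse.get(); }

    int peakInUse() { return peakInUse.get(); }

    int failedCheckouts() { return failedCheckouts.get(); }

    /** Peak in-use connections as a fraction of maxSize; near 1.0 means near exhaustion. */
    double peakUtilization(int maxSize) {
        return (double) peakInUse.get() / maxSize;
    }
}
```

A peak utilization that stays well below 1.0 under the benchmark's top load suggests maxSize has headroom. A value that touches 1.0 together with a rising failure count points back to the exhaustion pattern described earlier.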

The Proof

Results with maxPoolSize=200:

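| Load (req/sec) | p50 | p95 | p99 | Error rate |
|---|---|---|---|---|
| 100 | 8ms | 14ms | 38ms | 0% |
| 300 | 8ms | 16ms | 45ms | 0% |
| 500 | 9ms | 18ms | 52ms | 0% |
| 800 | 11ms | 25ms | 85ms | 0% |
| 1000 | 14ms | 42ms | 180ms | 0.02% |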

p99 at 1,000 req/sec dropped from 5,000ms to 180ms. Error rate dropped from 12.8% to 0.02%.

The Trade-off

A maxSize of 200 allows up to 200 TCP sockets to the MongoDB server per application instance, each consuming approximately 1 MB of memory on the server side (for the connection’s thread stack, input buffer, and authentication state). At 200 connections, that is 200 MB of MongoDB server memory dedicated to connection management. On a server with 32 GB RAM and 24 GB allocated to WiredTiger cache, 200 MB is acceptable. On a smaller instance, the same 200 MB takes a much larger share of the memory left over after the cache, and it is not.

The cost also multiplies across the fleet, because the server sees the sum of every application instance’s pool. If multiple application instances each open 200 connections, the server connection count adds up quickly. MongoDB’s default maxIncomingConnections is 65,536, but practical limits are lower due to memory and file descriptor constraints.
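The fleet-wide effect is simple multiplication, but it is easy to overlook when each service is tuned in isolation. The sketch below assumes the ~1 MB-per-connection figure from the text and a worst case in which every instance grows its pool all the way to maxSize. `ConnectionBudget` is a hypothetical helper, and the instance counts are illustrative inputs rather than measurements.

```java
// Back-of-the-envelope connection budget for a fleet of application instances.
// Assumes ~1 MB of server memory per connection, as stated in the text. Ignores the
// driver's server-monitoring connections, which are not part of the pool.
final class ConnectionBudget {
    static final long APPROX_MB_PER_CONNECTION = 1;

    private ConnectionBudget() {}

    /** Worst case: every instance grows its pool to maxPoolSize. */
    static long maxServerConnections(int appInstances, int maxPoolSize) {
        return (long) appInstances * maxPoolSize;
    }

    /** Approximate server memory (MB) spent on those connections. */
    static long approxServerMemoryMb(int appInstances, int maxPoolSize) {
        return maxServerConnections(appInstances, maxPoolSize) * APPROX_MB_PER_CONNECTION;
    }

    /** True if the worst-case connection memory stays within the given budget. */
    static boolean fitsBudget(int appInstances, int maxPoolSize, long budgetMb) {
        return approxServerMemoryMb(appInstances, maxPoolSize) <= budgetMb;
    }
}
```

For example, `maxServerConnections(10, 200)` is 2,000 connections. At roughly 1 MB each, that is about 2 GB of server memory, ten times the single-instance figure above.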