unbound mongodb at scale

Connection Churn: The Cost of Short-Lived Connections

Chapter 12 of 72

The Symptom

The telemetry service’s p95 latency spikes to 800ms every 5 minutes. Each spike lasts 3-5 seconds and then resolves. CPU and memory on both the application and the database are normal during the spike. The MongoDB driver metrics show the pool.created counter incrementing by 20-40 connections during each spike, followed by pool.closed incrementing by the same amount 5 minutes later.

The Cause

By default, maxConnectionIdleTime and maxConnectionLifeTime are both 0 (no limit), so the driver never closes a connection for being idle or old. The team, however, set maxConnectionIdleTime to 60 seconds “to clean up idle connections and save resources.” At night, traffic arrives in bursts roughly every 5 minutes. Between bursts, connections sit idle for more than 60 seconds and are closed. When the next burst arrives, the pool has to create 20-40 new connections at once.
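The interaction between burst spacing, idle timeout, and minimum pool size can be sketched as a toy model. This is not driver code: the class and method names are hypothetical, the burst size of 20 is taken from the low end of the observed 20-40 range, and the model assumes evenly spaced bursts with the whole interval counted as idle time.

```java
import java.time.Duration;

/** Toy model of idle pruning; an illustration of the scenario, not driver behavior. */
final class ChurnModel {

    /**
     * Connections created in one hour when traffic arrives in evenly spaced bursts.
     * Assumes the pool already holds burstSize connections when the hour starts and
     * that everything above minSize is closed once the gap exceeds the idle timeout.
     */
    static int createsPerHour(Duration burstInterval, Duration idleTimeout,
                              int burstSize, int minSize) {
        int burstsPerHour = (int) (Duration.ofHours(1).getSeconds() / burstInterval.getSeconds());
        // An idle timeout of zero means "no limit", mirroring the driver default.
        boolean pruned = !idleTimeout.isZero() && burstInterval.compareTo(idleTimeout) > 0;
        int survivors = pruned ? Math.min(minSize, burstSize) : burstSize;
        return burstsPerHour * (burstSize - survivors);
    }

    public static void main(String[] args) {
        Duration fiveMinutes = Duration.ofMinutes(5);
        // 60s idle timeout, no minimum: every burst rebuilds the pool
        System.out.println(createsPerHour(fiveMinutes, Duration.ofSeconds(60), 20, 0));  // 240
        // 60s idle timeout, minimum of 20: the warm connections cover the burst
        System.out.println(createsPerHour(fiveMinutes, Duration.ofSeconds(60), 20, 20)); // 0
        // No idle timeout (driver default): nothing is pruned
        System.out.println(createsPerHour(fiveMinutes, Duration.ZERO, 20, 0));           // 0
    }
}
```

Under these assumptions, 12 bursts per hour with 20 connections each gives 240 creations per hour, and either removing the idle timeout or keeping 20 warm connections brings that to zero in the model.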

Each new MongoDB connection requires:

  1. TCP three-way handshake: 0.5-2ms (same datacenter), 30-100ms (cross-region)
  2. TLS handshake (if enabled): 10-50ms
  3. MongoDB authentication (SCRAM-SHA-256): 2 round trips, 5-20ms
  4. Server selection and topology check: 1-5ms

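Total connection creation cost: 7-175ms per connection, from about 7ms for an unencrypted same-datacenter connection to 175ms for a cross-region connection with TLS. The Java driver limits how many connections a pool establishes concurrently (maxConnecting, default 2), so a burst of new connections is largely serialized. In the worst case, where 30 connections are established one after another, the operations that trigger the creation see roughly 200-5,000ms of added latency.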
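The per-connection range follows from summing the steps in the list above. The sketch below redoes that arithmetic, using hypothetical class and method names; the lower bound assumes no TLS in the same datacenter, and the burst figure assumes the 30 connections are established strictly one after another, which is the worst case rather than a measurement.

```java
/** Sums the per-step latency ranges from the list above (all values in milliseconds). */
final class ConnectionCost {

    // Each array is {min, max}, copied from the numbered steps.
    static final double[] TCP_SAME_DC = {0.5, 2};
    static final double[] TCP_CROSS_REGION = {30, 100};
    static final double[] TLS = {10, 50};
    static final double[] AUTH = {5, 20};
    static final double[] SELECTION = {1, 5};

    /** Cheapest case: same datacenter, TLS disabled. */
    static double bestCaseMs() {
        return TCP_SAME_DC[0] + AUTH[0] + SELECTION[0];
    }

    /** Most expensive case: cross-region with TLS enabled. */
    static double worstCaseMs() {
        return TCP_CROSS_REGION[1] + TLS[1] + AUTH[1] + SELECTION[1];
    }

    /** Total time if the connections are created strictly one after another. */
    static double serialBurstMs(int connections, double perConnectionMs) {
        return connections * perConnectionMs;
    }

    public static void main(String[] args) {
        System.out.println(bestCaseMs());                        // 6.5
        System.out.println(worstCaseMs());                       // 175.0
        System.out.println(serialBurstMs(30, bestCaseMs()));     // 195.0
        System.out.println(serialBurstMs(30, worstCaseMs()));    // 5250.0
    }
}
```

The 6.5ms and 175ms bounds round to the 7-175ms range quoted above, and 195-5,250ms rounds to the 200-5,000ms burst figure.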

// SLOW: Aggressive idle timeout causes connection churn
MongoClientSettings settings = MongoClientSettings.builder()
    .applyConnectionString(connString)
    .applyToConnectionPoolSettings(builder -> builder
        .maxSize(100)
        .minSize(0)                                    // No minimum connections maintained
        .maxConnectionIdleTime(60, TimeUnit.SECONDS)   // Closes idle connections too aggressively
    )
    .build();

The Benchmark

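import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;
import org.openjdk.jmh.annotations.*;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 2, time = 10)
@Measurement(iterations = 3, time = 30)
@Fork(1)
@State(Scope.Benchmark)
public class ConnectionChurnBenchmark {

    @Param({"0", "10", "30"})
    private int minPoolSize;

    private MongoClient client;
    private MongoCollection<Document> collection;

    // Level.Invocation so that every measured call is the first operation after an
    // idle period. With Level.Iteration, only the first call of each 30s iteration
    // would pay the connection cost and the average would hide it.
    // JMH excludes setup time (including the sleep) from the score.
    @Setup(Level.Invocation)
    public void setup() throws InterruptedException {
        client = MongoClients.create(MongoClientSettings.builder()
            .applyConnectionString(new ConnectionString("mongodb://localhost:27017"))
            .applyToConnectionPoolSettings(builder -> builder
                .maxSize(100)
                .minSize(minPoolSize)
                .maxConnectionIdleTime(30, TimeUnit.SECONDS)
            )
            .build());
        collection = client.getDatabase("telemetry").getCollection("readings");

        // Stay idle for longer than maxConnectionIdleTime (30s)
        Thread.sleep(35_000);
    }

    @TearDown(Level.Invocation)
    public void teardown() {
        client.close();
    }

    @Benchmark
    public List<Document> burstAfterIdle() {
        // First query after the idle period; it waits for any connection the pool must create
        return collection.find(Filters.eq("sensorId", "sensor-00001"))
            .limit(10)
            .into(new ArrayList<>());
    }
}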

Results:

Benchmark                              (minPoolSize)  Mode  Cnt     Score     Error  Units
ConnectionChurnBenchmark.burstAfterIdle            0  avgt    3   145.000 ±  42.000  ms/op
ConnectionChurnBenchmark.burstAfterIdle           10  avgt    3    12.000 ±   3.000  ms/op
ConnectionChurnBenchmark.burstAfterIdle           30  avgt    3     4.200 ±   0.800  ms/op

With minSize=0, all connections were closed during the idle period. The first operation after idle pays the full connection creation cost: 145ms. With minSize=30, warm connections are always available: 4.2ms.

The Fix

// FAST: Prevent connection churn with appropriate lifecycle settings
MongoClientSettings settings = MongoClientSettings.builder()
    .applyConnectionString(connString)
    .applyToConnectionPoolSettings(builder -> builder
        .maxSize(200)
        .minSize(20)                                     // Always keep 20 warm connections
        .maxConnectionIdleTime(5, TimeUnit.MINUTES)      // Generous idle timeout
        .maxConnectionLifeTime(30, TimeUnit.MINUTES)     // Rotate connections to prevent stale state
        .maxWaitTime(2, TimeUnit.SECONDS)                // Fail fast on pool exhaustion
    )
    .build();

minSize=20 ensures the pool never drops below 20 connections, even during idle periods. maxConnectionIdleTime=5m gives connections a long idle window before pruning. maxConnectionLifeTime=30m rotates connections to prevent issues with long-lived connections (firewall timeouts, load balancer draining). maxWaitTime=2s makes an operation fail after 2 seconds if no connection becomes available, instead of waiting for the driver default of 2 minutes.
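If the pool is configured through the connection string rather than the builder, the same values can be expressed as URI options. The sketch below assembles them; the option names are the ones the Java driver's connection string accepts (maxLifeTimeMS in particular is not supported by every driver), while the class name, method name, and host are hypothetical placeholders.

```java
import java.time.Duration;

/** Builds the connection-string form of the pool settings shown above. */
final class PoolUri {

    /** Returns only the query-string part, with durations converted to milliseconds. */
    static String poolOptions(int minSize, int maxSize,
                              Duration idleTime, Duration lifeTime, Duration waitTime) {
        return "minPoolSize=" + minSize
            + "&maxPoolSize=" + maxSize
            + "&maxIdleTimeMS=" + idleTime.toMillis()
            + "&maxLifeTimeMS=" + lifeTime.toMillis()
            + "&waitQueueTimeoutMS=" + waitTime.toMillis();
    }

    public static void main(String[] args) {
        String options = poolOptions(20, 200,
            Duration.ofMinutes(5), Duration.ofMinutes(30), Duration.ofSeconds(2));
        // db.example.internal is a placeholder host, not part of the original setup
        System.out.println("mongodb://db.example.internal:27017/?" + options);
    }
}
```

Settings applied through applyToConnectionPoolSettings after applyConnectionString override the values from the URI, so it is usually simpler to keep the pool configuration in one place.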

The Proof

Metric                            Before (minSize=0, idleTime=60s)   After (minSize=20, idleTime=5m)
Connection creates (per hour)     240                                12
Burst latency after idle (p99)    800ms                              18ms
Steady-state pool size            0-100 (fluctuating)                20-80 (stable)
TCP handshakes per hour           240                                12

The Trade-off

Maintaining 20 minimum connections costs about 20 MB of MongoDB server memory (roughly 1 MB per connection) and 20 file descriptors. If you run 10 application instances, that is 200 always-open connections and 200 MB of server memory. For a dedicated MongoDB instance, this is negligible. For a shared MongoDB Atlas tier with a 500-connection limit, 200 connections reserved by minimum pool settings consume 40% of the connection budget before any query is executed. In shared environments, set minSize to the lowest value that prevents churn during your shortest idle period.
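The budget arithmetic is worth redoing whenever the instance count or the tier changes. A minimal sketch follows, with hypothetical class and method names; the 10% cap in the last call is an illustrative assumption, not a recommendation from the measurements above.

```java
/** Connection-budget arithmetic for minimum pool sizes across application instances. */
final class ConnectionBudget {

    /** Connections held open at all times by minSize across every instance. */
    static int reservedConnections(int instances, int minSize) {
        return instances * minSize;
    }

    /** Share of the server's connection limit consumed by reserved connections, in percent. */
    static double budgetSharePercent(int reserved, int connectionLimit) {
        return 100.0 * reserved / connectionLimit;
    }

    /** Largest minSize per instance that keeps reserved connections within the given share. */
    static int largestMinSize(int instances, int connectionLimit, double maxSharePercent) {
        return (int) Math.floor(connectionLimit * maxSharePercent / 100.0 / instances);
    }

    public static void main(String[] args) {
        int reserved = reservedConnections(10, 20);
        System.out.println(reserved);                          // 200
        System.out.println(budgetSharePercent(reserved, 500)); // 40.0
        // Hypothetical policy: reserve at most 10% of a 500-connection limit
        System.out.println(largestMinSize(10, 500, 10.0));     // 5
    }
}
```

Whether a minSize that small still prevents churn depends on the burst size, so the result is an upper bound to check against the benchmark, not a setting to apply blindly.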