Connection Churn: The Cost of Short-Lived Connections
The Symptom
The telemetry service’s p95 latency spikes to 800ms every 5 minutes. Each spike lasts 3-5 seconds and then resolves. CPU and memory on both the application and the database stay normal during the spike. The MongoDB driver metrics show the pool.created counter incrementing by 20-40 connections during each spike, followed by pool.closed incrementing by the same amount 5 minutes later.
The Cause
By default, maxConnectionIdleTime and maxConnectionLifeTime are both 0 (no limit). The team, however, set maxConnectionIdleTime to 60 seconds “to clean up idle connections and save resources.” During low-traffic periods at night, traffic arrives in bursts roughly every 5 minutes, so between bursts connections sit idle for more than 60 seconds and are closed. When the next burst arrives, the pool has to create 20-40 new connections at once.
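The mechanism can be made concrete with a toy model. The sketch below is not the driver's pool implementation. It assumes bursts arrive exactly every 5 minutes, each burst needs 20 connections (the low end of the 20-40 range above), and every connection above minSize is closed once the gap between bursts exceeds the idle timeout. `ChurnModel` and its parameters are hypothetical names.

```java
// Toy model of idle pruning between traffic bursts.
// Hypothetical helper for illustration; it does not use or mirror the MongoDB driver's internals.
final class ChurnModel {

    /**
     * Counts connections that must be created on demand (while a burst is waiting).
     * idleTimeoutSec == 0 means "no idle limit", matching the driver default.
     */
    static int onDemandCreates(int durationSec, int burstIntervalSec, int burstConnections,
                               int idleTimeoutSec, int minSize) {
        int open = minSize; // assumption: the pool starts with minSize warm connections
        int creates = 0;
        for (int t = burstIntervalSec; t <= durationSec; t += burstIntervalSec) {
            // Assumption: connections stay idle for the whole gap between bursts.
            boolean idleExpired = idleTimeoutSec > 0 && burstIntervalSec > idleTimeoutSec;
            if (idleExpired) {
                open = minSize; // everything above minSize was closed while idle
            }
            int missing = Math.max(0, burstConnections - open);
            creates += missing; // paid for by the operations in the burst
            open += missing;
        }
        return creates;
    }

    public static void main(String[] args) {
        // One hour, bursts every 300s, 20 connections per burst.
        System.out.println(onDemandCreates(3600, 300, 20, 60, 0));  // idle timeout 60s, minSize 0
        System.out.println(onDemandCreates(3600, 300, 20, 0, 0));   // no idle limit, minSize 0
        System.out.println(onDemandCreates(3600, 300, 20, 60, 20)); // idle timeout 60s, minSize 20
    }
}
```

Under these assumptions, the 60-second timeout yields 240 on-demand creates per hour (12 bursts × 20 connections). Removing the idle limit or keeping 20 warm connections leaves at most the initial 20.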
Each new MongoDB connection requires:
- TCP three-way handshake: 0.5-2ms (same datacenter), 30-100ms (cross-region)
- TLS handshake (if enabled): 10-50ms
- MongoDB authentication (SCRAM-SHA-256): 2 round trips, 5-20ms
- Server selection and topology check: 1-5ms
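Total connection creation cost: roughly 7-175ms per connection (the low end assumes same-datacenter traffic without TLS). In the worst case, when 30 connections are established one after another, that adds 200-5,000ms (30 × 7-175ms) of latency to the operations that trigger the creation.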
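The totals above follow directly from the per-step ranges. A minimal sketch of that arithmetic, assuming the steps simply add up and that TLS is optional; `ConnectionCostEstimate` is a hypothetical helper, not a driver API:

```java
// Adds up the per-step latency ranges listed above (all values in milliseconds).
// Hypothetical helper for illustration only.
final class ConnectionCostEstimate {
    static final double TCP_MIN = 0.5, TCP_MAX = 100;     // same datacenter .. cross-region
    static final double TLS_MIN = 10, TLS_MAX = 50;       // only when TLS is enabled
    static final double AUTH_MIN = 5, AUTH_MAX = 20;      // SCRAM-SHA-256
    static final double SELECT_MIN = 1, SELECT_MAX = 5;   // server selection and topology check

    static double minPerConnectionMs(boolean tls) {
        return TCP_MIN + (tls ? TLS_MIN : 0) + AUTH_MIN + SELECT_MIN;
    }

    static double maxPerConnectionMs(boolean tls) {
        return TCP_MAX + (tls ? TLS_MAX : 0) + AUTH_MAX + SELECT_MAX;
    }

    // Assumption: connections are established strictly one after another.
    static double serialCostMs(int connections, double perConnectionMs) {
        return connections * perConnectionMs;
    }

    public static void main(String[] args) {
        double min = minPerConnectionMs(false); // best case: no TLS, same datacenter
        double max = maxPerConnectionMs(true);  // worst case: TLS, cross-region
        System.out.println("per connection: " + min + " - " + max + " ms");
        System.out.println("30 serial connections: "
                + serialCostMs(30, min) + " - " + serialCostMs(30, max) + " ms");
    }
}
```

The unrounded results are 6.5-175ms per connection and 195-5,250ms for 30 serial connections, which the text rounds to 7-175ms and 200-5,000ms.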
// SLOW: Aggressive idle timeout causes connection churn
MongoClientSettings settings = MongoClientSettings.builder()
    .applyConnectionString(connString)
    .applyToConnectionPoolSettings(builder -> builder
        .maxSize(100)
        .minSize(0) // No minimum connections maintained
        .maxConnectionIdleTime(60, TimeUnit.SECONDS) // Closes idle connections too aggressively
    )
    .build();
The Benchmark
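import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;
import org.openjdk.jmh.annotations.*;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 2, time = 10)
@Measurement(iterations = 3, time = 30)
@Fork(1)
@State(Scope.Benchmark)
public class ConnectionChurnBenchmark {

    @Param({"0", "10", "30"})
    private int minPoolSize;

    private MongoClient client;
    private MongoCollection<Document> collection;

    @Setup(Level.Iteration)
    public void setup() {
        client = MongoClients.create(MongoClientSettings.builder()
            .applyConnectionString(new ConnectionString("mongodb://localhost:27017"))
            .applyToConnectionPoolSettings(builder -> builder
                .maxSize(100)
                .minSize(minPoolSize)
                .maxConnectionIdleTime(30, TimeUnit.SECONDS)
            )
            .build());
        collection = client.getDatabase("telemetry").getCollection("readings");
        // Sleep past the 30s idle timeout so the pool is idle before measuring
        try {
            Thread.sleep(35000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // Preserve the interrupt instead of swallowing it
        }
    }

    @TearDown(Level.Iteration)
    public void teardown() {
        client.close();
    }

    @Benchmark
    public List<Document> burstAfterIdle() {
        // One query per invocation on a single benchmark thread; the first call
        // after the idle wait must create a connection when none is warm
        return collection.find(Filters.eq("sensorId", "sensor-00001"))
            .limit(10)
            .into(new ArrayList<>());
    }
}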
Results:
Benchmark                                (minPoolSize)  Mode  Cnt    Score     Error  Units
ConnectionChurnBenchmark.burstAfterIdle              0  avgt    3  145.000 ±  42.000  ms/op
ConnectionChurnBenchmark.burstAfterIdle             10  avgt    3   12.000 ±   3.000  ms/op
ConnectionChurnBenchmark.burstAfterIdle             30  avgt    3    4.200 ±   0.800  ms/op
With minSize=0, all connections were closed during the idle period. The first operation after idle pays the full connection creation cost: 145ms. With minSize=30, warm connections are always available: 4.2ms.
The Fix
// FAST: Prevent connection churn with appropriate lifecycle settings
MongoClientSettings settings = MongoClientSettings.builder()
    .applyConnectionString(connString)
    .applyToConnectionPoolSettings(builder -> builder
        .maxSize(200)
        .minSize(20) // Always keep 20 warm connections
        .maxConnectionIdleTime(5, TimeUnit.MINUTES) // Generous idle timeout
        .maxConnectionLifeTime(30, TimeUnit.MINUTES) // Rotate connections to prevent stale state
        .maxWaitTime(2, TimeUnit.SECONDS) // Fail fast on pool exhaustion
    )
    .build();
minSize=20 ensures the pool never drops below 20 connections, even during idle periods. maxConnectionIdleTime=5m gives connections a long idle window before pruning. maxConnectionLifeTime=30m rotates connections to prevent issues with long-lived connections (firewall timeouts, load balancer draining).
The Proof
| Metric | Before (minSize=0, idleTime=60s) | After (minSize=20, idleTime=5m) |
|---|---|---|
| Connection creates (per hour) | 240 | 12 |
| Burst latency after idle (p99) | 800ms | 18ms |
| Steady-state pool size | 0-100 (fluctuating) | 20-80 (stable) |
| TCP handshakes per hour | 240 | 12 |
The Trade-off
Maintaining 20 minimum connections costs about 20 MB of MongoDB server memory (roughly 1 MB per connection) and 20 file descriptors per application instance. With 10 application instances, that is 200 always-open connections and about 200 MB of server memory. For a dedicated MongoDB instance, this is negligible. On a shared MongoDB Atlas tier with a 500-connection limit, those 200 reserved connections consume 40% of the connection budget before a single query runs. In shared environments, set minSize to the lowest value that prevents churn during your shortest idle period.
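The budget arithmetic is easy to rerun for other fleet sizes. A minimal sketch, assuming the figure of about 1 MB of server memory per connection implied above; `MinPoolBudget` is a hypothetical helper, not a driver or Atlas API:

```java
// Estimates what a minSize setting reserves across a fleet of application instances.
// Hypothetical helper for illustration only.
final class MinPoolBudget {

    static int reservedConnections(int instances, int minSize) {
        return instances * minSize;
    }

    // Assumption: each open connection costs mbPerConnection of server memory.
    static double reservedMemoryMb(int reservedConnections, double mbPerConnection) {
        return reservedConnections * mbPerConnection;
    }

    static double budgetSharePercent(int reservedConnections, int connectionLimit) {
        return 100.0 * reservedConnections / connectionLimit;
    }

    public static void main(String[] args) {
        int reserved = reservedConnections(10, 20); // 10 instances, minSize=20
        System.out.println("reserved connections: " + reserved);
        System.out.println("server memory (MB): " + reservedMemoryMb(reserved, 1.0));
        System.out.println("share of 500-connection limit (%): "
                + budgetSharePercent(reserved, 500));
    }
}
```

With the numbers from the paragraph above, this reproduces 200 reserved connections, about 200 MB, and a 40% share of a 500-connection limit.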