HTTP Client Connection Pools and Keep-Alive

The content platform’s article service makes HTTP calls to five upstream services: search indexer, recommendation engine, analytics pipeline, image service, and notification service. Every call needs a TCP connection and, for HTTPS, a TLS session before the request can be sent and the response received. Without connection pooling, each call pays that full setup cost. With pooling, an established connection is reused across requests.

The connection setup cost is not trivial:

Step	Latency
TCP handshake (SYN, SYN-ACK, ACK)	0.5 ms (same datacenter)
TLS 1.3 handshake (1-RTT)	1.0 ms (same datacenter)
TLS 1.2 handshake (2-RTT)	2.0 ms (same datacenter)
Total (TLS 1.3)	1.5 ms
Total (TLS 1.2)	2.5 ms

At 5,000 requests/second to the recommendation engine, creating a new connection per request with TLS 1.3 adds 5,000 × 1.5 ms = 7.5 seconds of handshake time for every second of traffic. That is the equivalent of seven to eight threads doing nothing but waiting on TCP and TLS handshakes. With connection reuse, this drops to near zero.
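The overhead arithmetic can be reproduced with a few lines of Java. This is a minimal sketch; the class and method names are hypothetical, and the latencies are the same-datacenter figures from the table above.

```java
import java.time.Duration;

// Hypothetical helper: handshake time accumulated per second of traffic
// when every request opens a fresh connection.
final class HandshakeOverhead {

    // requestsPerSecond: request rate to one upstream service
    // handshake: TCP + TLS setup cost of a single new connection
    static Duration perSecond(long requestsPerSecond, Duration handshake) {
        return handshake.multipliedBy(requestsPerSecond);
    }

    public static void main(String[] args) {
        Duration tls13 = Duration.ofNanos(1_500_000); // 1.5 ms (TCP + TLS 1.3)
        Duration tls12 = Duration.ofNanos(2_500_000); // 2.5 ms (TCP + TLS 1.2)
        System.out.println("TLS 1.3: " + perSecond(5_000, tls13).toMillis() + " ms per second");
        System.out.println("TLS 1.2: " + perSecond(5_000, tls12).toMillis() + " ms per second");
    }
}
```

At 5,000 requests/second this gives 7,500 ms of handshake time per second with TLS 1.3 and 12,500 ms with TLS 1.2.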

The Cost of Not Pooling

The Java HttpClient (introduced in Java 11) pools connections internally, but the pool belongs to the client instance. Code that builds a new HttpClient for every request gets a new, empty pool each time, so no connection is ever reused:

// SLOW: New HttpClient per request (no connection reuse)
public String fetchRecommendations(String userId) throws Exception {
    HttpClient client = HttpClient.newHttpClient(); // New pool per client
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://recommendations:8443/api/v1/recs/" + userId))
        .GET()
        .build();
    HttpResponse<String> response =
        client.send(request, HttpResponse.BodyHandlers.ofString());
    return response.body();
}

// FAST: Shared HttpClient with connection reuse
private static final HttpClient HTTP_CLIENT = HttpClient.newBuilder()
    .connectTimeout(Duration.ofSeconds(2))
    .executor(Executors.newFixedThreadPool(20))
    .version(HttpClient.Version.HTTP_2)
    .build();

public String fetchRecommendations(String userId) throws Exception {
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://recommendations:8443/api/v1/recs/" + userId))
        .GET()
        .timeout(Duration.ofSeconds(5))
        .build();
    HttpResponse<String> response =
        HTTP_CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
    return response.body();
}

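import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)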
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 5, time = 2)
@Measurement(iterations = 5, time = 2)
@Fork(2)
@State(Scope.Benchmark)
public class HttpClientPoolBenchmark {

    private HttpClient sharedClient;
    private String targetUrl;

    @Setup(Level.Trial)
    public void setup() {
        sharedClient = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))
            .build();
        targetUrl = "https://localhost:8443/api/v1/articles/perf-101";
    }

    @Benchmark
    public String newClientPerRequest() throws Exception {
        // SLOW: New client, new connection, full TLS handshake
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(targetUrl))
            .GET()
            .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString())
            .body();
    }

    @Benchmark
    public String sharedClientReuse() throws Exception {
        // FAST: Reused client, reused connection, no handshake
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(targetUrl))
            .GET()
            .build();
        return sharedClient.send(request, HttpResponse.BodyHandlers.ofString())
            .body();
    }
}

Approach	Avg Latency	P99 Latency	Connections Created
New client per request	3.8 ms	12.4 ms	1 per request
Shared client (reuse)	1.2 ms	3.1 ms	1 per host (reused)

Connection reuse reduces average latency by 68%. The P99 improvement is even larger (75%) because new connection creation has high variance: TLS handshakes occasionally take 10+ ms due to certificate chain validation and OCSP stapling.

Apache HttpClient: Explicit Pool Control

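Java’s built-in HttpClient manages its pool internally and exposes almost no configuration through its API; the few available knobs are JVM-wide system properties such as jdk.httpclient.connectionPoolSize and jdk.httpclient.keepalive.timeout. For fine-grained, per-route control, Apache HttpClient 5 provides explicit pool management: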

// FAST: Apache HttpClient with explicit pool configuration
public class HttpClientFactory {

    public static CloseableHttpClient createPooledClient() {
        PoolingHttpClientConnectionManager connManager =
            PoolingHttpClientConnectionManagerBuilder.create()
                .setMaxConnTotal(100)       // Total connections across all hosts
                .setMaxConnPerRoute(20)     // Max connections per target host
                .setDefaultConnectionConfig(ConnectionConfig.custom()
                    .setConnectTimeout(Timeout.ofSeconds(2))
                    .setSocketTimeout(Timeout.ofSeconds(5))
                    .setValidateAfterInactivity(TimeValue.ofSeconds(10))
                    .build())
                .build();

        return HttpClients.custom()
            .setConnectionManager(connManager)
            .setKeepAliveStrategy((response, context) -> {
                // Honor the server's Keep-Alive "timeout" parameter if present;
                // otherwise fall back to 60 seconds.
                // HttpClient 5 returns java.util.Iterator<Header> here.
                Iterator<Header> it = response.headerIterator("Keep-Alive");
                while (it.hasNext()) {
                    String value = it.next().getValue();
                    // Parse only when a numeric timeout is actually present
                    if (value.matches(".*timeout=\\d+.*")) {
                        String timeout = value
                            .replaceAll(".*timeout=(\\d+).*", "$1");
                        return TimeValue.ofSeconds(Long.parseLong(timeout));
                    }
                }
                return TimeValue.ofSeconds(60);
            })
            .evictExpiredConnections()
            .evictIdleConnections(TimeValue.ofSeconds(30))
            .build();
    }
}
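The header parsing inside the keep-alive strategy can be isolated and tested without any HTTP library. The following is a sketch only: KeepAliveHeader and parseTimeout are hypothetical names, and the fallback value is whatever the caller chooses. It returns the fallback instead of throwing when the timeout parameter is missing or not numeric.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the "timeout" parameter from a Keep-Alive
// header value such as "timeout=5, max=1000".
final class KeepAliveHeader {

    // Matches "timeout=<digits>" at the start of the value or after a comma/space.
    // At most 9 digits are accepted, so the value always fits in a long.
    private static final Pattern TIMEOUT =
        Pattern.compile("(?i)(?:^|[,\\s])timeout\\s*=\\s*(\\d{1,9})(?!\\d)");

    static Duration parseTimeout(String headerValue, Duration fallback) {
        if (headerValue == null) {
            return fallback;
        }
        Matcher m = TIMEOUT.matcher(headerValue);
        return m.find() ? Duration.ofSeconds(Long.parseLong(m.group(1))) : fallback;
    }
}
```

A keep-alive strategy could call parseTimeout for each Keep-Alive header and wrap the result in the client library's own duration type.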

Pool Configuration Explained

maxConnTotal(100): The absolute maximum connections across all target hosts. This bounds the total file descriptor and memory usage. For the content platform communicating with 5 upstream services, 100 total connections provide 20 per service.

maxConnPerRoute(20): The maximum connections to a single target host. This is the critical setting. If the recommendation engine handles 5,000 req/s from the article service, and each request takes 10 ms:

$$L = 5{,}000 \times 0.01 = 50 \text{ connections needed}$$

But maxConnPerRoute is 20. The excess 30 requests queue, adding latency. The fix: increase maxConnPerRoute to 50 for the recommendation engine (raising maxConnTotal as well, so that the total cap does not become the new bottleneck), or reduce response time.

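// Per-route configuration for different upstream services.
// HttpClient 5 argument order is HttpHost(scheme, hostname, port); the route
// is marked secure so it matches the route the client computes for https.
HttpRoute recsRoute = new HttpRoute(
    new HttpHost("https", "recommendations", 8443), null, true);
connManager.setMaxPerRoute(recsRoute, 50);  // High-traffic service

HttpRoute imageRoute = new HttpRoute(
    new HttpHost("https", "images", 8443), null, true);
connManager.setMaxPerRoute(imageRoute, 10);  // Low-traffic service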
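The sizing calculation above (Little's law: concurrent connections = request rate × time per request) can be sketched as a small helper. The names PoolSizing, connectionsNeeded, and queuedRequests are hypothetical; integer arithmetic is used so the result rounds up without floating-point surprises.

```java
import java.time.Duration;

// Hypothetical helper applying Little's law to per-route pool sizing.
final class PoolSizing {

    // Connections that are busy on average: rate (req/s) x latency (s), rounded up.
    static long connectionsNeeded(long requestsPerSecond, Duration latency) {
        long busyNanosPerSecond = Math.multiplyExact(requestsPerSecond, latency.toNanos());
        return (busyNanosPerSecond + 999_999_999L) / 1_000_000_000L;
    }

    // Requests left waiting for a connection when the per-route cap is too small.
    static long queuedRequests(long needed, long maxPerRoute) {
        return Math.max(0, needed - maxPerRoute);
    }

    public static void main(String[] args) {
        long needed = connectionsNeeded(5_000, Duration.ofMillis(10));
        System.out.println("needed=" + needed + " queued=" + queuedRequests(needed, 20));
    }
}
```

For 5,000 req/s at 10 ms this yields 50 connections, 30 more than a per-route limit of 20 allows. The figure is an average; bursts need additional headroom.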

validateAfterInactivity(10 seconds): Connections idle for more than 10 seconds are checked for staleness (a brief read probe on the socket) before reuse. This prevents using stale connections that the server or an intermediary (load balancer, firewall) has closed. The 10-second threshold balances validation cost against stale connection risk.

evictIdleConnections(30 seconds): A background thread closes connections idle longer than 30 seconds. This frees resources when traffic drops (e.g., overnight) and prevents the pool from holding connections to services that have been redeployed.
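How the two idle thresholds interact can be shown with a toy model. This is not Apache HttpClient's implementation: IdlePolicy and its method are hypothetical, and in the real client eviction runs periodically on a background thread, so a connection can outlive the threshold by up to one eviction cycle.

```java
import java.time.Duration;

// Toy model of what happens to a pooled connection depending on how long
// it has been idle. Illustrative only.
final class IdlePolicy {

    enum Action { REUSE, VALIDATE_THEN_REUSE, EVICT }

    static Action onIdle(Duration idle, Duration validateAfter, Duration evictAfter) {
        if (idle.compareTo(evictAfter) > 0) {
            return Action.EVICT;               // closed by the eviction thread
        }
        if (idle.compareTo(validateAfter) > 0) {
            return Action.VALIDATE_THEN_REUSE; // staleness check before lease
        }
        return Action.REUSE;                   // recently used, leased as-is
    }
}
```

With the settings above (validate after 10 seconds, evict after 30 seconds), a connection idle for 5 seconds is reused directly, one idle for 15 seconds is validated first, and one idle for 45 seconds has been evicted.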

Keep-Alive and Connection Reuse

HTTP keep-alive allows multiple requests over a single TCP connection. Without keep-alive, each request-response cycle closes the connection:

Without Keep-Alive:
Request 1: [TCP handshake] [TLS] [Request] [Response] [TCP close]
Request 2: [TCP handshake] [TLS] [Request] [Response] [TCP close]
Request 3: [TCP handshake] [TLS] [Request] [Response] [TCP close]

With Keep-Alive:
Request 1: [TCP handshake] [TLS] [Request] [Response]
Request 2:                       [Request] [Response]
Request 3:                       [Request] [Response]
                                                      [TCP close after idle]

The keep-alive duration must be coordinated between client and server. If the server closes the connection after 30 seconds of idle time but the client tries to reuse it at 35 seconds, the client gets a connection reset error and must retry.

// Server-side keep-alive (Spring Boot / Tomcat)
// application.properties
// server.tomcat.keep-alive-timeout=60000
// server.tomcat.max-keep-alive-requests=1000

// Client-side: keep-alive must be shorter than server
.setKeepAliveStrategy((response, context) -> {
    // Fixed 30s regardless of response headers: safely shorter than
    // the server's 60s keep-alive timeout
    return TimeValue.ofSeconds(30);
})

The content platform sets server keep-alive to 60 seconds and client keep-alive to 30 seconds. This ensures the client never tries to reuse a connection the server has already closed. The 30-second buffer absorbs differences in when each side starts its idle timer, plus network latency.
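The rule that the client's keep-alive must stay below the server's can be encoded as a startup check. A minimal sketch, assuming the server timeout is known from configuration; KeepAliveBudget and its methods are hypothetical names.

```java
import java.time.Duration;

// Hypothetical helper: derives and checks the client keep-alive from the
// server's idle timeout.
final class KeepAliveBudget {

    // Client keep-alive = server idle timeout minus a safety margin.
    static Duration clientKeepAlive(Duration serverTimeout, Duration safetyMargin) {
        if (safetyMargin.isNegative() || safetyMargin.compareTo(serverTimeout) >= 0) {
            throw new IllegalArgumentException(
                "margin must be >= 0 and smaller than the server timeout");
        }
        return serverTimeout.minus(safetyMargin);
    }

    // True when the client gives up on an idle connection before the server does.
    static boolean isSafe(Duration clientKeepAlive, Duration serverTimeout) {
        return clientKeepAlive.compareTo(serverTimeout) < 0;
    }
}
```

With a 60-second server timeout and a 30-second margin, clientKeepAlive returns 30 seconds, matching the values used by the content platform.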

HTTP/2 Connection Multiplexing

HTTP/2 multiplexes multiple requests over a single TCP connection using streams. This changes the pool sizing math: instead of needing $N$ connections for $N$ concurrent requests, a single HTTP/2 connection can carry many concurrent streams, up to the limit the server advertises in SETTINGS_MAX_CONCURRENT_STREAMS (often 100 or more).
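The change in sizing math can be made concrete with a short sketch. Http2Sizing is a hypothetical helper, and the stream limit of 100 used in the example is an assumption; the real value is whatever the server advertises.

```java
// Hypothetical helper comparing connection counts for HTTP/1.1 and HTTP/2.
final class Http2Sizing {

    // HTTP/1.1: one in-flight request per connection.
    static int connectionsHttp1(int concurrentRequests) {
        return concurrentRequests;
    }

    // HTTP/2: requests share a connection up to the server's stream limit.
    static int connectionsHttp2(int concurrentRequests, int maxConcurrentStreams) {
        if (maxConcurrentStreams <= 0) {
            throw new IllegalArgumentException("maxConcurrentStreams must be positive");
        }
        // Ceiling division without floating point
        return (concurrentRequests + maxConcurrentStreams - 1) / maxConcurrentStreams;
    }
}
```

For 50 concurrent requests and an assumed limit of 100 streams, HTTP/1.1 needs 50 connections where HTTP/2 needs one; at 250 concurrent requests HTTP/2 would need three.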

// HTTP/2 with Java HttpClient
private static final HttpClient HTTP2_CLIENT = HttpClient.newBuilder()
    .version(HttpClient.Version.HTTP_2)
    .connectTimeout(Duration.ofSeconds(2))
    .build();

// Multiple concurrent requests over one connection
public List<String> fetchMultipleArticles(List<String> ids)
        throws Exception {
    List<CompletableFuture<HttpResponse<String>>> futures = ids.stream()
        .map(id -> HttpRequest.newBuilder()
            .uri(URI.create("https://articles:8443/api/v1/" + id))
            .GET()
            .build())
        .map(req -> HTTP2_CLIENT.sendAsync(req,
            HttpResponse.BodyHandlers.ofString()))
        .toList();

    return futures.stream()
        .map(CompletableFuture::join)
        .map(HttpResponse::body)
        .toList();
}

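import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.stream.IntStream;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)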
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 5, time = 2)
@Measurement(iterations = 5, time = 2)
@Fork(2)
@State(Scope.Benchmark)
public class Http1VsHttp2Benchmark {

    private HttpClient http1Client;
    private HttpClient http2Client;
    private List<HttpRequest> requests;

    @Setup(Level.Trial)
    public void setup() {
        http1Client = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .build();
        http2Client = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_2)
            .build();

        requests = IntStream.range(0, 50)
            .mapToObj(i -> HttpRequest.newBuilder()
                .uri(URI.create("https://localhost:8443/api/v1/articles/art-" + i))
                .GET()
                .build())
            .toList();
    }

    @Benchmark
    public List<String> http1Sequential() throws Exception {
        // SLOW: 50 requests sent one at a time over a single reused connection
        List<String> results = new ArrayList<>(50);
        for (HttpRequest req : requests) {
            results.add(
                http1Client.send(req, HttpResponse.BodyHandlers.ofString())
                    .body());
        }
        return results;
    }

    @Benchmark
    public List<String> http2Multiplexed() throws Exception {
        // FAST: 50 requests multiplexed over 1-2 connections
        List<CompletableFuture<HttpResponse<String>>> futures =
            requests.stream()
                .map(req -> http2Client.sendAsync(req,
                    HttpResponse.BodyHandlers.ofString()))
                .toList();
        return futures.stream()
            .map(CompletableFuture::join)
            .map(HttpResponse::body)
            .toList();
    }
}

Protocol	50 Requests Latency	Connections Used	Head-of-Line Blocking
HTTP/1.1 sequential	125 ms	1	Yes (per-connection)
HTTP/1.1 parallel (pool=20)	15 ms	20	Yes (per-connection)
HTTP/2 multiplexed	8 ms	1-2	Yes (TCP-level)

HTTP/2 multiplexing is 47% faster than HTTP/1.1 with a pool of 20 connections for 50 concurrent requests (8 ms versus 15 ms). The advantage comes from eliminating connection management overhead and allowing all 50 requests to be in flight concurrently over a single connection.

The trade-off: HTTP/2 introduces TCP-level head-of-line blocking. If a single TCP packet is lost, all streams on that connection stall until the packet is retransmitted. HTTP/1.1 with multiple connections isolates packet loss to the affected connection. In practice, on datacenter networks with <0.01% packet loss, HTTP/2’s head-of-line blocking is negligible. On lossy networks (mobile, cross-region), it can cause latency spikes.

For the content platform’s datacenter communication (all services in the same availability zone), HTTP/2 provides the best performance with the fewest connections. The team uses HTTP/2 for service-to-service calls and HTTP/1.1 as a fallback for services that do not support HTTP/2.

Timeout Configuration

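Connection pool timeouts must form a hierarchy in which each stage is shorter than the one that contains it. A connect timeout longer than the read timeout means an unreachable host ties up a thread for longer than a slow response would, even though a healthy datacenter handshake completes in milliseconds. If the total request timeout is shorter than connect + read, a slow but otherwise successful request can be killed before its response arrives.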

// SLOW: Inconsistent timeouts cause confusion
ConnectionConfig.custom()
    .setConnectTimeout(Timeout.ofSeconds(30))   // Too long
    .setSocketTimeout(Timeout.ofSeconds(5))      // Shorter than connect
    .build();

// FAST: Hierarchical timeouts
ConnectionConfig.custom()
    .setConnectTimeout(Timeout.ofSeconds(2))     // Fast fail on unreachable host
    .setSocketTimeout(Timeout.ofSeconds(5))      // Read timeout per socket operation
    .build();

RequestConfig.custom()
    .setConnectionRequestTimeout(Timeout.ofSeconds(1))  // Wait for pool connection
    .setResponseTimeout(Timeout.ofSeconds(10))           // Max wait for response data on this request
    .build();

The timeout hierarchy:

$$\text{pool wait} < \text{connect} < \text{socket read} < \text{total request}$$
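The ordering can be checked once at startup so that a misconfiguration fails fast instead of surfacing as confusing timeouts in production. A minimal sketch; TimeoutBudget is a hypothetical type, not part of any HTTP client library.

```java
import java.time.Duration;
import java.util.List;

// Hypothetical value type holding the four timeouts of the hierarchy.
record TimeoutBudget(Duration poolWait, Duration connect,
                     Duration socketRead, Duration totalRequest) {

    // True when pool wait < connect < socket read < total request.
    boolean isHierarchical() {
        List<Duration> ordered = List.of(poolWait, connect, socketRead, totalRequest);
        for (int i = 1; i < ordered.size(); i++) {
            if (ordered.get(i - 1).compareTo(ordered.get(i)) >= 0) {
                return false;
            }
        }
        return true;
    }
}
```

The hierarchical configuration above (1 s, 2 s, 5 s, 10 s) passes this check, while the inconsistent one with a 30-second connect timeout and a 5-second socket timeout fails it.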

For the content platform:

Timeout	Value	Purpose
Connection pool wait	1s	Time waiting for an available connection from pool
Connect timeout	2s	TCP + TLS handshake (datacenter: usually <10ms)
Socket read timeout	5s	Time waiting for response data after request sent
Response timeout	10s	Intended cap on a request; bounds each wait for response data, not total elapsed time

connectionRequestTimeout(1 second): If no connection is available from the pool within 1 second, the request fails. This is the pool exhaustion signal. When this timeout fires frequently, the pool is undersized or an upstream service is slow (holding connections open).

connectTimeout(2 seconds): If the TCP/TLS handshake does not complete in 2 seconds, the target host is unreachable or overloaded. Waiting longer wastes a thread.

socketTimeout(5 seconds): If no data arrives on the socket for 5 seconds after the request is sent, the server is likely stuck processing. Waiting longer risks cascading timeouts in the article service’s own callers.

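responseTimeout(10 seconds): The intended cap on a request. Note that in Apache HttpClient 5 this setting bounds each wait for response data and, when set, is applied as the socket timeout for that request. It does not limit total elapsed time, so a response that keeps trickling in can run longer than 10 seconds. A hard cap on the entire request lifecycle has to be enforced by the caller, for example with a deadline that aborts the request, so that slow responses cannot hold connections indefinitely.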

DNS Resolution and Connection Pools

Connection pools cache connections by host+port. But in containerized environments, DNS records change when services are redeployed. The pool holds connections to the old IP address while new instances run at the new IP.

// Problem: Default DNS caching (30 seconds in JVM) means
// pool holds stale connections after service redeploy

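// Fix: Reduce JVM DNS cache TTL (values in seconds). Set these at startup,
// before the first hostname lookup, so the cache policy picks them up.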
java.security.Security.setProperty("networkaddress.cache.ttl", "10");
java.security.Security.setProperty("networkaddress.cache.negative.ttl", "5");

The JVM caches successful DNS lookups for 30 seconds by default (forever when a security manager is installed). For the content platform running in Kubernetes, instance (pod) IPs change on every deployment. Setting the DNS cache TTL to 10 seconds ensures new connections target the current service instances. Existing pooled connections continue using the old IP until they are closed by a maximum connection lifetime (maxLifetime below; in Apache HttpClient 5 this is the connection time-to-live, ConnectionConfig.setTimeToLive) or by idle eviction.

The interaction between DNS TTL and connection pool configuration:

$$\text{stale connection window} = \text{DNS TTL} + \text{pool maxLifetime}$$

With DNS TTL = 10s and a maxLifetime of, say, 300s, the worst-case stale window is 310 seconds. To reduce this, lower maxLifetime or add health checks that detect connections to deregistered instances.
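The stale-window formula can also be turned around to pick a connection lifetime for a target window. A small sketch with hypothetical names (StaleWindow, worstCase, maxLifetimeFor); the 60-second target in the note below is an illustrative assumption, not a value from the platform.

```java
import java.time.Duration;

// Hypothetical helper for the stale connection window formula.
final class StaleWindow {

    // stale connection window = DNS TTL + pool maxLifetime
    static Duration worstCase(Duration dnsTtl, Duration maxLifetime) {
        return dnsTtl.plus(maxLifetime);
    }

    // Largest maxLifetime that keeps the window within the target.
    // Duration.ZERO means the target cannot be met by lifetime alone.
    static Duration maxLifetimeFor(Duration targetWindow, Duration dnsTtl) {
        Duration remaining = targetWindow.minus(dnsTtl);
        return remaining.isNegative() ? Duration.ZERO : remaining;
    }
}
```

With a 10-second DNS TTL, a 300-second lifetime gives a 310-second window, and an assumed 60-second target would require a lifetime of at most 50 seconds.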

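For the content platform, the team validates connections that have been idle for more than 10 seconds before reuse (validateAfterInactivity) and evicts connections idle for more than 30 seconds. Combined with a 10-second DNS TTL, this means that within about 40 seconds of a service redeployment (10 s DNS TTL + 30 s idle eviction), new and idle connections target the new instances. Connections that stay continuously busy are not covered by idle eviction and persist until the old instance closes them or a connection lifetime limit expires.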