Service-to-Service Latency: DNS, TCP Handshakes, and Connection Reuse at Scale

The content platform’s article service calls four downstream services to render a single article page: search (for related articles), recommendations (for personalized suggestions), analytics (to record the view), and the image service (for responsive thumbnails). Each call looks fast in isolation: 12ms to search, 8ms to recommendations, 3ms to analytics, 5ms to images. But when you trace a cold request, the first call to each service takes 45-80ms instead.

The difference is connection establishment overhead. A new TCP connection costs 1 RTT (round-trip time). TLS adds another 1-2 RTTs. DNS resolution can add 1-50ms depending on cache state. For services within the same datacenter (0.5ms RTT), this overhead seems trivial. For cross-region calls (30ms RTT), it dominates request latency.

This chapter measures each component of connection establishment overhead, demonstrates how Java’s DNS caching behavior creates both performance and reliability problems, and shows how connection reuse eliminates the overhead entirely for steady-state traffic.

The Hidden Latency Budget

Figure: Service-to-Service Latency Waterfall

Every new HTTP connection requires a sequence of network operations before the first application byte can be sent:

Cold connection timeline (same datacenter, 0.5ms RTT):
  DNS resolution:     0.5-5ms (cached) or 10-50ms (recursive lookup)
  TCP handshake:      0.5ms (1 RTT: SYN out, SYN-ACK back)
  TLS 1.3 handshake:  0.5ms (1 RTT, full handshake without 0-RTT)
  HTTP request:       0.5ms (1 RTT for request and response)
  Application time:   12ms (search service processing)
  ─────────────────────────────────────────────────
  Total cold:         14-63.5ms
  Total warm (reused connection): 12.5ms

Cold connection timeline (cross-region, 30ms RTT):
  DNS resolution:     2-50ms (regional cache hit/miss)
  TCP handshake:      30ms (1 RTT)
  TLS 1.3 handshake:  30ms (1 additional RTT)
  HTTP request:       30ms (1 RTT for request and response)
  Application time:   12ms
  ─────────────────────────────────────────────────
  Total cold:         104-152ms
  Total warm (reused connection): 42ms

The overhead is substantial. Within a datacenter, a cold connection adds 1.5-51ms on top of the 12.5ms warm path, roughly 12-410% overhead, and almost all of the spread comes from DNS. Cross-region, it adds 62-110ms on top of the 42ms warm path, roughly 150-260%, and at least 60ms of that is fixed handshake cost that no DNS cache can remove. For the content platform serving 50,000 article views per minute, even a small percentage of cold connections produces visible P99 spikes.
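The budget arithmetic can be sketched as a small calculator. This is a minimal illustration, assuming that the TCP handshake and the request each cost one RTT and that TLS costs a configurable number of RTTs; `LatencyBudget` and its method names are hypothetical, and the numbers it prints depend entirely on the RTT, DNS, and processing times passed in.

```java
public class LatencyBudget {

    /** Cold request: DNS lookup, TCP handshake, TLS handshake, then the request itself. */
    public static double coldMillis(double dnsMs, double rttMs, int tlsRtts, double appMs) {
        double tcp = rttMs;               // client can send once the SYN-ACK arrives
        double tls = tlsRtts * rttMs;     // 1 for a full TLS 1.3 handshake, 2 for TLS 1.2
        double request = rttMs;           // request out, first response byte back
        return dnsMs + tcp + tls + request + appMs;
    }

    /** Warm request: the connection exists, so only the request RTT and server time remain. */
    public static double warmMillis(double rttMs, double appMs) {
        return rttMs + appMs;
    }

    public static void main(String[] args) {
        double appMs = 12.0;              // search service processing time from the text
        double[][] scenarios = {          // {dnsMs, rttMs}
            {0.5, 0.5}, {50.0, 0.5},      // same datacenter: best and worst DNS case
            {2.0, 30.0}, {50.0, 30.0}     // cross-region: best and worst DNS case
        };
        for (double[] s : scenarios) {
            double cold = coldMillis(s[0], s[1], 1, appMs);
            double warm = warmMillis(s[1], appMs);
            System.out.printf("dns=%.1fms rtt=%.1fms cold=%.1fms warm=%.1fms extra=%.0f%%%n",
                s[0], s[1], cold, warm, 100.0 * (cold - warm) / warm);
        }
    }
}
```

Changing `tlsRtts` to 2 models a TLS 1.2 peer, which is a quick way to see how much a legacy endpoint costs on a cross-region path.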

DNS Resolution: The First Tax

Java’s DNS resolution behavior differs from most languages. The JVM caches DNS results with a configurable TTL, independent of the operating system’s DNS cache:

// SLOW: Relying on default JVM DNS caching without checking it
// When a SecurityManager is installed, positive cache TTL = forever
// When a SecurityManager is NOT installed, the default is implementation-specific
// (30 seconds in OpenJDK)
// Negative cache TTL (failed lookups) = 10 seconds by default

// Check current JVM DNS cache settings:
import java.security.Security;

public class DnsCacheInspector {

    public static void printDnsCacheSettings() {
        String positiveTtl = Security.getProperty("networkaddress.cache.ttl");
        String negativeTtl = Security.getProperty("networkaddress.cache.negative.ttl");

        // null means the property is unset and the default applies
        System.out.printf("Positive TTL: %s%n", positiveTtl);
        // usually "10", because the stock java.security file sets it explicitly
        System.out.printf("Negative TTL: %s%n", negativeTtl);
    }
}

The problem surfaces in two ways. First, if the positive TTL is -1 (cache forever), a DNS record change during a deployment or failover never reaches the JVM, and the service keeps calling the old IP address until it is restarted. Second, if the TTL is 0 (no caching), every name resolution goes to the system resolver. Without connection reuse, that means every HTTP request pays a DNS lookup of 1-50ms.
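The raw property value is easy to misread, so a small interpreter can make startup logs clearer. The sketch below is illustrative only: `DnsTtlSettings` and `describePositiveTtl` are hypothetical names, and the wording for the unset case assumes a JDK whose default without a SecurityManager is 30 seconds.

```java
import java.security.Security;

public class DnsTtlSettings {

    /** Turns the raw networkaddress.cache.ttl value into a human-readable description. */
    public static String describePositiveTtl(String raw) {
        if (raw == null || raw.isBlank()) {
            // Assumption: OpenJDK-style default when no SecurityManager is installed
            return "unset (implementation default, typically 30s)";
        }
        int ttl;
        try {
            ttl = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return "unparseable value '" + raw + "'";
        }
        if (ttl < 0) {
            return "cache forever (DNS changes are never picked up)";
        }
        if (ttl == 0) {
            return "no caching (every resolution hits the system resolver)";
        }
        return "cache for " + ttl + "s";
    }

    public static void main(String[] args) {
        String raw = Security.getProperty("networkaddress.cache.ttl");
        System.out.println("Positive DNS TTL: " + describePositiveTtl(raw));
    }
}
```

Logging this once at startup makes it obvious when a container image or base configuration has changed the caching behavior.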

Measuring DNS Resolution Cost

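import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.stream.Collectors;

public class DnsLatencyBenchmark {

    // Times InetAddress.getAllByName(), which consults the JVM cache first.
    // Nothing is flushed here: only the first call for a hostname, or a call
    // made after the cached entry expires, reaches the system resolver.
    public static long measureDnsResolution(String hostname) {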
        long start = System.nanoTime();
        try {
            InetAddress[] addresses = InetAddress.getAllByName(hostname);
            long elapsed = System.nanoTime() - start;
            System.out.printf("Resolved %s to %s in %d us%n",
                hostname,
                Arrays.stream(addresses)
                    .map(InetAddress::getHostAddress)
                    .collect(Collectors.joining(", ")),
                elapsed / 1000);
            return elapsed;
        } catch (UnknownHostException e) {
            long elapsed = System.nanoTime() - start;
            System.out.printf("FAILED to resolve %s in %d us%n",
                hostname, elapsed / 1000);
            return elapsed;
        }
    }

    public static void main(String[] args) throws Exception {
        String target = "search-service.internal.platform.local";

        // First resolution: hits system resolver
        long first = measureDnsResolution(target);

        // Second resolution: hits JVM cache (if TTL > 0)
        long second = measureDnsResolution(target);

        // After TTL expiry: hits system resolver again
        Thread.sleep(31_000); // Wait for default 30s TTL
        long afterExpiry = measureDnsResolution(target);

        System.out.printf("%nFirst: %d us, Cached: %d us, After expiry: %d us%n",
            first / 1000, second / 1000, afterExpiry / 1000);
    }
}

// Output on content platform (internal DNS):
// Resolved search-service.internal.platform.local to 10.0.3.42 in 4200 us
// Resolved search-service.internal.platform.local to 10.0.3.42 in 8 us
// Resolved search-service.internal.platform.local to 10.0.3.42 in 3800 us
//
// First resolution: 4.2ms (system resolver, no OS cache hit)
// Cached resolution: 0.008ms (JVM cache)
// After expiry: 3.8ms (system resolver again)

The JVM cache eliminates 99.8% of DNS latency for repeated lookups. The critical configuration decision is the TTL value:

// FAST: Configure DNS TTL for service discovery compatibility
// Set at JVM startup with the legacy, implementation-specific property: -Dsun.net.inetaddr.ttl=60
// Or, preferably, set the documented security property before any DNS resolution:
Security.setProperty("networkaddress.cache.ttl", "60");
Security.setProperty("networkaddress.cache.negative.ttl", "5");

// TTL selection criteria:
// - Static infrastructure (fixed IPs):       300s (5 min)
// - Kubernetes with service discovery:        30s (match DNS record TTL)
// - AWS ELB/ALB (IP can change on scaling):   60s
// - Active failover scenarios:                10-30s
// - NEVER use 0 (no caching) in production
// - NEVER use -1 (infinite) with dynamic infrastructure
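The two NEVER rules above can be enforced at startup rather than left to convention. The following is a minimal sketch, not a prescribed implementation: `DnsTtlGuard` is a hypothetical helper that rejects 0 and negative positive-TTL values before writing the two security properties.

```java
import java.security.Security;

public class DnsTtlGuard {

    /** Validates the TTLs and writes them to the JVM security properties. */
    public static void apply(int positiveTtlSeconds, int negativeTtlSeconds) {
        if (positiveTtlSeconds <= 0) {
            throw new IllegalArgumentException(
                "positive TTL must be > 0: 0 disables caching and negative values cache forever");
        }
        if (negativeTtlSeconds < 0) {
            throw new IllegalArgumentException("negative TTL must be >= 0");
        }
        Security.setProperty("networkaddress.cache.ttl",
            Integer.toString(positiveTtlSeconds));
        Security.setProperty("networkaddress.cache.negative.ttl",
            Integer.toString(negativeTtlSeconds));
    }

    public static void main(String[] args) {
        // Call this before anything resolves a hostname
        apply(60, 5);
        System.out.println("ttl=" + Security.getProperty("networkaddress.cache.ttl")
            + " negative.ttl=" + Security.getProperty("networkaddress.cache.negative.ttl"));
    }
}
```

Failing fast on a bad value is a design choice: a service that refuses to start is easier to diagnose than one that silently pins a stale IP address.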

TCP Handshake: The 1.5 RTT Cost

Every new TCP connection requires a three-way handshake. The actual cost is 1 RTT from the client’s perspective (the client can begin sending after receiving SYN-ACK), but the full handshake completes in 1.5 RTTs:

Client                    Server
  |                          |
  |------- SYN ------------>|  t=0
  |                          |
  |<------ SYN-ACK ---------|  t=1 RTT
  |                          |
  |------- ACK ------------>|  t=1 RTT (handshake complete for the client)
  |                          |
  |------- HTTP Request --->|  t=1 RTT (sent right behind the ACK; TFO instead carries data in the SYN)
  |                          |

Time to first application byte: 1 RTT (without TFO) or 0 RTT (with TFO, cached)
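The handshake cost can be observed directly by timing `Socket.connect()`, which blocks until the connection is established. The sketch below is a minimal example that measures against a throwaway loopback listener so it runs anywhere; `TcpConnectTimer` is a hypothetical name, and loopback timings only show the mechanics, not realistic network RTTs. Pointing `timeConnectNanos` at a real service address gives the number that matters.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class TcpConnectTimer {

    /** Times Socket.connect() to an already-resolved address, in nanoseconds. */
    public static long timeConnectNanos(InetSocketAddress target, int timeoutMs)
            throws IOException {
        try (Socket socket = new Socket()) {
            long start = System.nanoTime();
            socket.connect(target, timeoutMs);   // blocks until the handshake completes
            return System.nanoTime() - start;
        }
    }

    /** Measures several connects against a temporary loopback listener. */
    public static long[] measureLoopback(int attempts) {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            // The kernel completes handshakes from the listen backlog, so accept() is not needed
            InetSocketAddress target =
                new InetSocketAddress(server.getInetAddress(), server.getLocalPort());
            long[] samples = new long[attempts];
            for (int i = 0; i < attempts; i++) {
                samples[i] = timeConnectNanos(target, 2000);
            }
            return samples;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (long nanos : measureLoopback(5)) {
            System.out.printf("connect took %d us%n", nanos / 1000);
        }
    }
}
```

Because the target address is resolved before the timer starts, the measurement isolates the TCP handshake from DNS.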

TCP Fast Open: Eliminating the Handshake RTT

TCP Fast Open (TFO) allows the client to send data in the SYN packet on subsequent connections to the same server. The first connection still requires the full handshake, but a cookie is cached for future connections:

// Enable TCP Fast Open on the server (Linux)
// sysctl -w net.ipv4.tcp_fastopen=3  (enable for both client and server)

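// Java's built-in java.net.http.HttpClient does NOT expose TFO configuration.
// For services on Reactor Netty (e.g., Spring WebFlux). HttpServer and HttpClient below are
// reactor.netty.http.server.HttpServer and reactor.netty.http.client.HttpClient.
// TFO typically requires Netty's native epoll transport on Linux.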
public class TcpFastOpenServerConfig {

    public HttpServer configureServer() {
        return HttpServer.create()
            .option(ChannelOption.TCP_FASTOPEN, 256)  // TFO queue length
            .childOption(ChannelOption.TCP_NODELAY, true)
            .port(8080);
    }
}

// Client-side TFO (Netty-based HTTP client):
public class TcpFastOpenClientConfig {

    public HttpClient configureClient() {
        return HttpClient.create()
            .option(ChannelOption.TCP_FASTOPEN_CONNECT, true)
            .option(ChannelOption.TCP_NODELAY, true);
    }
}

In practice, TFO provides limited benefit for persistent connections (which already amortize the handshake cost). It matters most for short-lived connections or connection pool overflow scenarios.

TLS Handshake: Full vs Resumed

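TLS 1.3 reduced the handshake to 1 RTT (from 2 RTTs in TLS 1.2). Session resumption with pre-shared keys skips certificate exchange and verification, and the protocol additionally allows 0-RTT early data on resumed connections. Two caveats apply: early data can be replayed, so it is only safe for idempotent requests, and the JDK's built-in TLS implementation (JSSE) does not send 0-RTT early data, so the 0-RTT figures below apply only to stacks that support it: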

TLS 1.3 Full Handshake (first connection):
  Client                          Server
    |--- ClientHello ------------->|  t=0
    |<-- ServerHello + Finished --|  t=1 RTT
    |--- Finished + App Data ---->|  t=1 RTT (client sends immediately)
    |
    Total: 1 RTT added on top of TCP handshake

TLS 1.3 Resumed (0-RTT, subsequent connection):
  Client                          Server
    |--- ClientHello + App Data ->|  t=0 (send data immediately!)
    |<-- ServerHello + App Data --|  t=1 RTT (server responds)
    |
    Total: 0 RTT added (data sent with first packet)

Combined first-connection overhead:
  TCP handshake:     1 RTT
  TLS 1.3 full:     1 RTT
  Total:            2 RTTs before first application byte

Combined resumed-connection overhead:
  TCP handshake:     1 RTT (or 0 with TFO)
  TLS 1.3 resumed:  0 RTT
  Total:            1 RTT (or 0 with TFO + 0-RTT TLS)

Configuring TLS Session Resumption in Java

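// FAST: Size the TLS client session cache so resumed handshakes stay available
import java.net.http.HttpClient;
import java.time.Duration;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSessionContext;

public class TlsSessionConfig {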

    public static SSLContext createOptimizedContext() throws Exception {
        SSLContext context = SSLContext.getInstance("TLSv1.3");
        context.init(null, null, null);

        SSLSessionContext sessionContext = context.getClientSessionContext();
        // Cache up to 1024 sessions (default is usually 20480, but check)
        sessionContext.setSessionCacheSize(1024);
        // Sessions valid for 1 hour (matches typical service deployment cycle)
        sessionContext.setSessionTimeout(3600);

        return context;
    }

    // For Java HttpClient (Java 11+):
    public static HttpClient createOptimizedHttpClient() throws Exception {
        return HttpClient.newBuilder()
            .sslContext(createOptimizedContext())
            .version(HttpClient.Version.HTTP_2)
            .connectTimeout(Duration.ofSeconds(5))
            .build();
    }
}
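Since the defaults vary between JDK versions, it is worth reading them back rather than assuming them. The following sketch, with the hypothetical class name `TlsSessionCacheCheck`, prints the default client session cache settings and confirms that tuned values actually took effect. It uses only standard `javax.net.ssl` calls and needs no network access.

```java
import java.security.GeneralSecurityException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSessionContext;

public class TlsSessionCacheCheck {

    /** Builds a TLS 1.3 context and applies the given client session cache settings. */
    public static SSLSessionContext tunedClientSessions(int cacheSize, int timeoutSeconds) {
        try {
            SSLContext context = SSLContext.getInstance("TLSv1.3");
            context.init(null, null, null);   // default key managers, trust managers, RNG
            SSLSessionContext sessions = context.getClientSessionContext();
            sessions.setSessionCacheSize(cacheSize);
            sessions.setSessionTimeout(timeoutSeconds);
            return sessions;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("TLS 1.3 context unavailable", e);
        }
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SSLSessionContext defaults = SSLContext.getDefault().getClientSessionContext();
        System.out.printf("default cache size=%d, timeout=%ds%n",
            defaults.getSessionCacheSize(), defaults.getSessionTimeout());

        SSLSessionContext tuned = tunedClientSessions(1024, 3600);
        System.out.printf("tuned cache size=%d, timeout=%ds%n",
            tuned.getSessionCacheSize(), tuned.getSessionTimeout());
    }
}
```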

The Content Platform’s Connection Graph

The article service handles 50,000 requests per minute. Each article view triggers calls to four downstream services:

// Content platform article rendering flow:
//
// ArticleService (receives request)
//   ├── SearchService.getRelated(articleId)        - 12ms avg
//   ├── RecommendationService.getPersonalized(userId) - 8ms avg
//   ├── AnalyticsService.recordView(articleId, userId) - 3ms avg (fire-and-forget)
//   └── ImageService.getResizedUrls(imageIds)      - 5ms avg
//
// With cold connections (worst case, all 4 connections new, 0.5ms RTT):
//   DNS: 4 * 4ms =        16ms
//   TCP: 4 * 0.5ms =       2ms
//   TLS: 4 * 0.5ms =       2ms
//   App: 12+8+3+5 =       28ms
//   ─────────────────────────────
//   Total sequential cold: 48ms
//   Total parallel cold:   17ms (slowest call: 4 + 0.5 + 0.5 + 12; setup overlaps too)
//
// With warm connections (reused):
//   Total sequential warm:  28ms
//   Total parallel warm:    12ms = max(12, 8, 3, 5)
//
// Cold/warm ratio: about 1.4x (parallel) to 1.7x (sequential)

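// java.net.http needs Java 11+; virtual threads need Java 21+
import java.net.http.HttpClient;
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ArticleRenderingService {

    private final HttpClient httpClient;
    private final ExecutorService executor;

    // Thin per-service wrappers that all send through the shared httpClient.
    // SearchClient, RecommendationClient, AnalyticsClient, ImageClient, Article,
    // and ArticlePage are application types whose definitions are omitted here.
    private final SearchClient searchClient;
    private final RecommendationClient recommendationClient;
    private final AnalyticsClient analyticsClient;
    private final ImageClient imageClient;

    public ArticleRenderingService() {
        // Shared HttpClient reuses connections across calls
        this.httpClient = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_2)
            .connectTimeout(Duration.ofSeconds(2))
            .build();

        this.executor = Executors.newVirtualThreadPerTaskExecutor();

        this.searchClient = new SearchClient(httpClient);
        this.recommendationClient = new RecommendationClient(httpClient);
        this.analyticsClient = new AnalyticsClient(httpClient);
        this.imageClient = new ImageClient(httpClient);
    }

    public ArticlePage renderArticle(String articleId, String userId) {
        // FAST: Parallel calls over persistent connections
        CompletableFuture<List<Article>> related = CompletableFuture.supplyAsync(
            () -> searchClient.getRelated(articleId), executor);

        CompletableFuture<List<Article>> recommended = CompletableFuture.supplyAsync(
            () -> recommendationClient.getPersonalized(userId), executor);

        // Analytics is fire-and-forget: never joined, but failures are logged
        CompletableFuture.runAsync(
            () -> analyticsClient.recordView(articleId, userId), executor)
            .exceptionally(ex -> {
                System.err.println("recordView failed: " + ex);
                return null;
            });

        CompletableFuture<Map<String, String>> images = CompletableFuture.supplyAsync(
            () -> imageClient.getResizedUrls(articleId), executor);

        // Wait for the three results the page needs
        return new ArticlePage(
            related.join(),
            recommended.join(),
            images.join()
        );
    }
}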
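A plain `join()` waits as long as the slowest dependency takes, so one stalled downstream call stalls the whole page. The sketch below shows one way to bound each call with a timeout and a fallback. It is a self-contained illustration, not the platform's code: `FanOutSketch` is hypothetical, the downstream services are simulated with `Thread.sleep`, and the fallback strings stand in for whatever degraded content the page can tolerate.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FanOutSketch {

    // Daemon threads so an abandoned slow call never keeps the JVM alive
    private static final ExecutorService EXECUTOR = Executors.newFixedThreadPool(8, r -> {
        Thread t = new Thread(r, "fan-out");
        t.setDaemon(true);
        return t;
    });

    /** Stand-in for a downstream call: sleeps, then returns a canned value. */
    private static String simulate(String value, long delayMs) {
        try {
            Thread.sleep(delayMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return value;
    }

    /** Runs one simulated call and substitutes a fallback if it exceeds the timeout. */
    private static CompletableFuture<String> call(String name, long delayMs, long timeoutMs) {
        return CompletableFuture
            .supplyAsync(() -> simulate(name, delayMs), EXECUTOR)
            .completeOnTimeout(name + ":fallback", timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static List<String> render(long searchDelayMs, long timeoutMs) {
        CompletableFuture<String> related = call("related", searchDelayMs, timeoutMs);
        CompletableFuture<String> recommended = call("recommended", 8, timeoutMs);
        CompletableFuture<String> images = call("images", 5, timeoutMs);

        // Fire-and-forget: never joined, but failures are still reported
        CompletableFuture.runAsync(() -> simulate("view-recorded", 3), EXECUTOR)
            .exceptionally(ex -> {
                System.err.println("analytics failed: " + ex);
                return null;
            });

        return List.of(related.join(), recommended.join(), images.join());
    }

    public static void main(String[] args) {
        System.out.println(render(12, 100));    // all calls finish within the timeout
        System.out.println(render(500, 100));   // slow search is replaced by its fallback
    }
}
```

`completeOnTimeout` (Java 9+) does not cancel the underlying task, so the slow call still occupies a thread until it returns. Pair it with a request-level timeout on the HTTP call itself in real code.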

Connection Reuse: Eliminating Per-Request Overhead

The key insight: connection establishment cost is paid once per connection, not once per request. With HTTP/1.1 keep-alive, a single connection handles requests one after another. With HTTP/2, a single connection multiplexes many concurrent requests as streams, up to the limit the server advertises (commonly around 100):

// Connection reuse comparison for the content platform (0.5ms RTT):
//
// HTTP/1.1 without keep-alive:
//   50,000 requests/min to search service
//   = 50,000 TCP+TLS handshakes/min
//   = 50,000 * (0.5ms + 0.5ms) = 50 seconds of cumulative handshake wait per minute
//   CPU overhead: ~5% of search service capacity
//
// HTTP/1.1 with keep-alive (pool size = 50):
//   50 TCP+TLS handshakes at startup
//   = 50 * 1ms = 50ms total (one-time cost)
//   Requests/connection: 50,000 / 50 = 1,000/connection/min
//   Per-request overhead: 0ms (connection already established)
//
// HTTP/2 multiplexed (pool size = 5):
//   5 TCP+TLS handshakes at startup
//   = 5 * 1ms = 5ms total (one-time cost)
//   Concurrent streams per connection: up to 100
//   All 50,000 requests share 5 connections
//   Per-request overhead: 0ms

Java HttpClient Connection Management

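// FAST: Java's HttpClient (11+) automatically manages connection pooling
// and HTTP/2 multiplexing when the server supports it.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse.BodyHandlers;
import java.time.Duration;

public class OptimizedServiceClient {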

    // Single HttpClient instance shared across the application
    // This is thread-safe and manages its own connection pool
    private static final HttpClient CLIENT = HttpClient.newBuilder()
        .version(HttpClient.Version.HTTP_2)    // Prefer HTTP/2
        .connectTimeout(Duration.ofSeconds(2))
        .followRedirects(HttpClient.Redirect.NEVER) // Services should not redirect
        .build();

    // SLOW: Creating a new HttpClient per request (loses connection reuse)
    public String fetchSlow(String url) throws Exception {
        HttpClient throwawayClient = HttpClient.newHttpClient(); // BAD
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(url))
            .build();
        return throwawayClient.send(request, BodyHandlers.ofString()).body();
    }

    // FAST: Reusing the shared client (connections persist)
    public String fetchFast(String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(url))
            .timeout(Duration.ofSeconds(5))
            .build();
        return CLIENT.send(request, BodyHandlers.ofString()).body();
    }
}
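The difference between the two methods can be made visible without any external service. The sketch below starts the JDK's built-in `com.sun.net.httpserver.HttpServer` on loopback, sends a few sequential requests, and counts how many distinct client ports the server saw, which corresponds to how many TCP connections were opened. `ConnectionReuseDemo` is a hypothetical name, and the client is pinned to HTTP/1.1 over plain HTTP to keep the example small; it is an illustration of keep-alive reuse, not a benchmark.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse.BodyHandlers;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ConnectionReuseDemo {

    private static HttpClient newClient() {
        return HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)   // plain keep-alive, no h2c upgrade attempt
            .connectTimeout(Duration.ofSeconds(2))
            .build();
    }

    /** Sends sequential GETs and returns how many distinct client connections the server saw. */
    public static int countConnections(int requests, boolean shareClient) {
        Set<Integer> clientPorts = ConcurrentHashMap.newKeySet();
        HttpServer server;
        try {
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        server.createContext("/health", exchange -> {
            // Each TCP connection has its own client-side port
            clientPorts.add(exchange.getRemoteAddress().getPort());
            byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/health");
            HttpClient shared = newClient();
            for (int i = 0; i < requests; i++) {
                HttpClient client = shareClient ? shared : newClient();
                HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
                client.send(request, BodyHandlers.ofString());
            }
            return clientPorts.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) {
        System.out.println("shared client connections:   " + countConnections(5, true));
        System.out.println("throwaway client connections: " + countConnections(5, false));
    }
}
```

The shared client should report far fewer connections than the throwaway clients, since each new `HttpClient` starts with an empty connection pool.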

Measuring the Improvement

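// Benchmark: Cold vs warm connection latency (JMH)
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse.BodyHandlers;
import java.time.Duration;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;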
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
@Warmup(iterations = 3, time = 2)
@Measurement(iterations = 5, time = 5)
@Fork(2)
public class ConnectionReuseBenchmark {

    private HttpClient sharedClient;
    private String targetUrl;

    @Setup
    public void setup() {
        sharedClient = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_2)
            .connectTimeout(Duration.ofSeconds(2))
            .build();
        targetUrl = "https://search-service.internal:8443/api/health";
    }

    @Benchmark
    public String coldConnection() throws Exception {
        // SLOW: New client every time, no connection reuse
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(targetUrl))
            .build();
        return client.send(request, BodyHandlers.ofString()).body();
    }

    @Benchmark
    public String warmConnection() throws Exception {
        // FAST: Shared client, connection already established
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(targetUrl))
            .build();
        return sharedClient.send(request, BodyHandlers.ofString()).body();
    }
}

// Results (same datacenter, 0.5ms RTT):
// Benchmark                               Mode  Cnt    Score   Error  Units
// ConnectionReuseBenchmark.coldConnection  avgt   10  4230.4 ± 312.1  us/op
// ConnectionReuseBenchmark.warmConnection  avgt   10   580.6 ±  42.3  us/op
//
// Cold: 4.2ms (client construction + TCP + TLS + request; DNS is served from the JVM cache after the first call)
// Warm: 0.58ms (request only over established connection)
// Speedup: 7.3x faster with connection reuse

Java’s `networkaddress.cache.ttl` Deep Dive

The JVM DNS cache operates independently of the OS DNS cache. This creates a layered caching system:

Request → JVM DNS Cache → OS DNS Cache → Local DNS Resolver → Authoritative DNS
                ↓                ↓                ↓
           TTL: 30s         TTL: varies       TTL: record-defined
         (configurable)    (OS-managed)      (typically 60-300s)

// Configure JVM DNS cache for Kubernetes service discovery:
// In $JAVA_HOME/conf/security/java.security:
//   networkaddress.cache.ttl=30
//   networkaddress.cache.negative.ttl=5
//
// Or via system property at startup:
//   -Dsun.net.inetaddr.ttl=30
//   -Dsun.net.inetaddr.negative.ttl=5
//
// Or programmatically (MUST be called before any DNS resolution):

public class DnsCacheConfiguration {

    public static void configure() {
        // Positive cache: 30s aligns with Kubernetes DNS record TTL
        Security.setProperty("networkaddress.cache.ttl", "30");
        // Negative cache: 5s allows quick recovery from transient DNS failures
        Security.setProperty("networkaddress.cache.negative.ttl", "5");
    }

    // Danger: the JDK reads these properties once, when its address cache
    // policy initializes on first use of InetAddress. Calls made after that
    // point are silently ignored, so call this first thing in main(),
    // before creating any HTTP clients.
}

The negative cache TTL deserves attention. When a DNS lookup fails (NXDOMAIN, timeout), Java caches that failure for the configured negative TTL. If a service is briefly unavailable during deployment, a 10-second negative cache means the JVM will not attempt to resolve it again for 10 seconds, even if the service comes back in 2 seconds.

TCP_NODELAY: Eliminating Nagle’s Delay

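Nagle’s algorithm holds back small segments while earlier data is still unacknowledged, to reduce header overhead. Combined with the receiver’s delayed-ACK timer, it can add up to 40ms of artificial delay to interactive service-to-service calls:

// SLOW: Nagle's algorithm enabled (the default for a plain java.net.Socket)
// A small write (e.g., HTTP request headers) followed by a second small write (the body)
// stalls: the second write waits for an ACK that the peer is deliberately delaying
// Latency: up to 40ms added per stalled write (typical delayed-ACK timer on Linux)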

// FAST: Disable Nagle's algorithm for service-to-service communication
public class TcpNoDelayConfig {

    // For raw sockets:
    public Socket createOptimizedSocket(String host, int port) throws Exception {
        Socket socket = new Socket();
        socket.setTcpNoDelay(true);          // Disable Nagle's algorithm
        socket.setKeepAlive(true);           // Enable TCP keep-alive probes
        socket.setSoTimeout(5000);           // Read timeout: 5s
        socket.connect(new InetSocketAddress(host, port), 2000);
        return socket;
    }

    // For Reactor Netty HTTP clients (e.g., Spring WebFlux); HttpClient here is
    // reactor.netty.http.client.HttpClient, not java.net.http.HttpClient:
    public HttpClient createNettyClient() {
        return HttpClient.create()
            .option(ChannelOption.TCP_NODELAY, true)
            .option(ChannelOption.SO_KEEPALIVE, true)
            .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 2000);
    }
}
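Frameworks differ in whether they disable Nagle's algorithm for you, so it helps to read the option back from a live socket instead of assuming. A minimal sketch, using a loopback listener so it runs without any external service; `NoDelayCheck` is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class NoDelayCheck {

    /** Returns {TCP_NODELAY before setting it, TCP_NODELAY after setting it and connecting}. */
    public static boolean[] noDelayBeforeAndAfter() {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
             Socket socket = new Socket()) {
            boolean before = socket.getTcpNoDelay();   // platform default for a new socket
            socket.setTcpNoDelay(true);                // disable Nagle's algorithm
            socket.connect(
                new InetSocketAddress(server.getInetAddress(), server.getLocalPort()), 2000);
            boolean after = socket.getTcpNoDelay();    // read back from the connected socket
            return new boolean[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = noDelayBeforeAndAfter();
        System.out.printf("TCP_NODELAY default=%b, after setTcpNoDelay(true)=%b%n",
            result[0], result[1]);
    }
}
```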

End-to-End: Before and After

// Content platform article service: before and after optimization
//
// BEFORE (default Java HttpClient settings, short-lived connections):
//   P50 article render: 45ms
//   P99 article render: 180ms (cold connection spikes)
//   DNS lookups/min:    50,000 (one per request)
//   TCP handshakes/min: 50,000
//   TLS handshakes/min: 50,000
//
// AFTER (optimized DNS TTL, persistent HTTP/2, connection pre-warming):
//   P50 article render: 14ms
//   P99 article render: 38ms (no cold connection spikes in steady state)
//   DNS lookups/min:    8 at most (4 services, each re-resolved once per 30s TTL)
//   TCP handshakes/min: 4 (connection pool refresh)
//   TLS handshakes/min: 4
//
// Improvement:
//   P50: 3.2x faster
//   P99: 4.7x faster (cold connections eliminated)
//   Network overhead: lookups and handshakes reduced by more than 99.9%

The next two sections drill into the specifics: Section 1 covers DNS configuration and TCP measurement techniques, Section 2 covers connection pool architecture and warm-up strategies.