From DNS to the JVM: The First Half of the Request

The Symptom

After a deployment, 2% of ride requests fail with connection timeouts for 90 seconds, then recover. The application logs show no errors. The load balancer logs show connections to terminated pods. The DNS TTL is 300 seconds, and the old pod IPs are cached in the JVM’s DNS resolver.

The Cause

The first three layers of the request path (DNS, load balancer, and TLS) are invisible until they fail. Engineers focus on application code because that is what they control. But a DNS caching misconfiguration or a load balancer queue backup can add seconds to every request without a single line of application code being involved.

DNS Resolution

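The JVM caches DNS lookups in InetAddress by default, and it does not honor the TTL on the DNS record itself. The cache lifetime is controlled by the networkaddress.cache.ttl security property, set in the JDK's java.security file or with java.security.Security.setProperty. In OpenJDK-based JDKs with no security manager installed, successful lookups are cached for 30 seconds by default. With a security manager installed, or with the property set to -1, they are cached forever.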
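Before changing anything, it helps to see which settings a given JVM actually has. The sketch below is a hypothetical diagnostic helper (DnsCacheInspector is an illustrative name, not a JDK class) that prints the two security properties and their legacy system-property equivalents. A value of `<unset>` means the JDK falls back to its built-in default.

```java
import java.security.Security;

/**
 * Hypothetical diagnostic helper: prints the DNS cache settings this JVM sees.
 * "<unset>" means the property is absent and the JDK uses its built-in default.
 */
public final class DnsCacheInspector {

    private DnsCacheInspector() {}

    /** Formats one setting, e.g. "networkaddress.cache.ttl=30". */
    public static String describe(String name, String value) {
        return name + "=" + (value == null ? "<unset>" : value);
    }

    public static void main(String[] args) {
        // Security properties: read from the JDK's java.security file
        // or set programmatically with Security.setProperty
        for (String name : new String[] {
                "networkaddress.cache.ttl", "networkaddress.cache.negative.ttl"}) {
            System.out.println(describe(name, Security.getProperty(name)));
        }
        // Legacy system properties (-Dsun.net.inetaddr.ttl=...), consulted
        // only when the corresponding security property is unset
        for (String name : new String[] {
                "sun.net.inetaddr.ttl", "sun.net.inetaddr.negative.ttl"}) {
            System.out.println(describe(name, System.getProperty(name)));
        }
    }
}
```

Running it inside the same container image as the service shows the effective configuration rather than the one the team assumes.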

For the ride-hailing platform, the driver location service calls the surge pricing service at surge-pricing.internal. When the surge pricing service redeploys, its pods get new IPs. If the driver location service has cached the old IPs, requests fail until the DNS cache expires.

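// BOTTLENECK: with a security manager installed (or networkaddress.cache.ttl=-1),
// the JVM caches successful DNS lookups forever
// No code needed to demonstrate: this is a JVM property issue

// SCALED: Set the DNS cache TTL before the first hostname lookup
import java.security.Security;

public final class DnsCacheConfig {

    private DnsCacheConfig() {}

    // Call this first thing in main(). The JDK reads these properties once,
    // when its address cache policy is initialized, so later changes are ignored.
    public static void apply() {
        // Cache successful lookups for 30 seconds
        Security.setProperty("networkaddress.cache.ttl", "30");
        // Cache failed lookups (negative results) for 10 seconds
        Security.setProperty("networkaddress.cache.negative.ttl", "10");
    }
}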

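In Kubernetes, this failure mode applies when the name resolves directly to pod IPs, as with a headless Service; a standard ClusterIP Service resolves to a stable virtual IP that does not change when pods are replaced. Headless Service records update within seconds of a pod becoming ready. With a 30-second JVM cache TTL, a client can keep sending traffic to a terminated pod for up to about 30 seconds after the record changes. That is acceptable. A 300-second TTL stretches the window to 5 minutes. That is not.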

Load Balancer Queue Depth

The load balancer (whether Kubernetes Ingress, Envoy sidecar, or an external ALB) queues requests when the backend cannot accept new connections fast enough. Under normal load, the queue is empty and the load balancer adds 1-2ms of latency. Under high load, the queue grows.

The symptom: p99 latency includes a 50-200ms component that does not appear in any application metric. The application thinks the request took 100ms. The client thinks it took 300ms. The 200ms gap is the load balancer queue.
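The 200ms gap can be turned into an estimate of how many requests sit in the queue with Little's law (average number waiting = arrival rate × average wait). A back-of-envelope sketch, assuming network time between client and load balancer is negligible so the whole gap is queueing; the 1,000 RPS arrival rate is a hypothetical figure, not a measurement from the platform.

```java
/**
 * Back-of-envelope load balancer queue math. Assumes the entire gap between
 * client-observed and application-observed latency is queueing.
 */
public final class QueueGapEstimate {

    private QueueGapEstimate() {}

    /** Time the request spent outside the application's own timers, in ms. */
    public static double queueWaitMs(double clientMs, double applicationMs) {
        return Math.max(0.0, clientMs - applicationMs);
    }

    /** Little's law: average requests waiting = arrival rate x average wait. */
    public static double averageQueueDepth(double requestsPerSecond, double waitMs) {
        return requestsPerSecond * waitMs / 1000.0;
    }

    public static void main(String[] args) {
        double waitMs = queueWaitMs(300, 100);           // client 300ms, app 100ms
        double depth = averageQueueDepth(1_000, waitMs); // hypothetical 1,000 RPS
        // Prints: wait=200ms, average queue depth=200 requests
        System.out.printf("wait=%.0fms, average queue depth=%.0f requests%n", waitMs, depth);
    }
}
```

At that hypothetical rate, a steady 200ms gap means roughly 200 requests are waiting in the load balancer at any moment, none of which the application can see.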

# ingress-nginx annotations that bound time spent in the proxy and cap
# connections (these set limits; they do not expose queue metrics)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ride-hailing-ingress
  annotations:
    # Timeouts are in seconds
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "5"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "30"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "10"
    # Cap concurrent connections per client IP; excess connections are
    # rejected (503) instead of piling up in the queue
    nginx.ingress.kubernetes.io/limit-connections: "100"
# spec (ingressClassName, rules) omitted

TLS Session Resumption

A full TLS 1.3 handshake takes one round trip (1-RTT). Resuming a session with a pre-shared key (PSK) also takes one round trip but skips certificate authentication; with 0-RTT early data, the client can send its first request alongside the handshake. If TLS terminates at the ride-hailing platform's pods, a deployment that restarts them invalidates every session they held, and every client must perform a full handshake. Under a connection storm (thousands of riders opening the app simultaneously after a push notification), those full handshakes consume CPU on whichever tier terminates TLS.

The fix: terminate TLS at the load balancer or CDN, not at the application. The load balancer is not restarted during an application rollout, so its TLS session cache survives pod restarts.
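Resumption also needs a cached session on the client: a client can only offer to resume a session it kept. For JVM services that open TLS connections with the JDK's built-in provider, the client-side cache is configured through SSLSessionContext. A minimal sketch; the class name and the cache size and timeout values are illustrative assumptions, not recommendations.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSessionContext;

/** Illustrative helper: a TLS context with an explicitly sized client session cache. */
public final class TlsClientSessionCache {

    private TlsClientSessionCache() {}

    public static SSLContext newContext(int maxSessions, int timeoutSeconds) throws Exception {
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, null, null); // JDK default key managers, trust store, and RNG
        SSLSessionContext sessions = ctx.getClientSessionContext();
        sessions.setSessionCacheSize(maxSessions);  // 0 would mean "no limit"
        sessions.setSessionTimeout(timeoutSeconds); // 0 would mean "no timeout"
        return ctx;
    }

    public static void main(String[] args) throws Exception {
        SSLContext ctx = newContext(1_000, 3_600);
        SSLSessionContext sessions = ctx.getClientSessionContext();
        System.out.println("cache size=" + sessions.getSessionCacheSize()
                + ", timeout=" + sessions.getSessionTimeout() + "s");
    }
}
```

The resulting context can be handed to java.net.http.HttpClient with HttpClient.newBuilder().sslContext(ctx), so every connection the client opens shares one session cache.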

The Baseline

The Locust test from Chapter 1, run during a deployment, shows the DNS and TLS impact:

During deployment (rolling update, 2 of 6 pods restarting):
Name                   p95 (ms)  p99 (ms)  Fail%
/api/fares/estimate        1200      4800   2.1%    ← Connections to old pods
/api/drivers/nearby         600      2400   1.8%

After the deployment completes and the 30-second DNS TTL expires:
Name                   p95 (ms)  p99 (ms)  Fail%
/api/fares/estimate         420      2100   0.0%    ← Back to baseline
/api/drivers/nearby         280       800   0.0%

The 2.1% failure rate during the deployment comes from the JVM's DNS cache holding stale pod IPs. The 30-second recovery matches the DNS cache TTL.

The Fix: Thread Pool Math for WebFlux vs MVC

The most impactful architectural decision in the first half of the request lifecycle is the threading model.

Spring MVC (thread-per-request): Tomcat’s default thread pool is 200 threads. Each thread handles one request at a time. While that thread waits for PostgreSQL (35ms), Redis (2ms), or the surge pricing service (50ms), it is blocked. It cannot handle another request.

Maximum concurrent requests with Spring MVC: 200. Not 200 RPS. 200 concurrent in-flight requests. By Little's law, maximum throughput is concurrency divided by average request duration. At an average request duration of 100ms, the theoretical maximum throughput is $200 / 0.1\,\text{s} = 2{,}000$ RPS. At an average request duration of 500ms (under load), it drops to $200 / 0.5\,\text{s} = 400$ RPS.
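The same arithmetic as a small calculator, plus the inverse question: how many threads would a thread-per-request server need to hold 2,000 RPS once requests slow to 500ms? The class and method names are illustrative.

```java
/**
 * Thread-per-request capacity math via Little's law:
 * throughput = concurrency / average duration.
 */
public final class ThreadPoolMath {

    private ThreadPoolMath() {}

    /** Maximum requests per second a fixed pool can complete. */
    public static double maxRps(int threads, double avgDurationMs) {
        return threads * 1000.0 / avgDurationMs;
    }

    /** Threads needed to sustain a target rate at a given average duration. */
    public static long threadsNeeded(double targetRps, double avgDurationMs) {
        return (long) Math.ceil(targetRps * avgDurationMs / 1000.0);
    }

    public static void main(String[] args) {
        System.out.println(maxRps(200, 100));          // 2000.0
        System.out.println(maxRps(200, 500));          // 400.0
        System.out.println(threadsNeeded(2_000, 500)); // 1000
    }
}
```

Keeping 2,000 RPS at 500ms would take 1,000 threads, five times Tomcat's default pool, which is why the thread count rather than the CPU becomes the ceiling.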

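// BOTTLENECK: Spring MVC thread-per-request
// Each thread blocks waiting for I/O
import java.time.Duration;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class FareControllerBlocking {

    // FareRequest, FareEstimate, BaseFare, FareRepository, and SurgeClient
    // are the platform's own types (not shown)
    private final RedisTemplate<String, FareEstimate> redisTemplate;
    private final FareRepository fareRepository;
    private final SurgeClient surgeClient;

    public FareControllerBlocking(RedisTemplate<String, FareEstimate> redisTemplate,
            FareRepository fareRepository, SurgeClient surgeClient) {
        this.redisTemplate = redisTemplate;
        this.fareRepository = fareRepository;
        this.surgeClient = surgeClient;
    }

    @PostMapping("/api/fares/estimate")
    public FareEstimate estimateFare(@RequestBody FareRequest request) {
        String cacheKey = "fare:" + request.gridCell();

        // Thread blocks here for ~2ms (Redis round trip)
        FareEstimate cached = redisTemplate.opsForValue().get(cacheKey);
        if (cached != null) return cached;

        // Thread blocks here for ~35ms (PostgreSQL)
        BaseFare baseFare = fareRepository.findByRoute(
            request.pickupZone(), request.dropoffZone());

        // Thread blocks here for ~50ms (HTTP call to surge service)
        double surgeMultiplier = surgeClient.getMultiplier(request.pickupZone());

        FareEstimate estimate = new FareEstimate(baseFare.amount() * surgeMultiplier);
        redisTemplate.opsForValue().set(cacheKey, estimate, Duration.ofSeconds(60));
        return estimate;
    }
}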

Spring WebFlux (event loop): Reactor Netty, the default WebFlux server, runs one event loop thread per CPU core by default, with a minimum of 4 (typically 4-16 threads). These threads must never block. Instead of waiting for I/O, they register a callback and move on to the next request. When the I/O completes, the callback runs on an event loop thread, usually the one that owns that connection.

Maximum concurrent requests with Spring WebFlux: limited by memory, not threads. Each in-flight request consumes roughly 2-8 KB of heap for the reactive chain state. With 1 GB of heap dedicated to request handling, the theoretical limit is approximately 125,000 concurrent requests at 8 KB each, or about 500,000 at 2 KB each.
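The heap ceiling is a single division. A sketch, assuming decimal units (1 GB = 10^9 bytes, 1 KB = 1,000 bytes), which is how the 125,000 figure works out at the 8 KB end of the range; it counts reactive chain state only and ignores every other per-request cost such as socket buffers.

```java
/** Heap-bound ceiling on in-flight reactive requests (chain state only). */
public final class ReactiveConcurrencyEstimate {

    private ReactiveConcurrencyEstimate() {}

    public static long maxInFlight(long heapBytes, long bytesPerRequest) {
        return heapBytes / bytesPerRequest;
    }

    public static void main(String[] args) {
        long heap = 1_000_000_000L; // 1 GB (decimal) reserved for request handling
        System.out.println(maxInFlight(heap, 8_000)); // 125000 at 8 KB per request
        System.out.println(maxInFlight(heap, 2_000)); // 500000 at 2 KB per request
    }
}
```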

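// SCALED: Spring WebFlux non-blocking
// Event loop threads never wait for I/O
import java.time.Duration;
import org.springframework.data.redis.core.ReactiveRedisTemplate;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;

@RestController
public class FareControllerReactive {

    // FareRequest, FareEstimate, and BaseFare are the platform's types (not shown).
    // ReactiveFareRepository (findByRoute returns Mono<BaseFare>) and
    // ReactiveSurgeClient (getMultiplier returns Mono<Double>) are hypothetical names.
    private final ReactiveRedisTemplate<String, FareEstimate> reactiveRedisTemplate;
    private final ReactiveFareRepository fareRepository;
    private final ReactiveSurgeClient surgeClient;

    public FareControllerReactive(ReactiveRedisTemplate<String, FareEstimate> reactiveRedisTemplate,
            ReactiveFareRepository fareRepository, ReactiveSurgeClient surgeClient) {
        this.reactiveRedisTemplate = reactiveRedisTemplate;
        this.fareRepository = fareRepository;
        this.surgeClient = surgeClient;
    }

    @PostMapping("/api/fares/estimate")
    public Mono<FareEstimate> estimateFare(@RequestBody FareRequest request) {
        String cacheKey = "fare:" + request.gridCell();

        return reactiveRedisTemplate.opsForValue().get(cacheKey)
            // defer: assemble the miss path only when the cache lookup is empty
            .switchIfEmpty(Mono.defer(() ->
                // zip subscribes to both calls at once and waits for both
                Mono.zip(
                    fareRepository.findByRoute(
                        request.pickupZone(), request.dropoffZone()),
                    surgeClient.getMultiplier(request.pickupZone()))
                .map(tuple -> new FareEstimate(
                    tuple.getT1().amount() * tuple.getT2()))
                .flatMap(estimate ->
                    reactiveRedisTemplate.opsForValue()
                        .set(cacheKey, estimate, Duration.ofSeconds(60))
                        .thenReturn(estimate))));
    }
}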

The reactive version parallelizes the PostgreSQL query and surge service call with Mono.zip. The blocking version runs them sequentially. Under cache miss conditions, the blocking version takes 35ms + 50ms = 85ms for I/O. The reactive version takes max(35ms, 50ms) = 50ms.
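To make the latency arithmetic concrete without a Spring or Reactor dependency, the sketch below uses the JDK's CompletableFuture, whose thenCombine plays the role Mono.zip plays in the reactive controller. The sleeps are hypothetical stand-ins for the 35ms PostgreSQL query and the 50ms surge call, and the fare values are made up. Unlike the event loop, this version parks two pool threads while the calls are in flight, so it illustrates the latency saving, not the thread saving.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public final class FanOutLatency {

    private FanOutLatency() {}

    /** Hypothetical stand-in for an I/O call: waits delayMs, then returns value. */
    static double simulatedCall(long delayMs, double value) {
        try {
            Thread.sleep(delayMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return value;
    }

    /** One call after the other: total wait is roughly 35ms + 50ms. */
    public static double sequential() {
        double baseFare = simulatedCall(35, 12.0); // stand-in for PostgreSQL
        double surge = simulatedCall(50, 1.5);     // stand-in for the surge service
        return baseFare * surge;
    }

    /** Both calls start together: total wait is roughly max(35ms, 50ms). */
    public static double parallel(ExecutorService pool) {
        CompletableFuture<Double> baseFare =
                CompletableFuture.supplyAsync(() -> simulatedCall(35, 12.0), pool);
        CompletableFuture<Double> surge =
                CompletableFuture.supplyAsync(() -> simulatedCall(50, 1.5), pool);
        return baseFare.thenCombine(surge, (b, s) -> b * s).join();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            long t0 = System.nanoTime();
            double a = sequential();
            long seqMs = (System.nanoTime() - t0) / 1_000_000;
            t0 = System.nanoTime();
            double b = parallel(pool);
            long parMs = (System.nanoTime() - t0) / 1_000_000;
            System.out.printf("sequential: %.2f in %dms, parallel: %.2f in %dms%n",
                    a, seqMs, b, parMs);
        } finally {
            pool.shutdown();
        }
    }
}
```

On a typical run the sequential path takes a little over 85ms and the parallel path a little over 50ms, with both returning the same fare.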

The Proof

Locust test comparing Spring MVC (200 Tomcat threads) and Spring WebFlux (4 event loop threads) on the same hardware:

Spring MVC at 500 concurrent users:
Name                   p95 (ms)  p99 (ms)    RPS  Fail%
/api/fares/estimate        2800      8400    420   4.2%    ← Thread pool exhausted

Spring WebFlux at 500 concurrent users:
Name                   p95 (ms)  p99 (ms)    RPS  Fail%
/api/fares/estimate         340       890   1850   0.0%    ← Event loop handles it

Spring WebFlux handles 4.4x the throughput (1,850 vs 420 RPS) with one-fiftieth of the threads (4 instead of 200). The p99 is 9.4x lower (890ms vs 8,400ms). The failure rate is zero.

This is not a universal truth. For CPU-bound workloads (image processing, encryption, complex computation), blocking threads are fine because the thread is not waiting; it is working. The ride-hailing platform is I/O-bound: Redis, PostgreSQL, HTTP calls to other services. For I/O-bound workloads, reactive is the correct default.