surviving the spike

The Sticky Session Trap and Connection Draining

Chapter 42 of 66

The Symptom

The rider API runs 12 pods with cookie-based sticky sessions. The Nginx Ingress controller sets a RIDER_AFFINITY cookie with a 1-hour TTL, pinning each rider to a specific pod.

Pod traffic distribution during Friday evening peak:

Pod      RPS       % of Total    Active Sessions
pod-0    380       7.3%          1,200
pod-1    410       7.9%          1,340
pod-2    395       7.6%          1,280
pod-3    1,560     30.0%         5,100
pod-4    360       6.9%          1,180
pod-5    440       8.5%          1,420
pod-6    375       7.2%          1,210
pod-7    390       7.5%          1,260
pod-8    355       6.8%          1,150
pod-9    280       5.4%          920
pod-10   420       8.1%          1,370
pod-11   335       6.4%          1,090

Pod-3 handles 30% of all traffic. The other 11 pods share the remaining 70%. Pod-3’s CPU is at 68%. Its connection pool utilization is at 92%. Its p99 latency is 890ms while the fleet average is 165ms.
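The skew can be quantified directly from the RPS column. The following is a minimal sketch, not part of the rider API: `pod_rps` copies the table above, and `skew_ratio` is a hypothetical helper name chosen for illustration.

```python
from statistics import median

# Per-pod RPS copied from the Friday evening peak table.
pod_rps = {
    "pod-0": 380, "pod-1": 410, "pod-2": 395, "pod-3": 1560,
    "pod-4": 360, "pod-5": 440, "pod-6": 375, "pod-7": 390,
    "pod-8": 355, "pod-9": 280, "pod-10": 420, "pod-11": 335,
}

def skew_ratio(rps_by_pod):
    """Return busiest-pod RPS divided by quietest-pod RPS."""
    return max(rps_by_pod.values()) / min(rps_by_pod.values())

hottest = max(pod_rps, key=pod_rps.get)
typical = median(pod_rps.values())

print(f"hottest pod: {hottest}")
print(f"max/min skew: {skew_ratio(pod_rps):.1f}:1")
print(f"hottest vs median pod: {pod_rps[hottest] / typical:.1f}x")
```

A max/min ratio is a blunt measure, but it is cheap to compute from any per-pod metrics export and makes a single hot pod obvious.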

Why pod-3? The platform’s most active riders (frequent business travelers, daily commuters) generate disproportionate traffic. They open the app 40+ times per day. Their long-lived affinity cookies stick them to whichever pod they first landed on. Pod-3 accumulated a cluster of power users through random initial assignment. The sticky session cookie ensures they never redistribute.

During a rolling deployment, pod-3 restarts. Its 5,100 sessions lose their affinity target: the cookie now points at a pod that no longer exists. The 1,560 RPS redistributes across the remaining 11 pods. Assuming an even spread, each pod’s load increases by ~142 RPS (1,560 / 11). Pod-4, which was handling 360 RPS, jumps to 502 RPS. Several pods cross their connection pool threshold. The error rate spikes to 12% for 45 seconds until the new pod-3 starts and begins accepting sessions.
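The redistribution arithmetic can be reproduced in a few lines. This sketch assumes the lost pod's traffic spreads evenly over the survivors, which is a simplification; `redistribute` is a hypothetical helper, and `pod_rps` copies the peak-traffic table.

```python
# Per-pod RPS before the rolling deployment, copied from the peak-traffic table.
pod_rps = {
    "pod-0": 380, "pod-1": 410, "pod-2": 395, "pod-3": 1560,
    "pod-4": 360, "pod-5": 440, "pod-6": 375, "pod-7": 390,
    "pod-8": 355, "pod-9": 280, "pod-10": 420, "pod-11": 335,
}

def redistribute(rps_by_pod, lost_pod):
    """Spread the lost pod's RPS evenly over the surviving pods."""
    survivors = {p: r for p, r in rps_by_pod.items() if p != lost_pod}
    extra = rps_by_pod[lost_pod] / len(survivors)
    return {p: r + extra for p, r in survivors.items()}, extra

after, extra = redistribute(pod_rps, "pod-3")
print(f"extra load per surviving pod: ~{extra:.0f} RPS")
print(f"pod-4: {pod_rps['pod-4']} -> {after['pod-4']:.0f} RPS")
```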

The Cause

Sticky sessions create a positive feedback loop. A pod that accumulates more sessions handles more traffic. More traffic means higher resource utilization. Higher utilization means longer response times. Longer response times mean connections are held longer. The pod cannot shed load because the affinity cookie bypasses the load balancing algorithm. Least connections would route traffic away from a busy pod, but the sticky session cookie overrides it.
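A toy simulation makes the bypass visible. Everything here is an assumption for illustration: the workload (5% power users sending 40 requests, everyone else sending 4), the seed, and the use of cumulative request count as a stand-in for least connections. It is not a model of the real rider population.

```python
import random

def simulate(num_pods=12, num_users=600, seed=7):
    """Return (sticky max/min ratio, least-loaded max/min ratio) of requests per pod."""
    rng = random.Random(seed)
    sticky = [0] * num_pods      # requests per pod with cookie affinity
    balanced = [0] * num_pods    # requests per pod with least-loaded routing

    for _ in range(num_users):
        # Hypothetical workload: a small share of power users dominate traffic.
        requests = 40 if rng.random() < 0.05 else 4
        pinned = rng.randrange(num_pods)  # random initial landing, then pinned
        for _ in range(requests):
            sticky[pinned] += 1                       # affinity ignores current load
            target = balanced.index(min(balanced))    # balancer picks the idlest pod
            balanced[target] += 1

    return max(sticky) / min(sticky), max(balanced) / min(balanced)

sticky_ratio, balanced_ratio = simulate()
print(f"sticky max/min:       {sticky_ratio:.2f}")
print(f"least-loaded max/min: {balanced_ratio:.2f}")
```

The least-loaded ratio stays close to 1 because every request is placed on the idlest pod. The sticky ratio depends entirely on where the power users happened to land first.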

The ride-hailing team added sticky sessions in month 2 of the project. The rider API stored authentication tokens and user preferences in an HttpSession object backed by an in-memory ConcurrentHashMap. Without sticky sessions, a rider’s second request might hit a different pod that had no session data. The rider would be forced to re-authenticate.

In month 5, the team migrated session storage to Redis (CH3). Every pod reads the same session data from the same Redis cluster. The sticky session configuration was not removed. Nobody tested what happens without it because the cookie was invisible. It kept working. It kept creating skew.

The second problem: connection draining during deployments. When Kubernetes terminates a pod during a rolling update, it sends SIGTERM. By default, the pod has 30 seconds (terminationGracePeriodSeconds) to shut down before Kubernetes follows up with SIGKILL. If the application does not implement graceful shutdown, it exits on SIGTERM without waiting for in-flight requests. Those requests receive a connection reset, which the Ingress controller surfaces to the client as a 502.

Spring Boot has supported graceful shutdown since version 2.3, but it was disabled by default until version 3.4, so applications on older versions must opt in.

The Baseline

Impact of sticky sessions:

Metric                    With Sticky    Without Sticky    Delta
Max pod traffic ratio     5.6:1          1.1:1             -80%
p99 (busiest pod)         890ms          175ms             -80%
Connection pool max       92%            45%               -51%
Rolling deploy errors     12%            0.3%              -97%
Pod failure blast radius  30% of users   8.3% of users     -72%

Connection draining during rolling deployment:

Metric                    No Draining    With Draining     Delta
502 errors during deploy  847            12                -99%
Request failures          3.2%           0.04%             -99%
Deploy duration           45s            90s               +100%
In-flight request loss    ~430           ~3                -99%
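The Delta column is a plain relative change. A small sketch for recomputing it from the two measurement columns, with the row values copied from the draining table above (`pct_change` is a hypothetical helper name):

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100

# (metric, no draining, with draining) copied from the table above.
rows = [
    ("502 errors during deploy", 847, 12),
    ("Request failures (%)", 3.2, 0.04),
    ("Deploy duration (s)", 45, 90),
    ("In-flight request loss", 430, 3),
]

for name, before, after in rows:
    print(f"{name:<26} {pct_change(before, after):+.1f}%")
```

Rounded to whole percentages, the results match the table: roughly -99% for the three error metrics and +100% for deploy duration.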

The Fix

Eliminating sticky sessions

Step 1: Verify all session state is in Redis. Query the codebase for local state:

grep -r "HttpSession\|@SessionScope\|SessionAttribute\|ConcurrentHashMap" \
  --include="*.java" src/main/java/

If any results appear, migrate that state to Redis before removing sticky sessions.
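The same scan can run as a script, for example as a CI check that blocks the Ingress change while local state remains. This is a sketch under assumptions: the pattern list mirrors the grep above, `find_local_state` is a hypothetical helper, and the demo runs against a throwaway directory rather than the real source tree.

```python
import re
import tempfile
from pathlib import Path

# Same patterns as the grep command above.
LOCAL_STATE = re.compile(
    r"HttpSession|@SessionScope|SessionAttribute|ConcurrentHashMap"
)

def find_local_state(root):
    """Return (path, line number, line text) for every suspicious line under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.java")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if LOCAL_STATE.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits

# Demo on a throwaway source tree; in practice, pass "src/main/java".
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "RiderCache.java").write_text(
        "class RiderCache {\n"
        "    private final Map<String, Rider> cache = new ConcurrentHashMap<>();\n"
        "}\n",
        encoding="utf-8",
    )
    (Path(tmp) / "Clean.java").write_text("class Clean {}\n", encoding="utf-8")
    hits = find_local_state(tmp)

for path, lineno, text in hits:
    print(f"{path}:{lineno}: {text}")
```

Hits still need manual review. A `ConcurrentHashMap` may be a harmless cache, and the `HttpSession` pattern also matches the `@EnableRedisHttpSession` annotation.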

Step 2: Configure Spring Session with Redis:

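// SCALED: Redis-backed session configuration
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisStandaloneConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;
import org.springframework.session.data.redis.config.annotation.web.http.EnableRedisHttpSession;
import org.springframework.session.web.http.CookieSerializer;
import org.springframework.session.web.http.DefaultCookieSerializer;

// Sessions expire after one hour of inactivity
@Configuration
@EnableRedisHttpSession(maxInactiveIntervalInSeconds = 3600)
public class SessionConfig {

    // Every pod connects to the same Redis endpoint, so any pod can serve any rider
    @Bean
    public LettuceConnectionFactory connectionFactory() {
        RedisStandaloneConfiguration config =
            new RedisStandaloneConfiguration("redis.ridehailing.internal", 6379);
        return new LettuceConnectionFactory(config);
    }

    // Session cookie (not an affinity cookie): carries only the session ID
    @Bean
    public CookieSerializer cookieSerializer() {
        DefaultCookieSerializer serializer = new DefaultCookieSerializer();
        serializer.setCookieName("RIDER_SESSION");
        serializer.setSameSite("Lax");
        serializer.setUseSecureCookie(true);
        return serializer;
    }
}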

Step 3: Remove sticky session annotations from the Ingress:

# BOTTLENECK: Remove these annotations
# nginx.ingress.kubernetes.io/affinity: "cookie"
# nginx.ingress.kubernetes.io/session-cookie-name: "RIDER_AFFINITY"
# nginx.ingress.kubernetes.io/session-cookie-max-age: "3600"

# SCALED: Clean Ingress with least-connections
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rider-api
  namespace: ridehailing
  annotations:
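    # Check your controller version: the ingress-nginx docs list only
    # round_robin and ewma for load-balance, so least_conn may be ignored.
    nginx.ingress.kubernetes.io/load-balance: "least_conn"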
spec:
  rules:
    - host: rider-api.ridehailing.internal
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: rider-api
                port:
                  number: 8080

Step 4: Deploy and monitor. Watch the traffic distribution flatten across pods within minutes as riders’ next requests route via least-connections instead of cookie affinity.

Connection draining with preStop hook

When Kubernetes sends SIGTERM to a pod during a rolling update, three things must happen in order:

  1. The pod is removed from the Service’s endpoint list (no new traffic)
  2. In-flight requests complete
  3. The pod shuts down

The problem: step 1 is asynchronous. Kubernetes updates the endpoint list, but the Ingress controller and kube-proxy take time to propagate the change. For 1-5 seconds after SIGTERM, new requests may still arrive at the terminating pod. If the pod shuts down immediately on SIGTERM, those requests get connection resets.

The preStop hook adds a delay:

# SCALED: Pod spec with connection draining
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: rider-api
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 10"]
      ports:
        - containerPort: 8080

The timeline:

T+0s     Kubernetes marks the pod Terminating, starts removing it from endpoints
T+0s     preStop hook starts: sleep 10
T+1-5s   Endpoint propagation completes, no new traffic reaches pod
T+10s    preStop hook ends, SIGTERM delivered to the application
T+10s    Spring Boot begins graceful shutdown
T+10-40s In-flight requests complete, new requests rejected with 503
T+40s    Application exits cleanly
T+60s    terminationGracePeriodSeconds expires (hard kill if still running)
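The timeline only works if the numbers fit inside each other: the preStop sleep must outlast endpoint propagation, and sleep plus drain time must fit inside the grace period, because the grace period starts counting at T+0 and includes the preStop hook. A sketch of that check, with `check_drain_budget` as a hypothetical helper:

```python
def check_drain_budget(prestop_sleep_s, shutdown_timeout_s, grace_period_s,
                       max_propagation_s):
    """Return a list of problems with the termination timing; empty means OK."""
    problems = []
    if prestop_sleep_s < max_propagation_s:
        problems.append(
            "preStop sleep is shorter than endpoint propagation; "
            "late requests can hit a pod that is already shutting down"
        )
    if prestop_sleep_s + shutdown_timeout_s > grace_period_s:
        problems.append(
            "preStop sleep plus shutdown timeout exceeds "
            "terminationGracePeriodSeconds; SIGKILL can cut off draining"
        )
    return problems

# Values from the pod spec and timeline above: sleep 10, drain up to 30, grace 60,
# with the 1-5 second propagation window taken at its upper bound.
print(check_drain_budget(10, 30, 60, 5))
# The same settings under the default 30-second grace period.
print(check_drain_budget(10, 30, 30, 5))
```

With the chapter's values the list is empty. Under the default 30-second grace period, the same sleep and drain settings would be cut short by the hard kill.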

Spring Boot graceful shutdown

# SCALED: application.yml for graceful shutdown
server:
  shutdown: graceful

spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s

management:
  endpoint:
    health:
      probes:
        enabled: true

When server.shutdown=graceful is set, Spring Boot on SIGTERM:

  1. Stops accepting new connections
  2. Refuses new requests (Tomcat, Jetty, and Reactor Netty stop accepting them at the network layer; Undertow answers with 503)
  3. Waits for in-flight requests to complete (up to timeout-per-shutdown-phase)
  4. Closes the Netty event loop
  5. Destroys the Spring application context

The 30-second timeout handles requests that are mid-database-transaction. A ride request that is partway through fare calculation, driver matching, and payment authorization may take 5-10 seconds to complete. 30 seconds gives generous headroom.
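The drain-with-timeout behavior is easier to reason about in isolation. The sketch below is a language-neutral illustration in Python, not Spring Boot's implementation: `DrainGate` is a hypothetical class that counts in-flight requests, refuses new ones once draining starts, and waits up to a timeout for the count to reach zero.

```python
import threading

class DrainGate:
    """Tracks in-flight requests and refuses new ones once draining starts."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._draining = False

    def try_enter(self):
        """Call at request start; False means the caller should refuse the request."""
        with self._cond:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def leave(self):
        """Call when a request finishes, successfully or not."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def drain(self, timeout_s):
        """Stop admitting requests, then wait for in-flight ones to finish.

        Returns True if everything completed within timeout_s.
        """
        with self._cond:
            self._draining = True
            return self._cond.wait_for(
                lambda: self._in_flight == 0, timeout=timeout_s
            )

gate = DrainGate()
admitted = gate.try_enter()                  # one request is in flight
threading.Timer(0.05, gate.leave).start()    # it finishes 50 ms later
drained = gate.drain(timeout_s=2.0)          # plays the role of the phase timeout
rejected = not gate.try_enter()              # new work is refused once draining starts
print(admitted, drained, rejected)
```

If a request outlives the timeout, `drain` returns False and the caller has to decide whether to exit anyway. That is the situation the 30-second headroom is meant to avoid.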

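// SCALED: Graceful shutdown listener for cleanup
import org.springframework.context.event.ContextClosedEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

@Component
public class ShutdownListener {

    private final DriverLocationSseController sseController;

    // Constructor injection: the final field must be assigned for the class to compile
    public ShutdownListener(DriverLocationSseController sseController) {
        this.sseController = sseController;
    }

    // ContextClosedEvent is published when the application context starts closing
    @EventListener(ContextClosedEvent.class)
    public void onShutdown(ContextClosedEvent event) {
        // Close all SSE connections gracefully
        sseController.closeAllConnections();
        // Deregister from service discovery
        // Flush metrics buffers
    }
}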

Locust: rolling deployment error comparison

# SCALED: Locust test for rolling deployment error rate
from locust import HttpUser, task, between

class RollingDeployUser(HttpUser):
    wait_time = between(0.1, 0.3)

    @task
    def fare_estimate(self):
        with self.client.get(
            "/api/rides/fare-estimate",
            params={
                "pickup_lat": 40.7128, "pickup_lng": -74.0060,
                "dropoff_lat": 40.7589, "dropoff_lng": -73.9851
            },
            catch_response=True
        ) as response:
            if response.status_code == 502:
                response.failure("502 during rolling deploy")
            elif response.status_code == 503:
                response.failure("503 during rolling deploy")

Test procedure:

# Terminal 1: Start Locust
locust -f locust_deploy_test.py \
  --host=https://rider-api.ridehailing.internal \
  --users 5000 --spawn-rate 500 \
  --run-time 300s --headless --csv=deploy_test

# Terminal 2: Trigger rolling deployment 30 seconds after Locust starts
sleep 30 && kubectl set image deployment/rider-api \
  rider-api=rider-api:v2.1.0 --namespace=ridehailing

Results without graceful shutdown:

Time        Event                     RPS     Error Rate
T+0-30s     Baseline                  5,000   0.02%
T+30s       Deploy starts             5,000   0.02%
T+32s       First pod terminating     5,000   3.8%
T+35s       Second pod terminating    5,000   6.2%
T+38s       First new pod ready       5,000   4.1%
T+45s       Rolling continues         5,000   2.8%
T+75s       Deploy complete           5,000   0.02%

Results with preStop hook and graceful shutdown:

Time        Event                     RPS     Error Rate
T+0-30s     Baseline                  5,000   0.02%
T+30s       Deploy starts             5,000   0.02%
T+32s       First pod draining        5,000   0.03%
T+42s       First pod terminated      5,000   0.04%
T+45s       First new pod ready       5,000   0.03%
T+90s       Deploy complete           5,000   0.02%

The deployment finishes later (T+90s vs T+75s) because each terminating pod spends 10 seconds in the preStop sleep plus up to 30 seconds draining connections. The error rate stays below 0.05% throughout. No connection resets. No 502s. The trade: 15 seconds of additional deployment time in this run buys a 99% reduction in deployment-related errors.

Rolling update strategy

# SCALED: Deployment with tuned rolling update
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rider-api
  namespace: ridehailing
spec:
  replicas: 12
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 0
  template:
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: rider-api
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 10"]

maxUnavailable: 0 ensures no capacity reduction during the deployment. At least 12 pods are always serving traffic. maxSurge: 25% allows up to 3 extra pods (15 total) during the transition. New pods start before old pods terminate. The cluster temporarily runs more pods than usual, consuming more resources, but the service never drops below full capacity.
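The pod-count bounds follow from how Kubernetes resolves percentages: maxSurge rounds up and maxUnavailable rounds down. A short sketch of that arithmetic, with `rollout_bounds` as a hypothetical helper:

```python
import math

def rollout_bounds(replicas, max_surge_pct, max_unavailable_pct):
    """Pod-count bounds during a RollingUpdate with percentage settings.

    Kubernetes rounds maxSurge up and maxUnavailable down.
    """
    surge = math.ceil(replicas * max_surge_pct / 100)
    unavailable = math.floor(replicas * max_unavailable_pct / 100)
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }

# Settings from the Deployment above: 12 replicas, maxSurge 25%, maxUnavailable 0.
print(rollout_bounds(12, 25, 0))
```

The rounding matters for replica counts that do not divide evenly. With 10 replicas and 25% for both settings, the surge is 3 pods but only 2 may be unavailable.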

The Proof

After removing sticky sessions and implementing connection draining:

Metric                        Before           After            Delta
Traffic skew (max/min pod)    5.6:1            1.1:1            -80%
p99 during normal ops         890ms (skewed)   175ms            -80%
Rolling deploy error rate     12%              0.04%            -99.7%
Pod failure blast radius      30% of users     8.3% of users    -72%
502 errors per deploy         847              3                -99.6%
Deploy duration               45s              90s              +100%

The deployment takes twice as long. That is the correct trade. 45 seconds of fast, error-prone deployments cost the platform 847 failed requests per deploy. With 4 deploys per day, that is 3,388 failed requests daily. At 90-second deploys with connection draining, the daily failure count drops to 12. The extra 45 seconds per deploy is invisible to users. The 847 errors per deploy were not.
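The daily totals follow from the per-deploy figures in the table. A minimal sketch of the arithmetic, assuming the 4 deploys per day stated above (`daily_cost` is a hypothetical helper):

```python
DEPLOYS_PER_DAY = 4

def daily_cost(errors_per_deploy, deploy_seconds, deploys_per_day=DEPLOYS_PER_DAY):
    """Return (failed requests per day, seconds spent deploying per day)."""
    return errors_per_deploy * deploys_per_day, deploy_seconds * deploys_per_day

before = daily_cost(errors_per_deploy=847, deploy_seconds=45)
after = daily_cost(errors_per_deploy=3, deploy_seconds=90)

print(f"failed requests per day: {before[0]} -> {after[0]}")
print(f"deploy time per day: {before[1]}s -> {after[1]}s")
```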