Zero-Downtime Deployments and Rolling Update Resilience

A rolling update replaces pods incrementally, a few at a time, rather than all at once. Each old pod shuts down gracefully (Chapter 18) while its replacement starts, so during the transition old and new pods serve traffic side by side. If a new pod is not yet ready when the old pod stops accepting requests, the service runs at reduced capacity until it is. If several pods are replaced simultaneously and the new pods are slow to start, the remaining old pods can be overwhelmed.
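
The capacity arithmetic behind this can be sketched as follows. The sketch assumes the values used elsewhere in this section: 4 replicas, maxUnavailable and maxSurge of 1, and steady-state traffic of 200 requests per second. The class and method names are illustrative and are not part of Kubernetes or the service.

```java
// SKETCH - Capacity bounds during a rolling update (illustrative only)
public class RolloutCapacity {

    /** Fewest pods guaranteed to be available while the update is in progress. */
    public static int minAvailable(int replicas, int maxUnavailable) {
        return replicas - maxUnavailable;
    }

    /** Most pods (old plus new) that may exist at once during the update. */
    public static int maxTotal(int replicas, int maxSurge) {
        return replicas + maxSurge;
    }

    /** Load each available pod must absorb at the worst point of the rollout. */
    public static double worstCaseLoadPerPod(double totalRps, int replicas,
                                             int maxUnavailable) {
        return totalRps / minAvailable(replicas, maxUnavailable);
    }

    public static void main(String[] args) {
        int replicas = 4;
        int maxUnavailable = 1;
        int maxSurge = 1;
        double totalRps = 200.0; // assumed steady state: 50 rps per pod

        System.out.println("min available pods: "
                + minAvailable(replicas, maxUnavailable));
        System.out.println("max total pods:     "
                + maxTotal(replicas, maxSurge));
        System.out.printf("worst-case rps/pod: %.1f%n",
                worstCaseLoadPerPod(totalRps, replicas, maxUnavailable));
    }
}
```

With these inputs, the three pods that stay available each absorb about a third more traffic than usual (roughly 67 instead of 50 requests per second). That is the headroom each pod needs for the update to go unnoticed.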

Rolling Update Configuration

# PRODUCTION - Kubernetes deployment with resilience-aware rolling update
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
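spec:
  replicas: 4
  selector:
    matchLabels:
      app: payment-service # Required in apps/v1; must match the template labels
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1 # At most 1 pod unavailable at a time
      maxSurge: 1 # At most 1 extra pod during update
  template:
    metadata:
      labels:
        app: payment-service # Label key and value are illustrative
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: payment-service
          # image: omitted here; a real manifest must set the container image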
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 5
            successThreshold: 3
            failureThreshold: 3
            # successThreshold: the pod is not considered ready until the
            # readiness probe succeeds 3 consecutive times (about 10 seconds
            # of sustained health at a 5-second period).
            # failureThreshold: 3 consecutive failures mark it unready again.
            # This prevents routing traffic to a pod that is still
            # warming up (loading caches, establishing connections).

          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 3

          startupProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 30
            # Allow up to 155 seconds for startup
            # (5 + 30*5 = 155 seconds).
            # This covers JVM warm-up and cache pre-loading.

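          lifecycle:
            preStop:
              exec:
                # Delay SIGTERM briefly so that endpoint removal can propagate
                # before the pod stops accepting requests. This sleep counts
                # against terminationGracePeriodSeconds.
                command: ["/bin/sh", "-c", "sleep 5"]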
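
The timing figures in the manifest follow from simple arithmetic, shown here as a small sketch. It uses the same approximation as the startup probe comment (initial delay plus failure threshold times period). It also assumes that the preStop sleep is taken out of the termination grace period, leaving the remainder for the application's own graceful shutdown. The class and method names are hypothetical.

```java
// SKETCH - Timing budgets implied by the probe and lifecycle settings (illustrative only)
public class ProbeBudget {

    /** Longest a container may take to start before the startup probe gives up. */
    public static int startupBudgetSeconds(int initialDelay, int period,
                                           int failureThreshold) {
        return initialDelay + failureThreshold * period;
    }

    /** Time left for the application's graceful shutdown after the preStop hook. */
    public static int shutdownBudgetSeconds(int terminationGracePeriod,
                                            int preStopSleep) {
        return terminationGracePeriod - preStopSleep;
    }

    public static void main(String[] args) {
        // Values taken from the manifest above
        System.out.println("startup budget:  "
                + startupBudgetSeconds(5, 5, 30) + "s");  // 155s
        System.out.println("shutdown budget: "
                + shutdownBudgetSeconds(60, 5) + "s");    // 55s
    }
}
```

If the application's shutdown timeout is set higher than the shutdown budget, the container can be killed before in-flight requests finish, so the two values are worth checking together.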

Readiness Gates for Warm-Up

A pod that passes the readiness probe but has cold caches and uninitialized connection pools will show high latency for its first few hundred requests. Requests that take longer than the circuit breaker’s slow-call-duration-threshold are recorded as slow calls. If enough early requests are slow, the slow-call rate crosses its threshold and the circuit breaker opens on the newly deployed pod.
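
To make the failure mode concrete, here is a simplified model of a slow-call-rate check. It is a sketch of the idea only, not the circuit breaker library's implementation. The 2-second duration threshold, the 50% rate threshold, and the minimum of 10 calls are hypothetical values chosen for illustration.

```java
// SKETCH - Simplified model of a slow-call-rate check (not a real circuit breaker)
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class SlowCallRateSketch {

    /** Percentage of recorded calls that took longer than the slow-call threshold. */
    public static double slowCallRatePercent(List<Duration> calls,
                                             Duration slowThreshold) {
        if (calls.isEmpty()) {
            return 0.0;
        }
        long slow = calls.stream()
                .filter(d -> d.compareTo(slowThreshold) > 0)
                .count();
        return 100.0 * slow / calls.size();
    }

    /** Opens once enough calls are recorded and the slow rate reaches the limit. */
    public static boolean wouldOpen(List<Duration> calls, Duration slowThreshold,
                                    double rateThresholdPercent, int minimumCalls) {
        return calls.size() >= minimumCalls
                && slowCallRatePercent(calls, slowThreshold) >= rateThresholdPercent;
    }

    public static void main(String[] args) {
        Duration slowThreshold = Duration.ofSeconds(2); // hypothetical value

        // First 10 requests on a cold pod: 6 cache misses, 4 cache hits
        List<Duration> coldPod = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            coldPod.add(Duration.ofSeconds(3));
        }
        for (int i = 0; i < 4; i++) {
            coldPod.add(Duration.ofMillis(200));
        }

        System.out.println("slow-call rate: "
                + slowCallRatePercent(coldPod, slowThreshold) + "%");
        System.out.println("breaker opens:  "
                + wouldOpen(coldPod, slowThreshold, 50.0, 10));
    }
}
```

In this model six slow calls out of the first ten are enough to open the breaker, even though the pod would serve every later request quickly once its caches are populated.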

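// PRODUCTION - Warm-up readiness indicator
import java.math.BigDecimal;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

import com.github.benmanes.caffeine.cache.LoadingCache; // assumes Caffeine
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.ReactiveHealthIndicator;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;
import reactor.core.publisher.Mono;

@Component
public class WarmUpReadinessIndicator
        implements ReactiveHealthIndicator {

    private static final Logger log =
            LoggerFactory.getLogger(WarmUpReadinessIndicator.class);
    private final AtomicBoolean warmedUp = new AtomicBoolean(false);
    private final LoadingCache<String, BigDecimal> balanceCache;

    public WarmUpReadinessIndicator(LoadingCache<String, BigDecimal> balanceCache) {
        this.balanceCache = balanceCache;
    }

    @EventListener(ApplicationReadyEvent.class)
    public void warmUp() {
        // Pre-populate critical caches
        log.info("Starting cache warm-up");
        List<String> frequentAccounts = loadFrequentAccounts();
        int loaded = 0;

        for (String accountId : frequentAccounts) {
            try {
                balanceCache.get(accountId);
                loaded++;
            } catch (Exception e) {
                // A failed entry is simply loaded on first use instead
                log.warn("Failed to warm cache for {}", accountId, e);
            }
        }

        warmedUp.set(true);
        log.info("Cache warm-up complete: {} of {} accounts loaded",
                loaded, frequentAccounts.size());
    }

    // Placeholder: the real lookup is application-specific and not shown here
    private List<String> loadFrequentAccounts() {
        return List.of();
    }

    @Override
    public Mono<Health> health() {
        if (warmedUp.get()) {
            return Mono.just(Health.up().build());
        }
        return Mono.just(Health.down()
                .withDetail("reason", "cache warm-up in progress")
                .build());
    }
}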

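The readiness probe reflects the warm-up indicator only when the indicator belongs to the readiness health group, which Spring Boot configures through the management.endpoint.health.group.readiness.include property. Once it is included, the pod only receives traffic after its caches are populated. This prevents the cold-start penalty from affecting customer-facing requests.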

Deployment Resilience Testing

// PRODUCTION - Test: simulate rolling update impact
@Test
void rollingUpdate_maintainsAvailability() throws Exception {
    // Simulate: 4 pods, 1 shutting down, 1 starting up (not yet ready)
    // Available capacity: 3 out of 4 pods (75%)

    // Send traffic at the rate that 4 pods handle comfortably
    int requestsPerSecond = 200; // 50 per pod * 4 pods
    AtomicInteger successes = new AtomicInteger();
    AtomicInteger failures = new AtomicInteger();

    ExecutorService executor = Executors.newFixedThreadPool(50);

    // Run for 10 seconds (simulates the rolling update window)
    for (int second = 0; second < 10; second++) {
        for (int i = 0; i < requestsPerSecond; i++) {
            executor.submit(() -> {
                try {
                    ResponseEntity<PaymentResponse> response =
                            restTemplate.postForEntity("/payments",
                                    samplePayment(),
                                    PaymentResponse.class);
                    if (response.getStatusCode().is2xxSuccessful()) {
                        successes.incrementAndGet();
                    } else {
                        failures.incrementAndGet();
                    }
                } catch (Exception e) {
                    failures.incrementAndGet();
                }
            });
        }
        Thread.sleep(1000);
    }

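    executor.shutdown();
    // Fail if requests are still in flight after 30 seconds; otherwise
    // the success rate would be computed from partial results
    assertThat(executor.awaitTermination(30, TimeUnit.SECONDS)).isTrue();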

    // Success rate should be above 99% even during "rolling update"
    double successRate = (double) successes.get() /
            (successes.get() + failures.get());
    assertThat(successRate).isGreaterThan(0.99);
}

This test verifies that the service maintains its SLO during the rolling update window. In a real deployment, the test would be run in a staging environment with actual pod scaling. Here it validates the application-level behavior: graceful shutdown completing in-flight requests, warm-up delays preventing cold-start errors, and the overall availability remaining above the SLO threshold.
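
To see how strict the 99% assertion is, it helps to work out the error budget for the run. The sketch below assumes the figures from the test (200 requests per second for 10 seconds, success rate strictly above 99%). The helper name is hypothetical.

```java
// SKETCH - Error budget implied by the success-rate assertion (illustrative only)
public class ErrorBudget {

    /**
     * Largest number of failed requests for which the success rate is still
     * strictly above the given percentage. Integer arithmetic avoids
     * floating-point rounding at the boundary.
     */
    public static int maxAllowedFailures(int totalRequests, int sloPercent) {
        int failures = 0;
        while ((long) (totalRequests - (failures + 1)) * 100
                > (long) sloPercent * totalRequests) {
            failures++;
        }
        return failures;
    }

    public static void main(String[] args) {
        int totalRequests = 200 * 10; // requests per second * seconds
        System.out.println("allowed failures: "
                + maxAllowedFailures(totalRequests, 99)); // 19
    }
}
```

Out of 2,000 requests the test tolerates at most 19 failures. A twentieth failure brings the rate to exactly 99%, which no longer satisfies the strict greater-than check.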