Graceful Shutdown and Drain

A JVM killed with kill -9 (SIGKILL) drops every in-flight request. HTTP connections are reset. Kafka consumers stop without committing offsets, so records are reprocessed on restart. Database connections are severed mid-transaction and never returned to the pool; the database must detect the dead sessions and roll back their work. Distributed locks stay held until they expire. Each abandoned operation becomes an error for the caller, a retry, or a duplicate.

A graceful shutdown is a resilience pattern applied to the service itself: it prevents the service’s own lifecycle events from becoming failure modes for its callers and consumers.

The Shutdown Sequence

Figure: Graceful Shutdown Sequence

The diagram shows the five phases of graceful shutdown:

Phase 1: Stop accepting new work. The load balancer health check returns unhealthy. The Kubernetes readiness probe fails. New requests are routed to other instances. The service stops polling Kafka topics.

Phase 2: Complete in-flight work. Existing HTTP requests continue processing. In-flight database transactions complete. Current Kafka message processing finishes and offsets are committed.

Phase 3: Drain connections. HTTP keep-alive connections are closed after the current request completes. WebSocket connections receive a close frame. Database connection pool drains. gRPC connections receive a GOAWAY frame.

Phase 4: Deregister from service discovery. The service instance is removed from the service registry (Eureka, Consul, Kubernetes endpoints). Other services stop sending requests to this instance.

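Phase 5: Process exit. The JVM runs its shutdown hooks, releases resources, and exits. A JVM stopped by SIGTERM normally reports exit code 143 (128 + 15) even when the shutdown was clean; exit code 0 appears only when the application initiates the exit itself.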
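The ordering of the phases matters more than any single phase. As a rough illustration, the sequence can be modelled as an ordered list of named steps that run one after another. This is a minimal sketch using only the JDK; the class name ShutdownSequence and its methods are hypothetical and are not part of Spring or any other framework.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: runs shutdown phases strictly in registration order. */
public class ShutdownSequence {

    // LinkedHashMap preserves insertion order, which is the phase order.
    private final Map<String, Runnable> phases = new LinkedHashMap<>();

    public ShutdownSequence phase(String name, Runnable action) {
        phases.put(name, action);
        return this;
    }

    /**
     * Runs every phase in order. A failing phase is reported and skipped
     * so that later phases (for example, process exit) still run.
     * Returns the names of the phases that completed.
     */
    public List<String> run() {
        List<String> completed = new ArrayList<>();
        for (Map.Entry<String, Runnable> entry : phases.entrySet()) {
            try {
                entry.getValue().run();
                completed.add(entry.getKey());
            } catch (RuntimeException ex) {
                System.err.println("Shutdown phase failed: "
                        + entry.getKey() + " (" + ex.getMessage() + ")");
            }
        }
        return completed;
    }
}
```

Continuing after a failed phase is a design choice in this sketch: a stuck deregistration call should not prevent the remaining cleanup from running.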

Spring Boot Graceful Shutdown

# PRODUCTION - Spring Boot graceful shutdown configuration
server:
  shutdown: graceful

spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s
    # Maximum time to wait for in-flight requests to complete.
    # After 30 seconds, remaining requests are forcibly terminated.

Spring Boot’s shutdown: graceful implements phases 1-3:

The embedded Tomcat stops accepting new connections.
Active requests continue processing up to the timeout.
After all active requests complete (or the timeout fires), the web server shuts down.
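The behaviour described above (stop accepting, then wait for active requests up to a timeout) can be sketched without any framework. The following is an illustrative model only, not Spring Boot's implementation; InFlightTracker and its methods are hypothetical names.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of "stop accepting, then drain with a timeout". */
public class InFlightTracker {

    private final AtomicBoolean accepting = new AtomicBoolean(true);
    private final AtomicInteger inFlight = new AtomicInteger();

    /** Returns false once shutdown has begun; the caller should reject the request. */
    public boolean tryBegin() {
        if (!accepting.get()) {
            return false;
        }
        inFlight.incrementAndGet();
        // Re-check: shutdown may have started between the check and the increment.
        if (!accepting.get()) {
            inFlight.decrementAndGet();
            return false;
        }
        return true;
    }

    /** Marks one request as finished. */
    public void end() {
        inFlight.decrementAndGet();
    }

    /**
     * Stops accepting new work and waits for in-flight work to finish.
     * Returns true if everything drained, false if the timeout fired first
     * or the waiting thread was interrupted.
     */
    public boolean shutdown(Duration timeout) {
        accepting.set(false);
        long deadline = System.nanoTime() + timeout.toNanos();
        while (inFlight.get() > 0) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public int inFlight() {
        return inFlight.get();
    }
}
```

A false return from shutdown corresponds to the case where the timeout fires and the remaining requests are cut off.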

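// PRODUCTION - Custom shutdown hooks for additional cleanup
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import jakarta.annotation.PreDestroy; // javax.annotation.PreDestroy on Spring Boot 2.x
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.kafka.config.KafkaListenerEndpointRegistry;
import org.springframework.stereotype.Component;

@Component
public class GracefulShutdownHandler {

    private static final Logger log =
            LoggerFactory.getLogger(GracefulShutdownHandler.class);

    private final CircuitBreakerRegistry cbRegistry;
    private final KafkaListenerEndpointRegistry kafkaRegistry;
    private final ScheduledExecutorService scheduler;

    // Constructor injection: Spring supplies the three collaborators
    public GracefulShutdownHandler(CircuitBreakerRegistry cbRegistry,
                                   KafkaListenerEndpointRegistry kafkaRegistry,
                                   ScheduledExecutorService scheduler) {
        this.cbRegistry = cbRegistry;
        this.kafkaRegistry = kafkaRegistry;
        this.scheduler = scheduler;
    }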

    @PreDestroy
    public void onShutdown() {
        log.info("Graceful shutdown initiated");

        // Phase 1: Stop Kafka consumers
        kafkaRegistry.getAllListenerContainers().forEach(container -> {
            log.info("Stopping Kafka listener: {}",
                    container.getListenerId());
            container.stop();
        });

        // Phase 2: Wait for in-flight Kafka processing
        // container.stop() above blocks until the listener has stopped,
        // bounded by the container's shutdown timeout

        // Phase 3: Force-open circuit breakers (reject new outbound calls)
        cbRegistry.getAllCircuitBreakers().forEach(cb -> {
            cb.transitionToForcedOpenState();
            log.info("Force-opened circuit breaker: {}", cb.getName());
        });

        // Phase 4: Shut down scheduled tasks
        scheduler.shutdown();
        try {
            if (!scheduler.awaitTermination(10, TimeUnit.SECONDS)) {
                scheduler.shutdownNow();
            }
        } catch (InterruptedException e) {
            scheduler.shutdownNow();
            Thread.currentThread().interrupt();
        }

        log.info("Graceful shutdown complete");
    }
}

Kubernetes Shutdown Coordination

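When a pod is deleted, Kubernetes runs any preStop hook, sends SIGTERM to the pod's containers, and sends SIGKILL to whatever is still running once terminationGracePeriodSeconds (default 30 seconds) has elapsed. The grace period counts from the start of termination, so it includes the time spent in the preStop hook. The challenge: Kubernetes removes the pod from the Service endpoints and begins terminating the containers concurrently. This creates a race condition: the pod may receive new requests after SIGTERM because the endpoints update has not yet propagated to every kube-proxy instance.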

# PRODUCTION - Kubernetes pod spec with shutdown coordination
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: payment-service
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 5"]
            # Wait 5 seconds before starting shutdown.
            # This gives kube-proxy time to update endpoints.
            # Without this sleep, requests arrive during shutdown.

The preStop sleep is a workaround for the Kubernetes race condition. The timeline:

Kubernetes marks the pod as Terminating and begins endpoint removal (concurrent)
preStop hook runs: sleep 5
During the 5-second sleep, kube-proxy updates propagate, stopping new requests
After 5 seconds, the application receives SIGTERM and begins graceful shutdown
In-flight requests complete (up to 30 seconds)
Total shutdown time: 5s (preStop) + 30s (graceful) = 35s < 60s (terminationGracePeriodSeconds)

# PRODUCTION - Spring Boot shutdown timeout must fit within Kubernetes budget
# terminationGracePeriodSeconds (60) = preStop (5) + shutdown-timeout (30) + buffer (25)
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s
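
The budget arithmetic is simple enough to check in a unit test or at startup. The sketch below is a hypothetical helper (ShutdownBudget is not a Spring or Kubernetes API) and assumes the simplified model used in this section, in which a single lifecycle phase consumes the whole shutdown timeout.

```java
import java.time.Duration;

/** Hypothetical helper for checking the shutdown time budget. */
public class ShutdownBudget {

    /**
     * Spare time left in the Kubernetes grace period after the preStop
     * hook and the application's shutdown timeout. A negative result
     * means SIGKILL can arrive while requests are still draining.
     */
    public static Duration buffer(Duration gracePeriod,
                                  Duration preStop,
                                  Duration shutdownTimeout) {
        return gracePeriod.minus(preStop).minus(shutdownTimeout);
    }

    /** True when some spare time remains after preStop and drain. */
    public static boolean fits(Duration gracePeriod,
                               Duration preStop,
                               Duration shutdownTimeout) {
        Duration spare = buffer(gracePeriod, preStop, shutdownTimeout);
        return !spare.isNegative() && !spare.isZero();
    }
}
```

With the values from this section, buffer(60s, 5s, 30s) is 25 seconds. Keeping the Kubernetes default grace period of 30 seconds with the same preStop and timeout would leave a negative buffer.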

Kafka Consumer Shutdown

Stopping a Kafka consumer triggers a consumer group rebalance. Partitions assigned to the stopping consumer are reassigned to other consumers in the group. During the rebalance, all consumers in the group stop processing (stop-the-world rebalance in the default eager protocol).

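// PRODUCTION - Cooperative sticky Kafka consumer for fast rebalance
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.CooperativeStickyAssignor;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.support.serializer.JsonDeserializer;

@Configuration
public class KafkaConsumerConfig {

    @Bean
    public ConsumerFactory<String, PaymentEvent> consumerFactory() {
        Map<String, Object> config = new HashMap<>();
        config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,
                "kafka:9092");
        config.put(ConsumerConfig.GROUP_ID_CONFIG,
                "fraud-check-consumer");

        // Cooperative sticky assignor: incremental rebalance
        // Only the partitions from the leaving consumer are reassigned.
        // Other consumers continue processing their partitions.
        config.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                CooperativeStickyAssignor.class.getName());

        // Session timeout: how long the group coordinator waits without
        // a heartbeat before declaring this consumer dead. It is
        // independent of max.poll.interval.ms; keep heartbeat.interval.ms
        // at or below one third of this value.
        config.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 30000);

        // The consumer cannot be created without deserializers. JSON for
        // PaymentEvent is an assumption; substitute the format in use.
        return new DefaultKafkaConsumerFactory<>(config,
                new StringDeserializer(),
                new JsonDeserializer<>(PaymentEvent.class));
    }
}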

The cooperative sticky assignor (Kafka 2.4+) performs an incremental rebalance. When one consumer leaves the group, only its partitions are reassigned. Other consumers continue processing without interruption. This reduces the rebalance impact from “all consumers pause” to “one consumer’s partitions are briefly unprocessed.”
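
The difference between the two protocols can be made concrete with a toy model. The sketch below is not Kafka's assignor algorithm; it only contrasts which partitions pause when a consumer leaves. RebalanceSketch and its methods are hypothetical, and the round-robin redistribution is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of eager versus cooperative rebalancing (not Kafka's real algorithm). */
public class RebalanceSketch {

    /** Eager protocol: every member revokes everything, so all partitions pause. */
    public static Set<Integer> pausedEager(Map<String, List<Integer>> assignment) {
        Set<Integer> paused = new TreeSet<>();
        assignment.values().forEach(paused::addAll);
        return paused;
    }

    /** Cooperative protocol: only the leaving member's partitions pause. */
    public static Set<Integer> pausedCooperative(Map<String, List<Integer>> assignment,
                                                 String leaving) {
        return new TreeSet<>(assignment.getOrDefault(leaving, List.of()));
    }

    /**
     * Survivors keep what they own; the leaving member's partitions are
     * handed out round-robin across the survivors (sorted by member id).
     */
    public static Map<String, List<Integer>> reassignCooperative(
            Map<String, List<Integer>> assignment, String leaving) {
        Map<String, List<Integer>> next = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : assignment.entrySet()) {
            if (!entry.getKey().equals(leaving)) {
                next.put(entry.getKey(), new ArrayList<>(entry.getValue()));
            }
        }
        List<String> survivors = new ArrayList<>(next.keySet());
        List<Integer> orphaned = assignment.getOrDefault(leaving, List.of());
        for (int i = 0; i < orphaned.size() && !survivors.isEmpty(); i++) {
            next.get(survivors.get(i % survivors.size())).add(orphaned.get(i));
        }
        return next;
    }
}
```

For three consumers owning two partitions each, the eager model pauses all six partitions when one consumer leaves, while the cooperative model pauses only the two that belonged to the leaving consumer.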

Testing Graceful Shutdown

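// PRODUCTION - Verify in-flight requests complete during shutdown
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;

import com.github.tomakehurst.wiremock.client.WireMock;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
// Spring Boot 3.x package; on 2.x use org.springframework.boot.web.server.LocalServerPort
import org.springframework.boot.test.web.server.LocalServerPort;
import org.springframework.context.ConfigurableApplicationContext;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.test.annotation.DirtiesContext;
import org.springframework.web.client.ResourceAccessException;
import org.springframework.web.client.RestTemplate;

// Each test closes the context, so a fresh one is needed per test method.
// Graceful shutdown is enabled explicitly in case the test profile omits it.
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT,
        properties = "server.shutdown=graceful")
@DirtiesContext(classMode = DirtiesContext.ClassMode.AFTER_EACH_TEST_METHOD)
class GracefulShutdownTest {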

    @Autowired
    private ConfigurableApplicationContext context;

    @LocalServerPort
    private int port;

    @Test
    void inFlightRequestCompletes_duringShutdown() throws Exception {
        // Configure fraud service to respond slowly (simulate in-flight)
        fraudWireMock().register(
                WireMock.post("/fraud/score")
                        .willReturn(WireMock.okJson(
                                "{\"score\":0.1,\"decision\":\"PERMIT\"}")
                                .withFixedDelay(3000)));

        // Send a request that will be in-flight during shutdown
        CompletableFuture<ResponseEntity<PaymentResponse>> inFlight =
                CompletableFuture.supplyAsync(() ->
                        new RestTemplate().postForEntity(
                                "http://localhost:" + port + "/payments",
                                samplePayment(),
                                PaymentResponse.class));

        // Wait for the request to reach the server
        Thread.sleep(500);

        // Initiate shutdown while request is in flight
        context.close();

        // The in-flight request should complete successfully
        ResponseEntity<PaymentResponse> response = inFlight.get(
                10, TimeUnit.SECONDS);
        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
    }

    @Test
    void newRequestsRejected_afterShutdownInitiated() {
        context.close();

        // New request should fail (connection refused)
        assertThatThrownBy(() ->
                new RestTemplate().postForEntity(
                        "http://localhost:" + port + "/payments",
                        samplePayment(),
                        PaymentResponse.class))
                .isInstanceOf(ResourceAccessException.class);
    }
}

The first test verifies that a request that is already being processed when shutdown begins will complete successfully. The second test verifies that new requests are rejected after shutdown starts. Together, they confirm the two fundamental properties of graceful shutdown: finish what you started, refuse what you have not.
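
The same two properties can be observed without Spring by using a plain ExecutorService, whose shutdown() method lets submitted tasks finish while refusing new ones. This is a JDK-only sketch meant as an analogy for the tests above; the class name DrainProperties and its Result type are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical demo: finish what you started, refuse what you have not. */
public class DrainProperties {

    public static final class Result {
        public final boolean inFlightCompleted;
        public final boolean newWorkRejected;

        Result(boolean inFlightCompleted, boolean newWorkRejected) {
            this.inFlightCompleted = inFlightCompleted;
            this.newWorkRejected = newWorkRejected;
        }
    }

    public static Result run() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch started = new CountDownLatch(1);

        // The "in-flight request": already running when shutdown begins.
        Future<String> inFlight = pool.submit(() -> {
            started.countDown();
            Thread.sleep(200);
            return "done";
        });

        try {
            started.await();   // make sure the task is really in flight
            pool.shutdown();   // stop accepting; running tasks continue

            boolean rejected = false;
            try {
                pool.submit(() -> "late"); // the "new request"
            } catch (RejectedExecutionException e) {
                rejected = true;
            }

            boolean completed =
                    "done".equals(inFlight.get(5, TimeUnit.SECONDS));
            return new Result(completed, rejected);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The latch replaces the fixed Thread.sleep used in the Spring test: it guarantees the task has started before shutdown is initiated, which removes one source of timing flakiness.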