Graceful Shutdown and Drain
A JVM killed with kill -9 drops every in-flight request. HTTP connections are reset. Kafka consumers stop without committing offsets, causing reprocessing on restart. Database connections are abandoned without returning to the pool. Distributed locks are held until they expire. Each abandoned operation becomes an error for the caller, a retry, or a duplicate.
A graceful shutdown is a resilience pattern applied to the service itself: it prevents the service’s own lifecycle events from becoming failure modes for its callers and consumers.
The Shutdown Sequence
The diagram shows the five phases of graceful shutdown:
Phase 1: Stop accepting new work. The load balancer health check returns unhealthy. The Kubernetes readiness probe fails. New requests are routed to other instances. The service stops polling Kafka topics.
Phase 2: Complete in-flight work. Existing HTTP requests continue processing. In-flight database transactions complete. Current Kafka message processing finishes and offsets are committed.
Phase 3: Drain connections. HTTP keep-alive connections are closed after the current request completes. WebSocket connections receive a close frame. Database connection pool drains. gRPC connections receive a GOAWAY frame.
Phase 4: Deregister from service discovery. The service instance is removed from the service registry (Eureka, Consul, Kubernetes endpoints). Other services stop sending requests to this instance.
Phase 5: Process exit. The JVM shuts down. Resources are released. The exit code is 0 (clean shutdown).
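The five phases can be sketched in plain Java as an ordered list of steps run from a JVM shutdown hook. This is a minimal illustration, not how Spring implements its lifecycle: the class name `ShutdownSequence` and its methods are hypothetical, and the phase bodies are left to the caller.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Spring or any library: runs shutdown
// phases in registration order and reports which ones completed.
final class ShutdownSequence {
    // LinkedHashMap preserves insertion order, so phases run as registered.
    private final Map<String, Runnable> phases = new LinkedHashMap<>();

    ShutdownSequence phase(String name, Runnable action) {
        phases.put(name, action);
        return this;
    }

    // Runs every phase in order. A failing phase is logged and skipped so
    // that later phases (for example, deregistration) still get a chance to run.
    List<String> run() {
        List<String> completed = new ArrayList<>();
        for (Map.Entry<String, Runnable> entry : phases.entrySet()) {
            try {
                entry.getValue().run();
                completed.add(entry.getKey());
            } catch (RuntimeException ex) {
                System.err.println("Shutdown phase failed: " + entry.getKey() + ": " + ex);
            }
        }
        return completed;
    }

    // Registers the sequence as a JVM shutdown hook. The hook runs on SIGTERM
    // or System.exit, but never on SIGKILL (kill -9).
    void installAsShutdownHook() {
        Runtime.getRuntime().addShutdownHook(new Thread(this::run, "graceful-shutdown"));
    }
}
```

A caller would register one action per phase ("stop-accepting", "complete-in-flight", "drain-connections", "deregister") and call `installAsShutdownHook()` at startup. Continuing past a failed phase is a design choice in this sketch; some services may prefer to abort instead.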
Spring Boot Graceful Shutdown
# PRODUCTION - Spring Boot graceful shutdown configuration
server:
  shutdown: graceful
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s
    # Maximum time to wait for in-flight requests to complete.
    # After 30 seconds, remaining requests are forcibly terminated.
Spring Boot’s shutdown: graceful implements phases 1-3:
- The embedded Tomcat stops accepting new connections.
- Active requests continue processing up to the timeout.
- After all active requests complete (or the timeout fires), the web server shuts down.
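The behaviour in the list above (stop admitting work, then wait for active requests up to a timeout) can be modelled with two counters. The sketch below is an illustration of the idea in plain Java, not Tomcat's or Spring Boot's actual implementation; `InFlightTracker` and its method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of "finish what you started, refuse what you have not".
final class InFlightTracker {
    private final AtomicBoolean accepting = new AtomicBoolean(true);
    private final AtomicInteger inFlight = new AtomicInteger();

    // Called when a request arrives. Returns false once shutdown has begun,
    // in which case the caller should reject the request.
    boolean tryBegin() {
        // Count first, then check the flag: a request that observes
        // accepting == true is guaranteed to be visible to shutdown().
        inFlight.incrementAndGet();
        if (!accepting.get()) {
            inFlight.decrementAndGet();
            return false;
        }
        return true;
    }

    // Called when a request finishes, whether it succeeded or failed.
    void end() {
        inFlight.decrementAndGet();
    }

    // Stops admitting new work, then polls until all in-flight requests have
    // finished or the timeout elapses. Returns true if the drain completed.
    boolean shutdown(long timeout, TimeUnit unit) {
        accepting.set(false);
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (inFlight.get() > 0) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out with requests still running
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

A `false` return from `shutdown` corresponds to the case where the timeout fires and the remaining requests are cut off.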
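// PRODUCTION - Custom shutdown hooks for additional cleanup
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import jakarta.annotation.PreDestroy; // javax.annotation.PreDestroy on Spring Boot 2.x
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.kafka.config.KafkaListenerEndpointRegistry;
import org.springframework.stereotype.Component;

@Component
public class GracefulShutdownHandler {
    private static final Logger log = LoggerFactory.getLogger(GracefulShutdownHandler.class);

    private final CircuitBreakerRegistry cbRegistry;
    private final KafkaListenerEndpointRegistry kafkaRegistry;
    private final ScheduledExecutorService scheduler;

    public GracefulShutdownHandler(CircuitBreakerRegistry cbRegistry,
                                   KafkaListenerEndpointRegistry kafkaRegistry,
                                   ScheduledExecutorService scheduler) {
        this.cbRegistry = cbRegistry;
        this.kafkaRegistry = kafkaRegistry;
        this.scheduler = scheduler;
    }

    @PreDestroy
    public void onShutdown() {
        log.info("Graceful shutdown initiated");

        // Phase 1: Stop Kafka consumers.
        // Spring normally stops lifecycle-managed listener containers before
        // @PreDestroy callbacks run, so this acts as a safeguard.
        kafkaRegistry.getAllListenerContainers().forEach(container -> {
            log.info("Stopping Kafka listener: {}", container.getListenerId());
            container.stop();
        });

        // Phase 2: Wait for in-flight Kafka processing.
        // container.stop() waits for the listener to finish, bounded by the
        // container's own shutdown timeout.

        // Phase 3: Force-open circuit breakers (prevent new outbound calls)
        cbRegistry.getAllCircuitBreakers().forEach(cb -> {
            cb.transitionToForcedOpenState();
            log.info("Force-opened circuit breaker: {}", cb.getName());
        });

        // Phase 4: Shut down scheduled tasks, waiting up to 10 seconds
        scheduler.shutdown();
        try {
            if (!scheduler.awaitTermination(10, TimeUnit.SECONDS)) {
                scheduler.shutdownNow();
            }
        } catch (InterruptedException e) {
            scheduler.shutdownNow();
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
        log.info("Graceful shutdown complete");
    }
}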
Kubernetes Shutdown Coordination
Kubernetes sends SIGTERM to the pod, waits for terminationGracePeriodSeconds (default 30), then sends SIGKILL. The challenge: Kubernetes removes the pod from the Service endpoints and sends SIGTERM concurrently. There is a race condition: the pod may receive new requests after SIGTERM because the endpoints update has not propagated to all kube-proxy instances.
# PRODUCTION - Kubernetes pod spec with shutdown coordination
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: payment-service
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 5"]
            # Wait 5 seconds before starting shutdown.
            # This gives kube-proxy time to update endpoints.
            # Without this sleep, requests arrive during shutdown.
The preStop sleep is a workaround for the Kubernetes race condition. The timeline:
- Kubernetes marks the pod as terminating, begins endpoint removal, and starts the preStop hook (concurrent)
- The preStop hook runs: sleep 5
- During the 5-second sleep, kube-proxy updates propagate, stopping new requests
- After 5 seconds, the hook returns, the application receives SIGTERM, and graceful shutdown begins
- In-flight requests complete (up to 30 seconds)
- Total shutdown time: 5s (preStop) + 30s (graceful) = 35s < 60s (terminationGracePeriodSeconds)
# PRODUCTION - Spring Boot shutdown timeout must fit within Kubernetes budget
# terminationGracePeriodSeconds (60) >= preStop (5) + shutdown-timeout (30) + buffer (25)
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s
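The budget arithmetic is easy to get wrong when values are changed independently, so it can be worth checking in a unit test or at startup. The following is a minimal sketch; `ShutdownBudget` is a hypothetical helper, and it assumes a single lifecycle phase consumes the whole Spring timeout (the setting applies per phase, so a service with several slow phases would need a larger allowance).

```java
import java.time.Duration;

// Hypothetical helper: checks that the Kubernetes grace period covers the
// preStop delay plus the application's graceful shutdown timeout.
final class ShutdownBudget {
    private final Duration gracePeriod;      // terminationGracePeriodSeconds
    private final Duration preStop;          // preStop hook duration
    private final Duration shutdownTimeout;  // timeout-per-shutdown-phase

    ShutdownBudget(Duration gracePeriod, Duration preStop, Duration shutdownTimeout) {
        this.gracePeriod = gracePeriod;
        this.preStop = preStop;
        this.shutdownTimeout = shutdownTimeout;
    }

    // Time left before SIGKILL after preStop and graceful shutdown finish.
    // A negative value means SIGKILL can arrive mid-shutdown.
    Duration buffer() {
        return gracePeriod.minus(preStop).minus(shutdownTimeout);
    }

    // True when at least minimumBuffer remains.
    boolean fits(Duration minimumBuffer) {
        return buffer().compareTo(minimumBuffer) >= 0;
    }
}
```

With the values used in this section (60s, 5s, 30s) the buffer comes out to 25 seconds.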
Kafka Consumer Shutdown
Stopping a Kafka consumer triggers a consumer group rebalance. Partitions assigned to the stopping consumer are reassigned to other consumers in the group. During the rebalance, all consumers in the group stop processing (stop-the-world rebalance in the default eager protocol).
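// PRODUCTION - Cooperative sticky Kafka consumer for fast rebalance
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.CooperativeStickyAssignor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;

@Configuration
public class KafkaConsumerConfig {
    @Bean
    public ConsumerFactory<String, PaymentEvent> consumerFactory() {
        Map<String, Object> config = new HashMap<>();
        config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        config.put(ConsumerConfig.GROUP_ID_CONFIG, "fraud-check-consumer");
        // Key and value deserializer settings are omitted here; they must be
        // configured before the consumer can start.

        // Cooperative sticky assignor: incremental rebalance.
        // Only the partitions from the leaving consumer are reassigned.
        // Other consumers continue processing their partitions.
        config.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                CooperativeStickyAssignor.class.getName());

        // Session timeout: how long the group coordinator waits without a
        // heartbeat before it considers this consumer dead. Must be greater
        // than heartbeat.interval.ms; it is independent of max.poll.interval.ms.
        config.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 30000);
        return new DefaultKafkaConsumerFactory<>(config);
    }
}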
The cooperative sticky assignor (Kafka 2.4+) performs an incremental rebalance. When one consumer leaves the group, only its partitions are reassigned. Other consumers continue processing without interruption. This reduces the rebalance impact from “all consumers pause” to “one consumer’s partitions are briefly unprocessed.”
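The difference in impact can be made concrete with a toy model. The sketch below does not use the Kafka client and does not reproduce the real assignor logic; `RebalanceImpact` is a hypothetical class that only computes which partitions go unprocessed while one consumer leaves, under each protocol as described above.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model (not Kafka code): which partitions pause when a consumer leaves?
final class RebalanceImpact {
    // Eager protocol: every member gives up all of its partitions for the
    // duration of the rebalance, so every partition in the group pauses.
    static Set<Integer> pausedEager(Map<String, List<Integer>> assignment) {
        Set<Integer> paused = new TreeSet<>();
        assignment.values().forEach(paused::addAll);
        return paused;
    }

    // Cooperative protocol: remaining members keep their partitions, so only
    // the leaving consumer's partitions are unprocessed until reassigned.
    static Set<Integer> pausedCooperative(Map<String, List<Integer>> assignment,
                                          String leavingConsumer) {
        return new TreeSet<>(assignment.getOrDefault(leavingConsumer, List.of()));
    }
}
```

For a group of three consumers with two partitions each, the eager model pauses all six partitions while the cooperative model pauses only the two owned by the departing consumer.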
Testing Graceful Shutdown
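// PRODUCTION - Verify in-flight requests complete during shutdown
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

import com.github.tomakehurst.wiremock.client.WireMock;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
// On Spring Boot versions before 2.7: org.springframework.boot.web.server.LocalServerPort
import org.springframework.boot.test.web.server.LocalServerPort;
import org.springframework.context.ConfigurableApplicationContext;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.test.annotation.DirtiesContext;
import org.springframework.web.client.ResourceAccessException;
import org.springframework.web.client.RestTemplate;

// fraudWireMock() and samplePayment() are test helpers defined elsewhere.
// Each test closes the context, so it must not be reused from the test cache.
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
@DirtiesContext(classMode = DirtiesContext.ClassMode.AFTER_EACH_TEST_METHOD)
class GracefulShutdownTest {
    @Autowired
    private ConfigurableApplicationContext context;

    @LocalServerPort
    private int port;

    @Test
    void inFlightRequestCompletes_duringShutdown() throws Exception {
        // Configure fraud service to respond slowly (simulate in-flight)
        fraudWireMock().register(
                WireMock.post("/fraud/score")
                        .willReturn(WireMock.okJson(
                                "{\"score\":0.1,\"decision\":\"PERMIT\"}")
                                .withFixedDelay(3000)));

        // Send a request that will be in-flight during shutdown
        CompletableFuture<ResponseEntity<PaymentResponse>> inFlight =
                CompletableFuture.supplyAsync(() ->
                        new RestTemplate().postForEntity(
                                "http://localhost:" + port + "/payments",
                                samplePayment(),
                                PaymentResponse.class));

        // Wait for the request to reach the server
        Thread.sleep(500);

        // Initiate shutdown while request is in flight
        context.close();

        // The in-flight request should complete successfully
        ResponseEntity<PaymentResponse> response = inFlight.get(
                10, TimeUnit.SECONDS);
        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
    }

    @Test
    void newRequestsRejected_afterShutdownInitiated() {
        context.close();

        // New request should fail (connection refused)
        assertThatThrownBy(() ->
                new RestTemplate().postForEntity(
                        "http://localhost:" + port + "/payments",
                        samplePayment(),
                        PaymentResponse.class))
                .isInstanceOf(ResourceAccessException.class);
    }
}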
The first test verifies that a request that is already being processed when shutdown begins will complete successfully. The second test verifies that new requests are rejected after shutdown starts. Together, they confirm the two fundamental properties of graceful shutdown: finish what you started, refuse what you have not.