resilience patterns in production

Pattern Composition

Individual resilience patterns are well-defined. Combining them incorrectly creates emergent behaviors that can be worse than having no resilience at all. A retry outside a circuit breaker keeps retrying against an open breaker unless CallNotPermittedException is excluded from its retryable exceptions. A bulkhead inside a time limiter can reject a call that still has time budget remaining. A rate limiter inside a retry counts each attempt against the rate limit, reducing the effective throughput of successful calls.

The order matters. There is one correct order for the common case, and deviations from it require explicit justification.

The Correct Order

Resilience4J Decorator Ordering

The diagram shows the five resilience patterns as nested layers, outermost to innermost. Retry is outermost because it must retry the entire decorated chain: if the circuit breaker rejects a call, the retry sees that rejection and, depending on its policy, either fails fast or retries after backoff, when the circuit breaker may have moved to half-open state. Circuit Breaker is second because it records the outcome of each attempt, including timeouts and bulkhead rejections. Rate Limiter is third because only calls that pass the circuit breaker should consume rate limiter tokens. Bulkhead is fourth because only calls that pass the rate limiter should compete for bulkhead permits. Time Limiter is innermost (closest to the actual call) because it enforces the timeout on the actual HTTP request.

The bottom warning shows what happens with wrong ordering: if the Circuit Breaker is outside Retry, the breaker sees only the final outcome of each retried operation. Failures that a later attempt recovers from are never recorded, so the breaker opens late or not at all.
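The recording difference between the two orderings can be seen in a toy model. The sketch below is plain Java and does not use the Resilience4J API: CountingBreaker, retry, and flakyCall are hypothetical stand-ins that only count what a breaker would record and re-invoke what a retry would wrap.

```java
import java.util.function.Supplier;

// Toy model, NOT the Resilience4J API. All names here are illustrative assumptions.
class RecordingOrderDemo {

    /** Counts outcomes the way a breaker's sliding window would see them. */
    static final class CountingBreaker {
        int failures;
        int successes;

        <T> Supplier<T> decorate(Supplier<T> inner) {
            return () -> {
                try {
                    T result = inner.get();
                    successes++;
                    return result;
                } catch (RuntimeException e) {
                    failures++;
                    throw e;
                }
            };
        }
    }

    /** Re-invokes the wrapped supplier up to maxAttempts times, without backoff. */
    static <T> Supplier<T> retry(int maxAttempts, Supplier<T> inner) {
        return () -> {
            RuntimeException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return inner.get();
                } catch (RuntimeException e) {
                    last = e;
                }
            }
            throw last;
        };
    }

    /** A dependency that fails on its first two invocations, then succeeds. */
    static Supplier<String> flakyCall() {
        int[] calls = {0};
        return () -> {
            if (++calls[0] <= 2) {
                throw new IllegalStateException("transient failure " + calls[0]);
            }
            return "ok";
        };
    }

    /** Retry(CircuitBreaker(call)): the breaker sees every attempt. */
    static CountingBreaker retryOutside() {
        CountingBreaker breaker = new CountingBreaker();
        retry(3, breaker.decorate(flakyCall())).get();
        return breaker;
    }

    /** CircuitBreaker(Retry(call)): the breaker sees only the final outcome. */
    static CountingBreaker breakerOutside() {
        CountingBreaker breaker = new CountingBreaker();
        breaker.decorate(retry(3, flakyCall())).get();
        return breaker;
    }

    public static void main(String[] args) {
        CountingBreaker a = retryOutside();
        CountingBreaker b = breakerOutside();
        System.out.println("retry outside:   failures=" + a.failures + " successes=" + a.successes);
        System.out.println("breaker outside: failures=" + b.failures + " successes=" + b.successes);
    }
}
```

With retry outside, the model records two failures and one success. With the breaker outside, it records a single success and the two transient failures never reach the window.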

Why This Order

Retry outside Circuit Breaker: When the circuit breaker is open, it throws CallNotPermittedException without invoking the call. The retry intercepts this exception. If the retry policy excludes CallNotPermittedException from retryable exceptions (and it should), the retry does not attempt again, and the caller gets an immediate rejection. The rejection is counted as a not-permitted call; it is not recorded as a failure in the sliding window.

If retry is inside the circuit breaker: the whole retry sequence is a single call from the circuit breaker’s perspective. Two failed attempts followed by a success record one success in the sliding window, and the two failures disappear. The circuit breaker then underestimates the real failure rate and opens late. With retry outside, every attempt is recorded, so a degraded dependency trips the breaker sooner; account for that when sizing sliding-window-size and minimum-number-of-calls.

Circuit Breaker outside Rate Limiter: When the circuit breaker is open, calls are rejected without consuming a rate limiter token. If the circuit breaker were inside the rate limiter, rejected calls would still consume tokens, wasting rate limit capacity on calls that were never going to succeed.
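The token accounting can be checked with a small model. This is a sketch in plain Java under simplifying assumptions: TokenBucket and OpenBreaker are hypothetical classes (not Resilience4J types), the bucket never refills, and the breaker is permanently open.

```java
import java.util.function.Supplier;

// Toy model, NOT the Resilience4J API. All names here are illustrative assumptions.
class TokenOrderDemo {

    static final class RejectedException extends RuntimeException {
        RejectedException(String message) {
            super(message);
        }
    }

    /** A rate limiter reduced to a counter: one token per call that passes through. */
    static final class TokenBucket {
        int tokens;

        TokenBucket(int tokens) {
            this.tokens = tokens;
        }

        <T> Supplier<T> decorate(Supplier<T> inner) {
            return () -> {
                if (tokens == 0) {
                    throw new RejectedException("rate limited");
                }
                tokens--;
                return inner.get();
            };
        }
    }

    /** A breaker stuck in the open state: rejects without invoking the wrapped call. */
    static final class OpenBreaker {
        <T> Supplier<T> decorate(Supplier<T> inner) {
            return () -> {
                throw new RejectedException("breaker open");
            };
        }
    }

    /** Sends the given number of calls through the chain and reports the tokens left out of 10. */
    static int tokensLeft(boolean breakerOutside, int calls) {
        TokenBucket limiter = new TokenBucket(10);
        OpenBreaker breaker = new OpenBreaker();
        Supplier<String> call = () -> "ok";
        Supplier<String> chain = breakerOutside
                ? breaker.decorate(limiter.decorate(call))
                : limiter.decorate(breaker.decorate(call));
        for (int i = 0; i < calls; i++) {
            try {
                chain.get();
            } catch (RejectedException expected) {
                // every call is rejected while the breaker is open
            }
        }
        return limiter.tokens;
    }

    public static void main(String[] args) {
        System.out.println("breaker outside limiter: " + tokensLeft(true, 5) + " tokens left");
        System.out.println("limiter outside breaker: " + tokensLeft(false, 5) + " tokens left");
    }
}
```

Five rejected calls leave all 10 tokens in place when the breaker is outside, and only 5 when the limiter is outside.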

Rate Limiter outside Bulkhead: The rate limiter controls the rate of calls. The bulkhead controls concurrency. Calls that exceed the configured rate should never reach the bulkhead. If the bulkhead were outside the rate limiter, a call waiting for a rate limiter permit would hold a bulkhead permit while it waits, occupying concurrency capacity without doing any work.

The Programmatic Composition

Resilience4J annotations apply decorators as Spring AOP aspects, and the aspect order is configurable. It is easy to get backwards. The following configuration looks plausible but puts the circuit breaker outside the retry:

# WRONG - circuit breaker ends up outermost
resilience4j:
  # This determines the order when multiple annotations are on the same method
  circuitbreaker:
    circuit-breaker-aspect-order: 1
  retry:
    retry-aspect-order: 2
  ratelimiter:
    rate-limiter-aspect-order: 3
  bulkhead:
    bulkhead-aspect-order: 4
  timelimiter:
    time-limiter-aspect-order: 5

The aspect order is the Spring AOP order: lower numbers have higher precedence. The aspect with the lowest number runs first on the way in and returns last on the way out, which makes it the outermost layer. For the order described above, retry needs the lowest number:

# PRODUCTION - Correct aspect ordering
# Lower number = higher priority = outermost
resilience4j:
  retry:
    retry-aspect-order: 1 # Outermost: retries everything
  circuitbreaker:
    circuit-breaker-aspect-order: 2
  ratelimiter:
    rate-limiter-aspect-order: 3
  bulkhead:
    bulkhead-aspect-order: 4
  timelimiter:
    time-limiter-aspect-order: 5 # Innermost: closest to the call
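To see what "lower number = outermost" means in practice, the sketch below nests plain Runnable wrappers by order value and logs entry and exit. It is a simplified model of how ordered around-advice nests, not Spring or Resilience4J code; AspectOrderDemo and trace are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model of ordered around-advice. Names are illustrative assumptions.
class AspectOrderDemo {

    /** Builds the nested chain from order values, runs it once, and returns the entry/exit log. */
    static List<String> trace(Map<String, Integer> aspectOrders) {
        List<String> log = new ArrayList<>();
        List<Map.Entry<String, Integer>> byOrder = new ArrayList<>(aspectOrders.entrySet());
        // Highest order value first: it is wrapped first and therefore ends up innermost.
        byOrder.sort(Map.Entry.<String, Integer>comparingByValue().reversed());

        Runnable chain = () -> log.add("call");
        for (Map.Entry<String, Integer> entry : byOrder) {
            Runnable inner = chain;
            String name = entry.getKey();
            chain = () -> {
                log.add(name + ":enter");
                inner.run();
                log.add(name + ":exit");
            };
        }
        chain.run();
        return log;
    }

    public static void main(String[] args) {
        Map<String, Integer> orders = Map.of(
                "retry", 1,
                "circuitbreaker", 2,
                "ratelimiter", 3,
                "bulkhead", 4,
                "timelimiter", 5);
        trace(orders).forEach(System.out::println);
    }
}
```

In the log, retry enters first and exits last, and timelimiter is the last layer entered before the call itself.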

Alternatively, compose programmatically for full control:

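// PRODUCTION - Programmatic composition
import io.github.resilience4j.bulkhead.Bulkhead;
import io.github.resilience4j.bulkhead.BulkheadRegistry;
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import io.github.resilience4j.decorators.Decorators;
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryRegistry;
import java.util.function.Supplier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ResilienceConfig {

    @Bean
    public Supplier<FraudScore> resilientFraudCall(
            CircuitBreakerRegistry cbRegistry,
            RetryRegistry retryRegistry,
            BulkheadRegistry bulkheadRegistry,
            FraudDetectionClient client) {

        CircuitBreaker cb = cbRegistry.circuitBreaker("fraudDetection");
        Retry retry = retryRegistry.retry("fraudDetection");
        Bulkhead bulkhead = bulkheadRegistry.bulkhead("fraudDetection");

        // Compose: Retry -> CircuitBreaker -> Bulkhead -> actual call
        // (RateLimiter and TimeLimiter applied separately)
        return Decorators.ofSupplier(() -> client.score(new PaymentRequest()))
                .withBulkhead(bulkhead)     // applied first: innermost, wraps the call
                .withCircuitBreaker(cb)     // wraps the bulkhead
                .withRetry(retry)           // applied last: outermost, wraps everything
                .decorate();
        // Each with...() wraps the result of the previous one,
        // so the last decorator added is the outermost layer.
    }
}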

Complete Configuration Per Dependency

# PRODUCTION - Full resilience configuration for all dependencies
# Each pattern key appears exactly once. YAML does not allow a mapping key to
# repeat, so a second "circuitbreaker:" block under resilience4j either fails
# to load or silently replaces the first one.
resilience4j:
  circuitbreaker:
    instances:
      # --- Fraud Detection ---
      fraudDetection:
        sliding-window-size: 100
        failure-rate-threshold: 50
        slow-call-rate-threshold: 80
        slow-call-duration-threshold: 500ms
        minimum-number-of-calls: 20
        wait-duration-in-open-state: 60s
        permitted-number-of-calls-in-half-open-state: 5
        automatic-transition-from-open-to-half-open-enabled: true
      # --- Payment Gateway ---
      paymentGateway:
        sliding-window-size: 50
        failure-rate-threshold: 60
        # Higher threshold: payment gateway errors are more common
        # (network issues with external provider)
        minimum-number-of-calls: 10
        wait-duration-in-open-state: 30s
        permitted-number-of-calls-in-half-open-state: 3

  retry:
    instances:
      # --- Fraud Detection ---
      fraudDetection:
        max-attempts: 1  # No retries for fraud - circuit breaker handles it
        # Fraud detection calls are idempotent (scoring is read-only),
        # but retrying a slow service delays the payment further.
        # The circuit breaker + fallback is the correct strategy.
      # --- Payment Gateway ---
      paymentGateway:
        max-attempts: 3
        wait-duration: 200ms
        enable-exponential-backoff: true
        exponential-backoff-multiplier: 2.0
        enable-randomized-wait: true
        # Only these exceptions are retried; CallNotPermittedException is not listed
        retry-exceptions:
          - java.io.IOException
          - java.util.concurrent.TimeoutException

  bulkhead:
    instances:
      # --- Fraud Detection ---
      fraudDetection:
        max-concurrent-calls: 20
        max-wait-duration: 100ms
      # --- Payment Gateway ---
      paymentGateway:
        max-concurrent-calls: 120
        max-wait-duration: 200ms

  timelimiter:
    instances:
      # --- Fraud Detection ---
      fraudDetection:
        timeout-duration: 2s
        cancel-running-future: true
      # --- Payment Gateway ---
      paymentGateway:
        timeout-duration: 8s
        cancel-running-future: true
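Because the time limiter sits inside the retry, the timeout applies to each attempt, and the caller's worst-case wait is larger than any single setting suggests. The sketch below works through that arithmetic for the values above. It is a rough upper bound under stated assumptions: nominal exponential backoff (initial wait times multiplier to the power of the retry number minus one), no jitter from enable-randomized-wait, and no time spent waiting for a bulkhead permit. RetryBudgetDemo and its methods are hypothetical helpers, not part of Resilience4J.

```java
import java.time.Duration;

// Back-of-the-envelope latency budget. Names are illustrative assumptions.
class RetryBudgetDemo {

    /** Nominal wait before retry n (1-based), ignoring jitter: initial * multiplier^(n-1). */
    static Duration nominalWait(Duration initial, double multiplier, int retryNumber) {
        double millis = initial.toMillis() * Math.pow(multiplier, retryNumber - 1);
        return Duration.ofMillis(Math.round(millis));
    }

    /** Upper bound for one logical call: every attempt times out, plus all backoff waits. */
    static Duration worstCase(int maxAttempts, Duration timeout, Duration initialWait, double multiplier) {
        Duration total = timeout.multipliedBy(maxAttempts);
        // There is one wait between consecutive attempts, so maxAttempts - 1 waits in total.
        for (int retry = 1; retry < maxAttempts; retry++) {
            total = total.plus(nominalWait(initialWait, multiplier, retry));
        }
        return total;
    }

    public static void main(String[] args) {
        Duration gateway = worstCase(3, Duration.ofSeconds(8), Duration.ofMillis(200), 2.0);
        Duration fraud = worstCase(1, Duration.ofSeconds(2), Duration.ZERO, 1.0);
        System.out.println("paymentGateway worst case: " + gateway.toMillis() + " ms"); // 24600 ms
        System.out.println("fraudDetection worst case: " + fraud.toMillis() + " ms");   // 2000 ms
    }
}
```

For paymentGateway that is three 8s timeouts plus waits of 200ms and 400ms, or 24.6s before the caller sees a final failure. Any upstream timeout needs to be sized against that figure, not against the 8s per-attempt timeout.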

Testing the Composed Stack

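// PRODUCTION - Test verifying the full decorator chain
import static org.assertj.core.api.Assertions.assertThat;

import io.github.resilience4j.bulkhead.Bulkhead;
import io.github.resilience4j.bulkhead.BulkheadRegistry;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import java.util.concurrent.atomic.AtomicInteger;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;
import org.testcontainers.utility.DockerImageName;

@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
@Testcontainers
class ComposedResilienceTest {

    @Container
    static GenericContainer<?> wireMock = new GenericContainer<>(
            DockerImageName.parse("wiremock/wiremock:latest"))
            .withExposedPorts(8080);

    @Autowired
    private FraudDetectionService fraudService;

    @Autowired
    private CircuitBreakerRegistry cbRegistry;

    @Autowired
    private BulkheadRegistry bulkheadRegistry;

    @Test
    void fullChain_circuitBreakerOpens_bulkheadNotConsumed() {
        // Cause circuit breaker to open
        forceCircuitBreakerOpen();

        // Count every permit the bulkhead hands out while the breaker is open
        Bulkhead bulkhead = bulkheadRegistry.bulkhead("fraudDetection");
        AtomicInteger permitted = new AtomicInteger();
        bulkhead.getEventPublisher().onCallPermitted(event -> permitted.incrementAndGet());

        try {
            fraudService.checkFraud(samplePayment());
        } catch (Exception ignored) {
            // CallNotPermittedException, unless a fallback handles it
        }

        // No permit should have been issued: CB rejected before reaching bulkhead
        assertThat(permitted.get()).isZero();
    }

    // Forces the OPEN state directly instead of driving real failures through the breaker
    private void forceCircuitBreakerOpen() {
        cbRegistry.circuitBreaker("fraudDetection").transitionToOpenState();
    }

    // Assumes checkFraud accepts the no-arg PaymentRequest used in ResilienceConfig
    private PaymentRequest samplePayment() {
        return new PaymentRequest();
    }
}

This test verifies that the decorator ordering is correct. When the circuit breaker is open, the call is rejected before reaching the bulkhead, so the bulkhead emits no call-permitted event. The test counts events instead of comparing available permits before and after the call: a synchronous call releases its permit before returning, so the permit count is the same under either ordering. If the ordering were wrong (bulkhead outside circuit breaker), the permit would be acquired and immediately released, adding unnecessary overhead and potentially causing false bulkhead-full rejections under high concurrency.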