resilience patterns in production

Degraded Mode Design

Chapter 35 of 40

Every system degrades under failure. The question is whether it degrades in a way you designed or in a way you discover during an incident. Uncontrolled degradation means random functionality breaks in unpredictable order. Controlled degradation means you decided in advance which functionality to sacrifice, in what order, and under what conditions.

The Failure Mode: Uncontrolled Degradation

The payment service has four dependencies. When one fails, the circuit breaker opens and the fallback activates. When two fail, two fallbacks activate. When three fail, the system is running on three fallbacks simultaneously. Nobody designed this state. Nobody tested it. The fraud check is using a cached score, the balance check is using a stale value, and the notification service is dropping messages. Is this combination safe? Can a stale balance and a cached fraud score together produce an incorrect payment authorization?

Uncontrolled degradation compounds: each fallback was designed and tested in isolation. The combination of multiple fallbacks may produce behavior that no individual fallback intended.
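
To make the compounding concrete, the sketch below counts the states four independent dependencies can be in. The dependency names are borrowed from the rest of this chapter, and the class name `FallbackCombinations` is purely illustrative; nothing here is part of the platform's code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: enumerates which of four dependencies are on fallback.
class FallbackCombinations {

    static final List<String> DEPENDENCIES = List.of(
            "fraudDetection", "balanceCheck", "paymentGateway", "notification");

    // Returns every subset of dependencies that could be failing at once.
    static List<List<String>> allFailureSets() {
        List<List<String>> sets = new ArrayList<>();
        int n = DEPENDENCIES.size();
        for (int mask = 0; mask < (1 << n); mask++) {
            List<String> failed = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    failed.add(DEPENDENCIES.get(i));
                }
            }
            sets.add(failed);
        }
        return sets;
    }

    // Counts the failure sets in which two or more fallbacks are active together.
    static long multiFallbackStates() {
        return allFailureSets().stream().filter(s -> s.size() >= 2).count();
    }

    public static void main(String[] args) {
        System.out.println("total states: " + allFailureSets().size());
        System.out.println("states with 2+ fallbacks: " + multiFallbackStates());
    }
}
```

Four dependencies give 16 possible states, and 11 of them have two or more fallbacks active at once. Testing each fallback in isolation covers only the four single-failure states.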

Degraded Mode Taxonomy

[Figure: Degraded Mode Decision Matrix]

The transaction platform has four explicit operating modes, three of them degraded:

Mode 0: Normal. All dependencies available. Full fraud checking, real-time balance, synchronous payment gateway, real-time notifications.

Mode 1: Degraded Fraud. Fraud detection unavailable. Payments of $5,000 or less proceed with a fallback fraud score. High-value payments (above $5,000) are held for manual review instead of proceeding with a fallback score, and the notification service sends a “pending review” notification instead of a confirmation.

Mode 2: Degraded Balance. Balance service unavailable. Cached balances are used with a safety margin: a payment proceeds only if it fits within 80% of the cached balance, which leaves room for balance changes since the value was cached. Larger payments, and payments on accounts with no cached balance, are rejected because sufficient funds cannot be verified.

Mode 3: Emergency. Multiple dependencies unavailable or error budget exhausted. Only pre-authorized recurring payments proceed. All new payment requests receive a 503 with a Retry-After header. The system is explicitly refusing work to protect its remaining capacity for critical operations.
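
The Mode 3 refusal can be sketched as a plain mapping from a retry delay to an HTTP response. The `Rejection` record and the `EmergencyResponses` class are hypothetical names, not the platform's API. The only fixed points are the 503 status and the fact that `Retry-After` accepts a delay in whole seconds.

```java
import java.time.Duration;
import java.util.Map;

// Hypothetical sketch: shape of the response sent for refused work in Emergency mode.
class EmergencyResponses {

    // Minimal stand-in for an HTTP response: status code plus headers.
    record Rejection(int status, Map<String, String> headers) {}

    // Builds a 503 whose Retry-After header carries the delay in whole seconds.
    static Rejection serviceUnavailable(Duration retryAfter) {
        // Never advertise a zero-second retry, which would invite an immediate retry storm.
        long seconds = Math.max(1, retryAfter.toSeconds());
        return new Rejection(503, Map.of("Retry-After", Long.toString(seconds)));
    }

    public static void main(String[] args) {
        Rejection r = serviceUnavailable(Duration.ofMinutes(5));
        System.out.println(r.status() + " Retry-After: " + r.headers().get("Retry-After"));
    }
}
```

A five-minute delay becomes `Retry-After: 300`. Well-behaved clients can use that value to back off instead of retrying immediately against a system that is deliberately shedding load.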

The Decision Matrix

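// PRODUCTION - Degraded mode controller
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import org.springframework.stereotype.Component;

@Component
public class DegradedModeController {

    private final CircuitBreakerRegistry cbRegistry;
    private final ErrorBudgetCalculator budgetCalculator;

    // Constructor injection; ErrorBudgetCalculator is the platform's own component
    public DegradedModeController(CircuitBreakerRegistry cbRegistry,
                                  ErrorBudgetCalculator budgetCalculator) {
        this.cbRegistry = cbRegistry;
        this.budgetCalculator = budgetCalculator;
    }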

    public enum OperatingMode {
        NORMAL, DEGRADED_FRAUD, DEGRADED_BALANCE, EMERGENCY
    }

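    // A breaker that is OPEN or FORCED_OPEN rejects calls, so the dependency
    // counts as unavailable. HALF_OPEN still admits probe calls and counts
    // as available.
    private boolean isAvailable(String breakerName) {
        CircuitBreaker.State state =
                cbRegistry.circuitBreaker(breakerName).getState();
        return state != CircuitBreaker.State.OPEN
                && state != CircuitBreaker.State.FORCED_OPEN;
    }

    public OperatingMode currentMode() {
        boolean fraudAvailable = isAvailable("fraudDetection");
        boolean balanceAvailable = isAvailable("balanceCheck");
        boolean gatewayAvailable = isAvailable("paymentGateway");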

        double errorBudget = budgetCalculator.remainingBudget();

        // Emergency mode: gateway down, or remaining error budget
        // below the 0.1 threshold
        if (!gatewayAvailable || errorBudget < 0.1) {
            return OperatingMode.EMERGENCY;
        }

        // Emergency mode: fraud detection and balance check both down
        if (!fraudAvailable && !balanceAvailable) {
            return OperatingMode.EMERGENCY;
        }

        if (!fraudAvailable) {
            return OperatingMode.DEGRADED_FRAUD;
        }

        if (!balanceAvailable) {
            return OperatingMode.DEGRADED_BALANCE;
        }

        return OperatingMode.NORMAL;
    }
}
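
Because the controller reads live circuit breaker state, its precedence rules are easier to review when they are pulled out into a pure function. The following is a minimal sketch under that assumption: `ModeMatrix` and `decide` are hypothetical names, and the thresholds simply mirror the controller above.

```java
// Hypothetical sketch: the decision matrix as a pure function, so every
// combination of inputs can be checked without a running circuit breaker.
class ModeMatrix {

    enum OperatingMode { NORMAL, DEGRADED_FRAUD, DEGRADED_BALANCE, EMERGENCY }

    // Same precedence as the controller: emergency conditions are checked first.
    static OperatingMode decide(boolean fraudUp, boolean balanceUp,
                                boolean gatewayUp, double remainingBudget) {
        if (!gatewayUp || remainingBudget < 0.1) return OperatingMode.EMERGENCY;
        if (!fraudUp && !balanceUp) return OperatingMode.EMERGENCY;
        if (!fraudUp) return OperatingMode.DEGRADED_FRAUD;
        if (!balanceUp) return OperatingMode.DEGRADED_BALANCE;
        return OperatingMode.NORMAL;
    }

    public static void main(String[] args) {
        boolean[] states = {true, false};
        // Print all eight availability combinations with a healthy error budget.
        for (boolean fraud : states) {
            for (boolean balance : states) {
                for (boolean gateway : states) {
                    System.out.printf("fraud=%b balance=%b gateway=%b -> %s%n",
                            fraud, balance, gateway,
                            decide(fraud, balance, gateway, 1.0));
                }
            }
        }
    }
}
```

With a healthy error budget, five of the eight availability combinations resolve to EMERGENCY and the other three map to NORMAL, DEGRADED_FRAUD, and DEGRADED_BALANCE. Every combination lands in exactly one designed mode, which is the property uncontrolled degradation lacks.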

Mode-Specific Payment Processing

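// PRODUCTION - Payment processor with mode-aware behavior
import io.micrometer.core.instrument.MeterRegistry;
import java.math.BigDecimal;
import java.time.Duration;
import org.springframework.stereotype.Service;

@Service
public class PaymentProcessor {

    private final DegradedModeController modeController;
    private final FraudDetectionService fraudService;
    private final BalanceService balanceService;
    private final PaymentGatewayClient gatewayClient;
    private final MeterRegistry meterRegistry;

    // Constructor injection of the mode controller and the dependency clients
    public PaymentProcessor(DegradedModeController modeController,
                            FraudDetectionService fraudService,
                            BalanceService balanceService,
                            PaymentGatewayClient gatewayClient,
                            MeterRegistry meterRegistry) {
        this.modeController = modeController;
        this.fraudService = fraudService;
        this.balanceService = balanceService;
        this.gatewayClient = gatewayClient;
        this.meterRegistry = meterRegistry;
    }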

    public PaymentResult processPayment(PaymentRequest request) {
        OperatingMode mode = modeController.currentMode();

        meterRegistry.counter("payment.mode",
                "mode", mode.name()).increment();

        return switch (mode) {
            case NORMAL -> processNormal(request);
            case DEGRADED_FRAUD -> processDegradedFraud(request);
            case DEGRADED_BALANCE -> processDegradedBalance(request);
            case EMERGENCY -> processEmergency(request);
        };
    }

    // processNormal(request) is not shown here: it is the Mode 0 path with
    // the full fraud check, the real-time balance check, and the gateway charge.

    private PaymentResult processDegradedFraud(PaymentRequest request) {
        // High-value payments are held for manual review
        if (request.amount().compareTo(new BigDecimal("5000")) > 0) {
            return PaymentResult.held(request.paymentId(),
                    "Manual review required: fraud detection unavailable");
        }

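        // Payments at or below the threshold proceed with the fallback fraud score
        FraudScore fallbackScore = FraudScore.defaultPermit(request);
        if (fallbackScore.decision() == Decision.BLOCK) {
            return PaymentResult.declined(request.paymentId(),
                    "Fraud check failed");
        }
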
        BigDecimal balance = balanceService.checkBalance(
                request.accountId());

        if (balance.compareTo(request.amount()) < 0) {
            return PaymentResult.declined(request.paymentId(),
                    "Insufficient funds");
        }

        GatewayResponse response = gatewayClient.charge(request);
        return PaymentResult.approved(request.paymentId(),
                response.transactionId(), "DEGRADED_FRAUD");
    }

    private PaymentResult processDegradedBalance(PaymentRequest request) {
        FraudScore score = fraudService.checkFraud(request);
        if (score.decision() == Decision.BLOCK) {
            return PaymentResult.declined(request.paymentId(),
                    "Fraud check failed");
        }

        // Use cached balance with safety margin
        BigDecimal cachedBalance = balanceService
                .getCachedBalance(request.accountId());

        if (cachedBalance == null) {
            return PaymentResult.declined(request.paymentId(),
                    "Balance unavailable: no cached value");
        }

        // Apply 80% safety margin
        BigDecimal safeBalance = cachedBalance
                .multiply(new BigDecimal("0.80"));

        if (safeBalance.compareTo(request.amount()) < 0) {
            return PaymentResult.declined(request.paymentId(),
                    "Balance check: insufficient confirmed funds");
        }

        GatewayResponse response = gatewayClient.charge(request);
        return PaymentResult.approved(request.paymentId(),
                response.transactionId(), "DEGRADED_BALANCE");
    }

    private PaymentResult processEmergency(PaymentRequest request) {
        // Only pre-authorized recurring payments proceed
        if (!request.isRecurring() || !request.isPreAuthorized()) {
            // The retry delay is surfaced to the client as a Retry-After header
            return PaymentResult.rejected(request.paymentId(),
                    "Service temporarily unavailable",
                    Duration.ofMinutes(5));
        }

        // Recurring payments bypass fraud and balance checks
        GatewayResponse response = gatewayClient.charge(request);
        return PaymentResult.approved(request.paymentId(),
                response.transactionId(), "EMERGENCY");
    }
}
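
The 80% safety margin is easier to reason about with numbers. The worked example below uses made-up amounts and a hypothetical `SafetyMarginExample` class. It applies the same comparison as `processDegradedBalance`, where a payment is declined only when the safe balance is strictly less than the amount.

```java
import java.math.BigDecimal;

// Worked example of the 80% safety margin, using made-up amounts.
class SafetyMarginExample {

    static final BigDecimal MARGIN = new BigDecimal("0.80");

    // True when the payment fits inside 80% of the cached balance.
    static boolean withinSafeBalance(BigDecimal cachedBalance, BigDecimal amount) {
        BigDecimal safeBalance = cachedBalance.multiply(MARGIN);
        // compareTo ignores scale, so 800.0000 and 800.00 compare as equal.
        return safeBalance.compareTo(amount) >= 0;
    }

    public static void main(String[] args) {
        BigDecimal cached = new BigDecimal("1000.00"); // safe balance is 800
        System.out.println(withinSafeBalance(cached, new BigDecimal("750.00")));
        System.out.println(withinSafeBalance(cached, new BigDecimal("800.00")));
        System.out.println(withinSafeBalance(cached, new BigDecimal("900.00")));
    }
}
```

With a cached balance of 1,000.00, payments of 750.00 and 800.00 proceed, while 900.00 is declined even though it is below the cached balance. That gap is the cost of the margin: some legitimate payments are refused so that a balance that has dropped since it was cached is less likely to be overdrawn.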

Testing Degraded Modes

// PRODUCTION - Test each degraded mode independently
class DegradedModeTest extends ResilienceTestBase {

    @Autowired
    private PaymentProcessor paymentProcessor;

    @Test
    void degradedFraud_highValuePaymentHeld() {
        // Open fraud detection circuit breaker
        cbRegistry.circuitBreaker("fraudDetection")
                .transitionToOpenState();

        PaymentRequest highValue = new PaymentRequest(
                "PAY-001", "ACC-123",
                new BigDecimal("10000.00"), false, false);

        PaymentResult result = paymentProcessor.processPayment(highValue);

        assertThat(result.status()).isEqualTo(PaymentStatus.HELD);
        assertThat(result.reason())
                .contains("Manual review required");
    }

    @Test
    void degradedFraud_lowValuePaymentProceeds() {
        cbRegistry.circuitBreaker("fraudDetection")
                .transitionToOpenState();

        PaymentRequest lowValue = new PaymentRequest(
                "PAY-002", "ACC-123",
                new BigDecimal("50.00"), false, false);

        // Configure balance service to return sufficient balance
        balanceWireMock().register(
                WireMock.get(urlPathMatching("/accounts/.*/balance"))
                        .willReturn(WireMock.okJson(
                                "{\"balance\":1000.00}")));

        PaymentResult result = paymentProcessor.processPayment(lowValue);

        assertThat(result.status()).isEqualTo(PaymentStatus.APPROVED);
        assertThat(result.mode()).isEqualTo("DEGRADED_FRAUD");
    }

    @Test
    void emergency_rejectsNewPayments() {
        // Open payment gateway circuit breaker -> emergency mode
        cbRegistry.circuitBreaker("paymentGateway")
                .transitionToOpenState();

        PaymentRequest request = new PaymentRequest(
                "PAY-003", "ACC-123",
                new BigDecimal("100.00"), false, false);

        PaymentResult result = paymentProcessor.processPayment(request);

        assertThat(result.status()).isEqualTo(PaymentStatus.REJECTED);
        assertThat(result.retryAfter()).isNotNull();
    }

    @Test
    void emergency_allowsRecurringPreauthorized() {
        // Enter emergency mode with the gateway still reachable: fraud
        // detection and balance check are both down, so the controller
        // returns EMERGENCY while recurring charges can still go through.
        cbRegistry.circuitBreaker("fraudDetection")
                .transitionToOpenState();
        cbRegistry.circuitBreaker("balanceCheck")
                .transitionToOpenState();

        PaymentRequest recurring = new PaymentRequest(
                "PAY-004", "ACC-123",
                new BigDecimal("29.99"), true, true);

        PaymentResult result = paymentProcessor.processPayment(recurring);
        assertThat(result.status()).isEqualTo(PaymentStatus.APPROVED);
        assertThat(result.mode()).isEqualTo("EMERGENCY");
    }
}

The Observable Signal

# PRODUCTION - Mode transition metrics and alerts
- alert: DegradedModeActive
  expr: >
    sum by (mode) (rate(payment_mode_total{mode!="NORMAL"}[5m])) > 0
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "Payment service operating in {{ $labels.mode }} mode"

- alert: EmergencyModeActive
  expr: >
    sum(rate(payment_mode_total{mode="EMERGENCY"}[5m])) > 0
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "Payment service in EMERGENCY mode"
    description: >
      Only recurring pre-authorized payments are being processed.
      All new payment requests are being rejected.

The operating mode metric is the single most important resilience signal. It answers: “What kind of service is the customer getting right now?” A dashboard showing the current operating mode, the time spent in each mode over the last 24 hours, and the mode transition history gives the operations team immediate situational awareness. They do not need to check five circuit breaker states and three error budget dashboards. The mode is the summary.
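
The three dashboard views described above (current mode, time spent per mode, transition history) can all come from one small piece of state. The sketch below is one possible shape. `ModeTracker` and its methods are hypothetical names, it uses only the JDK, and exporting the values to a metrics backend is left out.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: records mode transitions and time spent per mode.
class ModeTracker {

    enum OperatingMode { NORMAL, DEGRADED_FRAUD, DEGRADED_BALANCE, EMERGENCY }

    record Transition(Instant at, OperatingMode from, OperatingMode to) {}

    private final List<Transition> history = new ArrayList<>();
    private final Map<OperatingMode, Duration> timeInMode =
            new EnumMap<>(OperatingMode.class);
    private OperatingMode current;
    private Instant since;

    ModeTracker(OperatingMode initial, Instant start) {
        this.current = initial;
        this.since = start;
    }

    // Call with each freshly evaluated mode; only real changes are recorded.
    synchronized void observe(OperatingMode mode, Instant now) {
        if (mode == current) return;
        timeInMode.merge(current, Duration.between(since, now), Duration::plus);
        history.add(new Transition(now, current, mode));
        current = mode;
        since = now;
    }

    synchronized OperatingMode current() { return current; }

    synchronized List<Transition> history() { return List.copyOf(history); }

    // Completed time in a mode, plus the open interval if it is the current mode.
    synchronized Duration timeIn(OperatingMode mode, Instant now) {
        Duration closed = timeInMode.getOrDefault(mode, Duration.ZERO);
        return mode == current
                ? closed.plus(Duration.between(since, now))
                : closed;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2024-01-01T00:00:00Z");
        ModeTracker tracker = new ModeTracker(OperatingMode.NORMAL, start);
        tracker.observe(OperatingMode.DEGRADED_FRAUD, start.plusSeconds(60));
        tracker.observe(OperatingMode.NORMAL, start.plusSeconds(180));
        System.out.println(tracker.history().size() + " transitions, "
                + tracker.timeIn(OperatingMode.DEGRADED_FRAUD, start.plusSeconds(200))
                + " in DEGRADED_FRAUD");
    }
}
```

Passing the clock in as a parameter keeps the tracker deterministic in tests. Unlike a per-request counter, state like this still reports the current mode when no payments are flowing.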