Adaptive Hedging and Request Cancellation

A fixed hedge delay assumes the dependency’s latency distribution is stable. Real services have latency that shifts with load, time of day, and deployment cycles. An adaptive hedge delay tracks the current latency percentile and adjusts the trigger point automatically.

Adaptive Hedge Delay

// PRODUCTION - Adaptive hedge delay based on rolling percentiles
import java.time.Duration;
import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;

public class AdaptiveHedgeDelay {

    private final DescriptiveStatistics latencyWindow;
    private final double targetPercentile;
    private final Duration minimumDelay;
    private final Duration maximumDelay;

    public AdaptiveHedgeDelay(int windowSize,
                               double targetPercentile,
                               Duration minimumDelay,
                               Duration maximumDelay) {
        this.latencyWindow = new DescriptiveStatistics(windowSize);
        this.targetPercentile = targetPercentile;
        this.minimumDelay = minimumDelay;
        this.maximumDelay = maximumDelay;
    }

    public synchronized void recordLatency(Duration latency) {
        latencyWindow.addValue(latency.toMillis());
    }

    public synchronized Duration currentDelay() {
        if (latencyWindow.getN() < 10) {
            return maximumDelay; // Not enough data, be conservative
        }

        double percentileMs = latencyWindow.getPercentile(
                targetPercentile * 100);
        Duration computed = Duration.ofMillis((long) percentileMs);

        if (computed.compareTo(minimumDelay) < 0) return minimumDelay;
        if (computed.compareTo(maximumDelay) > 0) return maximumDelay;
        return computed;
    }
}

DescriptiveStatistics (from Apache Commons Math) maintains a rolling window of the last N latency observations. With targetPercentile set to 0.95, the hedge delay is the current p95 of that window, clamped to the configured minimum and maximum. When the balance service has a garbage collection pause and its latencies spike, the slow samples enter the window, the p95 rises, and the hedge delay widens automatically. When latencies return to normal, the slow samples age out of the window and the delay tightens.

The minimum delay (5ms) stops the trigger point from collapsing toward zero during a burst of unusually fast responses, which would hedge nearly every call. The maximum delay (200ms) keeps the hedge delay from growing so large that the hedge provides no latency improvement over a retry.
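The widening, tightening, and clamping described above can be sketched without the Commons Math dependency. The class below is a simplified stand-in, not the production class: the name RollingHedgeDelay is hypothetical, and it uses a plain nearest-rank percentile, which can differ slightly from the estimator in DescriptiveStatistics.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical JDK-only stand-in for AdaptiveHedgeDelay (illustration only).
class RollingHedgeDelay {
    private final Deque<Long> window = new ArrayDeque<>();
    private final int windowSize;
    private final double targetPercentile; // e.g. 0.95 for p95
    private final Duration min;
    private final Duration max;

    RollingHedgeDelay(int windowSize, double targetPercentile,
                      Duration min, Duration max) {
        this.windowSize = windowSize;
        this.targetPercentile = targetPercentile;
        this.min = min;
        this.max = max;
    }

    synchronized void record(Duration latency) {
        if (window.size() == windowSize) {
            window.removeFirst(); // evict the oldest observation
        }
        window.addLast(latency.toMillis());
    }

    synchronized Duration currentDelay() {
        if (window.size() < 10) {
            return max; // not enough data: hedge as late as allowed
        }
        long[] sorted = window.stream()
                .mapToLong(Long::longValue).sorted().toArray();
        // Nearest-rank percentile: 1-based rank, clamped to the array bounds.
        int rank = (int) Math.ceil(targetPercentile * sorted.length);
        long p = sorted[Math.min(Math.max(rank, 1), sorted.length) - 1];
        // Clamp the computed delay to [min, max].
        long clamped = Math.max(min.toMillis(), Math.min(p, max.toMillis()));
        return Duration.ofMillis(clamped);
    }
}
```

With a window of 20 samples and bounds of 5ms and 200ms, twenty 20ms observations yield a 20ms delay. Two 400ms observations are then enough to push the p95 past the cap, so the delay becomes 200ms. Twenty 2ms observations bring it down to the 5ms floor.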

The Cancellation Problem

When the primary wins the race after the hedge has been sent, the hedge request is still in flight. The ideal behavior is to cancel the hedge request so the balance service does not process it. In practice, cancellation is unreliable at every layer:

Thread interruption. For a task running on an ExecutorService, future.cancel(true) sets the worker thread’s interrupt flag. (On a CompletableFuture the mayInterruptIfRunning argument has no effect, so nothing is interrupted at all.) Most HTTP client implementations check the interrupt flag between I/O operations, but not during a blocking read. If the request has been sent and the client is waiting for the response, the interrupt may not take effect until after the response arrives.

Connection-level cancellation. HTTP/2 supports RST_STREAM to cancel a specific request. HTTP/1.1 does not have per-request cancellation; the connection carries one request at a time, and the only way to “cancel” is to close the connection. A closed connection cannot go back to the pool for reuse, so a later request pays for a new TCP (and usually TLS) handshake.

Server-side processing. Even if the client cancels the request, the server has already received it. The server processes the request, generates the response, and sends it. The response is discarded by the client (or the connection is closed). The server did the work regardless.
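The client side of this best-effort cancellation can be sketched with plain java.util.concurrent. The HedgedCall class and its execute method are hypothetical names for illustration, and the sketch assumes both attempts run as tasks on an ExecutorService, where cancel(true) does deliver an interrupt. Whether the interrupt actually stops the call depends on the HTTP client, as described above.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: race a primary against a delayed hedge, cancel the loser.
class HedgedCall {
    static <T> T execute(ExecutorService pool,
                         Callable<T> primary,
                         Callable<T> hedge,
                         Duration hedgeDelay) throws Exception {
        CompletionService<T> race = new ExecutorCompletionService<>(pool);
        Future<T> primaryFuture = race.submit(primary);

        // Give the primary a head start of hedgeDelay.
        Future<T> first = race.poll(hedgeDelay.toMillis(), TimeUnit.MILLISECONDS);
        if (first != null) {
            return first.get(); // primary finished in time, no hedge sent
        }

        Future<T> hedgeFuture = race.submit(hedge);
        Future<T> winner = race.take(); // whichever attempt completes first
        Future<T> loser = (winner == primaryFuture) ? hedgeFuture : primaryFuture;

        // Best effort only: this interrupts the worker thread, but a thread
        // blocked in a socket read may not notice, and the server may already
        // be processing the request.
        loser.cancel(true);

        // Simplification: if the first attempt to complete failed, its
        // exception is propagated instead of waiting for the other attempt.
        return winner.get();
    }
}
```

A simulated call built on Thread.sleep stops promptly when interrupted, which makes cancellation look more reliable in a test than it is against a real blocking HTTP read.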

For the transaction platform, the balance check is cheap (database query, ~5ms server-side processing). The cost of processing a discarded hedge request is negligible. For expensive operations (report generation, complex aggregations), the server should accept a cancellation token (request header or query parameter) and periodically check whether the caller has cancelled.
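A minimal sketch of that server-side check, assuming the caller sends a unique token with the request and the server exposes some way to mark the token cancelled. CancellationRegistry, ReportJob, and the chunked work loop are hypothetical names for illustration; how the token arrives (header, query parameter) and how cancel is invoked are left open.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.IntConsumer;

// Hypothetical server-side registry keyed by the caller's cancellation token.
class CancellationRegistry {
    private final Map<String, AtomicBoolean> flags = new ConcurrentHashMap<>();

    void register(String token) {
        // Does not overwrite a cancel that arrived before the work started.
        flags.putIfAbsent(token, new AtomicBoolean(false));
    }

    // Invoked when the caller signals that it no longer needs the result.
    void cancel(String token) {
        flags.computeIfAbsent(token, t -> new AtomicBoolean(false)).set(true);
    }

    boolean isCancelled(String token) {
        AtomicBoolean flag = flags.get(token);
        return flag != null && flag.get();
    }

    void release(String token) {
        flags.remove(token);
    }
}

class ReportJob {
    // Runs up to totalChunks units of work, checking the token before each.
    // Returns the number of chunks actually processed.
    static int run(String token, int totalChunks,
                   CancellationRegistry registry, IntConsumer processChunk) {
        registry.register(token);
        try {
            int done = 0;
            for (int i = 0; i < totalChunks; i++) {
                if (registry.isCancelled(token)) {
                    break; // caller gave up: stop spending server resources
                }
                processChunk.accept(i);
                done++;
            }
            return done;
        } finally {
            registry.release(token); // avoid leaking finished tokens
        }
    }
}
```

The check interval is the trade-off: checking before every chunk bounds wasted work to one chunk, at the cost of one map lookup per chunk.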

Hedging with Resilience4j Patterns

Hedging interacts with the composed resilience stack. The correct placement:

Retry -> CircuitBreaker -> RateLimiter -> Bulkhead -> [Hedging -> TimeLimiter -> HTTP Call]

Hedging is inside the Bulkhead because each hedged call consumes a bulkhead permit. The primary and the hedge each need a permit. This means the effective concurrency for hedged calls is doubled at the bulkhead layer. If the bulkhead has 20 permits and 10 calls are hedged simultaneously, 20 permits are consumed (10 primaries + 10 hedges), leaving no permits for other calls.
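The permit arithmetic can be checked with a plain java.util.concurrent.Semaphore standing in for the bulkhead. This is an illustration of the counting only, not Resilience4j API; BulkheadPermitMath is a hypothetical name.

```java
import java.util.concurrent.Semaphore;

// Hypothetical illustration: a Semaphore stands in for the bulkhead.
class BulkheadPermitMath {
    // Each hedged call takes one permit for the primary and one for the hedge.
    // Returns the permits left for other calls.
    static int permitsLeftAfter(int maxConcurrentCalls, int hedgedCalls) {
        Semaphore bulkhead = new Semaphore(maxConcurrentCalls);
        for (int i = 0; i < hedgedCalls; i++) {
            bulkhead.tryAcquire(); // primary attempt
            bulkhead.tryAcquire(); // hedge attempt
        }
        return bulkhead.availablePermits();
    }
}
```

With 20 permits, 10 simultaneously hedged calls leave 0 permits; with 30 permits, the same load leaves 10.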

Configure the bulkhead to account for hedge overhead:

resilience4j:
  bulkhead:
    instances:
      balanceCheck:
        max-concurrent-calls: 30 # 20 for normal calls + 10 for hedges
        # Assumes ~5% of calls trigger a hedge, and under peak load,
        # up to 10 hedges may be in flight simultaneously

The circuit breaker records both the primary and the hedge outcomes. If both fail, that is two failures recorded. If the primary fails but the hedge succeeds, the primary’s failure is still recorded. This is correct: the circuit breaker tracks the dependency’s health, and a slow or failing primary response is a signal even if the hedge covered for it.