Hedging and Speculative Execution

Retry waits for failure before trying again. Hedging does not wait. It sends a second request while the first is still in flight, races them, and uses whichever response arrives first.

The Failure Mode

The balance service has a p99 latency of 50ms and a p99.9 of 800ms. Most calls are fast. A tiny fraction are slow. These outliers are not caused by dependency failures; they are caused by garbage collection pauses, network path variance, kernel scheduling, or lock contention inside the balance service. The balance service is healthy. It is just occasionally slow for a single request.

Retry does not help here. A retry waits for the timeout (500ms), then sends a second request. Total time: 500ms + second request time. With hedging, the second request is sent at the p95 latency mark (say, 30ms). If the first request was an outlier, the second request almost certainly completes in normal time. Total time: ~50ms instead of 500ms+.

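Hedging trades increased load for reduced tail latency. Each hedged call costs one extra request, so a 1% hedge rate (sending a second request for the slowest 1% of calls) adds about 1% to total request volume, yet it can reduce latency in the far tail (here, the p99.9) by roughly 10x.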
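
The retry and hedge totals quoted above can be checked with a small latency model. This is an illustrative sketch only: the class and method names are hypothetical, and the 20ms "normal" request time is an assumed value chosen to match the ~50ms total in the text.

```java
import java.time.Duration;

// Illustrative latency model; names and sample values are assumptions, not measurements.
class HedgeLatencyModel {

    // Retry: the caller waits out the full timeout, then pays for a second request.
    static Duration retryTotal(Duration timeout, Duration secondRequest) {
        return timeout.plus(secondRequest);
    }

    // Hedge: the caller gets whichever finishes first, the primary or the
    // hedge that was launched hedgeDelay after the primary started.
    static Duration hedgeTotal(Duration primary, Duration hedgeDelay, Duration hedge) {
        Duration hedgeFinish = hedgeDelay.plus(hedge);
        return primary.compareTo(hedgeFinish) <= 0 ? primary : hedgeFinish;
    }

    public static void main(String[] args) {
        Duration outlier = Duration.ofMillis(800); // slow primary (the p99.9 case)
        Duration normal = Duration.ofMillis(20);   // assumed typical request time

        // Retry with a 500ms timeout: 500 + 20
        System.out.println(retryTotal(Duration.ofMillis(500), normal).toMillis()); // 520
        // Hedge at the 30ms mark: min(800, 30 + 20)
        System.out.println(hedgeTotal(outlier, Duration.ofMillis(30), normal).toMillis()); // 50
    }
}
```

When the primary is not an outlier, hedgeTotal simply returns the primary's own latency, which is why hedging costs nothing in latency for the fast majority of calls.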

The Cost Model

The decision to hedge depends on three factors:

Idempotency. The hedged operation must be safe to execute twice. Reading the account balance: safe. Deducting from the account balance: not safe without idempotency keys.

Load headroom. If the balance service is at 90% capacity, hedging adds 1-5% more load. At 50% capacity, the additional load is negligible. Hedging a service near capacity pushes it closer to saturation, increasing the very latency you are trying to reduce.

Tail latency distribution. If p50 is 20ms and p99 is 50ms, the tail is tight. Hedging adds load with minimal latency improvement. If p50 is 20ms and p99 is 800ms, the tail is wide. Hedging provides substantial improvement. The wider the gap between median and tail, the more effective hedging becomes.
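
The three factors can be combined into a simple go/no-go check. The sketch below is one possible encoding, not a prescribed rule: the HedgeDecision class is hypothetical, and both thresholds (an 85% utilization ceiling and a minimum tail-to-median ratio of 5x) are assumptions picked for illustration.

```java
// Hypothetical decision helper; the thresholds are illustrative assumptions.
class HedgeDecision {

    // Utilization after hedging: every hedged call adds one extra request.
    static double utilizationWithHedging(double utilization, double hedgeRate) {
        return utilization * (1.0 + hedgeRate);
    }

    // Ratio of tail latency to median latency; larger means a wider tail.
    static double tailRatio(double p50Millis, double p99Millis) {
        return p99Millis / p50Millis;
    }

    static boolean shouldHedge(boolean idempotent, double utilization, double hedgeRate,
                               double p50Millis, double p99Millis) {
        final double maxUtilization = 0.85; // assumed safety ceiling
        final double minTailRatio = 5.0;    // assumed cutoff between a tight and a wide tail
        return idempotent
                && utilizationWithHedging(utilization, hedgeRate) <= maxUtilization
                && tailRatio(p50Millis, p99Millis) >= minTailRatio;
    }

    public static void main(String[] args) {
        System.out.println(shouldHedge(true, 0.50, 0.05, 20, 800));  // true: headroom and a wide tail
        System.out.println(shouldHedge(true, 0.90, 0.05, 20, 800));  // false: 0.90 * 1.05 exceeds the ceiling
        System.out.println(shouldHedge(true, 0.50, 0.05, 20, 50));   // false: tail is only 2.5x the median
        System.out.println(shouldHedge(false, 0.50, 0.05, 20, 800)); // false: not idempotent
    }
}
```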

From Scratch: The Racing Hedge

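// SCRATCH - Speculative execution with CompletableFuture racing
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

public class HedgingExecutor<T> {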

    private final Duration hedgeDelay;
    private final ScheduledExecutorService scheduler;
    private final LongAdder hedgedCallsTotal = new LongAdder();
    private final LongAdder hedgeWinsTotal = new LongAdder();

    public HedgingExecutor(Duration hedgeDelay,
                           ScheduledExecutorService scheduler) {
        this.hedgeDelay = hedgeDelay;
        this.scheduler = scheduler;
    }

    /**
     * Execute the supplier, and if it has not completed within hedgeDelay,
     * send a second speculative request. Return whichever completes first.
     */
    public CompletableFuture<T> execute(Supplier<T> primaryCall,
                                         Supplier<T> hedgeCall,
                                         ExecutorService executor) {
        CompletableFuture<T> primary = CompletableFuture.supplyAsync(
                primaryCall, executor);

        CompletableFuture<T> result = new CompletableFuture<>();

        // If primary completes before hedge delay, use it directly
        primary.whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            }
            // If primary fails, we still wait for the hedge
        });

        // Schedule hedge after delay
        scheduler.schedule(() -> {
            if (result.isDone()) {
                return; // Primary already completed, no need to hedge
            }

            hedgedCallsTotal.increment();
            CompletableFuture<T> hedge = CompletableFuture.supplyAsync(
                    hedgeCall, executor);

            hedge.whenComplete((value, error) -> {
                if (error == null && result.complete(value)) {
                    hedgeWinsTotal.increment();
                    // Hedge won. Primary is still in flight but its result
                    // will be ignored (the CompletableFuture is already complete).
                }
            });

            // If primary fails after hedge was sent, let hedge be the fallback
            primary.whenComplete((value, error) -> {
                if (error != null && !result.isDone()) {
                    // Primary failed, hedge is our only hope
                    hedge.whenComplete((hv, he) -> {
                        if (he != null) {
                            result.completeExceptionally(he);
                        }
                    });
                }
            });
        }, hedgeDelay.toMillis(), TimeUnit.MILLISECONDS);

        return result;
    }

    public double hedgeWinRate() {
        long hedged = hedgedCallsTotal.sum();
        if (hedged == 0) return 0.0;
        return (double) hedgeWinsTotal.sum() / hedged;
    }
}

Three things this implementation reveals:

The losing request keeps running. When the primary wins after a hedge has been sent, the hedge request is still in flight on the balance service. The payment service discards the response, but the balance service still does the work. The inverse is also true: when the hedge wins, the primary keeps running. Both requests complete; only one result is used.

The hedge delay is the tuning knob. Too short (1ms): you hedge almost every call, doubling load. Too long (at the timeout): you are doing retry, not hedging. The correct value is between p90 and p99 of the dependency’s latency. At p95 (30ms), you hedge only the 5% slowest calls.

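Cancellation is unreliable. Calling primary.cancel(true) on a CompletableFuture only marks the future as cancelled; the mayInterruptIfRunning argument has no effect in that class, so the thread running the call is not even interrupted. Even with a plain Future, where the interrupt flag is set, the HTTP client may not check it, and the request has already been sent over the wire. Cancellation is a client-side concern; the network request proceeds regardless.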
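
The limits of cancellation are easy to observe. The sketch below is a minimal demonstration under stated assumptions: CancelDemo is a hypothetical class, and Thread.sleep stands in for a slow remote call. It relies on the documented behaviour that CompletableFuture.cancel ignores its mayInterruptIfRunning argument, so the worker is never interrupted and the task runs to completion after the future is cancelled.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class CancelDemo {

    // Returns true if the task ran to completion, uninterrupted,
    // even though its future was cancelled while it was running.
    static boolean workSurvivesCancel() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        AtomicBoolean interrupted = new AtomicBoolean(false);
        try {
            CompletableFuture<String> call = CompletableFuture.supplyAsync(() -> {
                started.countDown();
                try {
                    Thread.sleep(200); // stands in for a slow remote call
                } catch (InterruptedException e) {
                    interrupted.set(true);
                }
                finished.countDown();
                return "balance";
            }, executor);

            started.await();
            call.cancel(true); // completes the future with CancellationException
            boolean ran = finished.await(2, TimeUnit.SECONDS);
            return call.isCancelled() && ran && !interrupted.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(workSurvivesCancel()); // true
    }
}
```

The caller sees a cancelled future immediately, while the work behind it continues to consume a thread and, in the real system, capacity on the balance service.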

The Production Implementation

For the transaction platform, hedging applies to the balance check call. The balance check is a read operation (idempotent) with a wide tail latency:

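// PRODUCTION - Hedged balance check
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestClient;

import java.time.Duration;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

@Service
public class BalanceService {

    private static final Duration HEDGE_DELAY = Duration.ofMillis(30);

    private final RestClient balanceClient;
    // Shared across calls so the hedge counters aggregate over the service's lifetime
    private final HedgingExecutor<BalanceResponse> hedger;
    // Dedicated pool for the blocking HTTP calls; blocking in ForkJoinPool.commonPool()
    // would starve other users of that pool. The size of 16 is an illustrative value.
    private final ExecutorService callExecutor =
            Executors.newFixedThreadPool(16, daemonThreads("balance-call"));

    public BalanceService(RestClient balanceClient,
                          MeterRegistry meterRegistry) {
        this.balanceClient = balanceClient;
        ScheduledExecutorService hedgeScheduler =
                Executors.newSingleThreadScheduledExecutor(
                        daemonThreads("balance-hedge-scheduler"));
        this.hedger = new HedgingExecutor<>(HEDGE_DELAY, hedgeScheduler);
        // A rate is a sampled value, so expose it as a gauge rather than a counter
        meterRegistry.gauge("balance.hedge.win.rate", hedger,
                HedgingExecutor::hedgeWinRate);
    }

    private static ThreadFactory daemonThreads(String name) {
        return r -> {
            Thread t = new Thread(r, name);
            t.setDaemon(true);
            return t;
        };
    }

    public BalanceResponse checkBalance(String accountId) {
        Supplier<BalanceResponse> call = () -> balanceClient.get()
                .uri("/accounts/{id}/balance", accountId)
                .retrieve()
                .body(BalanceResponse.class);

        try {
            return hedger.execute(
                    call, call, // Same call for both primary and hedge
                    callExecutor
            ).get(2, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            throw new BalanceCheckTimeoutException(accountId, e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            throw new BalanceCheckException(accountId, e);
        } catch (ExecutionException e) {
            throw new BalanceCheckException(accountId, e);
        }
    }
}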

Both the primary and the hedge send the same request to the same service. The balance service processes the request identically regardless of whether it is a primary or a hedge. No special handling is needed on the server side because the operation is a read.
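
The 30ms hedge delay used here corresponds to the p95 mark discussed earlier. One way such a value could be derived from recorded latencies is a nearest-rank percentile over a sample window. The sketch below assumes latencies are available as an array of milliseconds; HedgeDelayEstimator is a hypothetical helper, not part of the platform.

```java
import java.time.Duration;
import java.util.Arrays;

// Hypothetical helper that derives a hedge delay from observed latencies.
class HedgeDelayEstimator {

    // Nearest-rank percentile: the smallest sample such that at least
    // `percentile` percent of all samples are less than or equal to it.
    static Duration percentile(long[] latenciesMillis, double percentile) {
        if (latenciesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (percentile <= 0 || percentile > 100) {
            throw new IllegalArgumentException("percentile must be in (0, 100]");
        }
        long[] sorted = latenciesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percentile * sorted.length / 100.0);
        return Duration.ofMillis(sorted[rank - 1]);
    }

    public static void main(String[] args) {
        // 100 synthetic samples: 1ms, 2ms, ..., 100ms
        long[] samples = new long[100];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = i + 1;
        }
        System.out.println(percentile(samples, 95).toMillis()); // 95
    }
}
```

In practice the samples would come from the same latency histogram that feeds the dashboards, and the delay would be recomputed periodically rather than fixed as a constant.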

The Metric That Justifies Hedging

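// PRODUCTION - Hedge effectiveness tracking
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Component;

import java.time.Duration;

@Component
public class HedgeMetrics {

    private final MeterRegistry registry;

    // Constructor injection; without it the final field is never initialized
    public HedgeMetrics(MeterRegistry registry) {
        this.registry = registry;
    }

    /**
     * Assumes both latencies are measured from the start of the logical call,
     * so hedgeLatency already includes the hedge delay.
     */
    public void recordHedgeOutcome(String dependency,
                                    boolean hedgeSent,
                                    boolean hedgeWon,
                                    Duration primaryLatency,
                                    Duration hedgeLatency) {
        if (hedgeSent) {
            registry.counter("hedge.sent", "dependency", dependency)
                    .increment();

            if (hedgeWon) {
                registry.counter("hedge.won", "dependency", dependency)
                        .increment();

                Duration saved = primaryLatency.minus(hedgeLatency);
                // A timer cannot represent negative time; skip near ties and clock skew
                if (!saved.isNegative()) {
                    registry.timer("hedge.latency.saved",
                            "dependency", dependency)
                            .record(saved);
                }
            }
        }
    }
}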

If hedge.won / hedge.sent is consistently above 50%, the hedge delay is too short (hedging too aggressively). If it is consistently below 5%, the tail latency is tight and hedging provides little benefit. The sweet spot is 10-30%: the hedge fires for genuinely slow calls and wins often enough to justify the extra load.

When Not to Hedge

Write operations without idempotency keys. Deducting a balance twice is not acceptable.
Dependencies near capacity. Adding 5% more load to a saturated service causes more tail latency, not less.
Calls with side effects. Sending a notification twice is annoying. Posting a ledger entry twice is a compliance violation.
Short, tight tail latency distributions. If p99 is only 2x the median, the cost of hedging (doubled load for slow calls) outweighs the benefit (marginal latency reduction).
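
For the first case in this list, an idempotency key is what makes a duplicated write safe. The sketch below shows the idea with a hypothetical in-memory ledger: the IdempotentLedger class, its method names, and the "txn-42" key are all assumptions for illustration, and a real service would store keys durably and expire them.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory ledger; a real service would persist keys durably.
class IdempotentLedger {

    private final AtomicLong balanceCents;
    // Maps an idempotency key to the balance that resulted from its first execution.
    private final Map<String, Long> resultsByKey = new ConcurrentHashMap<>();

    IdempotentLedger(long openingBalanceCents) {
        this.balanceCents = new AtomicLong(openingBalanceCents);
    }

    // Applies the deduction at most once per key; a duplicate request with the
    // same key receives the stored result instead of deducting again.
    long deduct(String idempotencyKey, long amountCents) {
        return resultsByKey.computeIfAbsent(idempotencyKey,
                key -> balanceCents.addAndGet(-amountCents));
    }

    long balance() {
        return balanceCents.get();
    }

    public static void main(String[] args) {
        IdempotentLedger ledger = new IdempotentLedger(10_000);
        ledger.deduct("txn-42", 2_500); // primary request
        ledger.deduct("txn-42", 2_500); // hedge with the same key: no second deduction
        System.out.println(ledger.balance()); // 7500
    }
}
```

The client must generate the key once per logical operation and send the same key on both the primary and the hedge; a fresh key per request would defeat the deduplication.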

The balance check in the transaction platform is a strong candidate for hedging: it is a read operation, the balance service has significant capacity headroom, and the p99.9 latency (800ms) is 16x the p99 (50ms). The payment gateway is not a candidate: it performs a mutation (charging the card) and the provider charges per API call.