surviving the spike

Caching Layer One: HTTP Cache Controls and CDN Behavior

10 min read Chapter 13 of 66

Your servers are burning CPU to render the same fare estimate 14,000 times per minute. The fare for the airport-to-downtown corridor during non-surge hours changes once every 60 seconds. Every one of those 14,000 requests hits the fare calculation service, runs the distance matrix lookup, applies the pricing model, and returns the same JSON payload. This is not a scaling problem. This is a caching problem, and the HTTP specification solved it in 1999.

HTTP caching is the first layer of defense between your users and your origin servers. Before Redis, before application-level memoization, before any code change at all, the correct Cache-Control headers can eliminate 80% or more of your origin traffic. The CDN does the work. Your origin does not even see the requests.

This chapter covers the full HTTP caching stack: the headers that control it, the conditional requests that validate it, and the CDN behavior that makes or breaks it. Every example targets the ride-hailing platform. Every metric comes from a Locust test.

Figure: Three-layer cache architecture showing the cache-aside pattern. Requests check L1 Caffeine (~1ms), then L2 Redis (~5ms), then L3 PostgreSQL (~50ms), with hit/miss decision points and cache population on misses.

This diagram illustrates the cache-aside pattern used throughout the platform. A request first checks the L1 in-process Caffeine cache (sub-millisecond). On a miss, it falls through to L2 Redis (shared across pods, ~5ms). Only on a double miss does the request reach the L3 PostgreSQL database (~50ms). When a lower layer returns data, it populates all the layers above it on the way back. With typical hit rates of 60-80% at L1, the average response latency drops to ~4ms, and the database handles only 5-10% of total read traffic.
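The latency figure above can be checked with a weighted average. The sketch below is a rough model, not a measurement: it takes the diagram's approximate per-layer latencies, and it assumes each request pays only for the layer that finally serves it. The function name `expected_latency_ms` is made up for this illustration.

```python
# Approximate per-layer latencies from the diagram, in milliseconds.
L1_MS, L2_MS, L3_MS = 1.0, 5.0, 50.0


def expected_latency_ms(l1_share: float, l2_share: float, l3_share: float) -> float:
    """Weighted average latency given the share of reads served by each layer.

    Simplification: a request is charged only the latency of the layer that
    serves it, not the failed lookups in the layers above.
    """
    if abs(l1_share + l2_share + l3_share - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return l1_share * L1_MS + l2_share * L2_MS + l3_share * L3_MS


# 80% of reads served by L1, 15% by L2, 5% reaching the database.
print(f"{expected_latency_ms(0.80, 0.15, 0.05):.2f} ms")  # 4.05 ms
# 60% served by L1, 30% by L2, 10% reaching the database.
print(f"{expected_latency_ms(0.60, 0.30, 0.10):.2f} ms")  # 7.10 ms
```

Under these assumptions the ~4ms average corresponds to the upper end of the quoted hit-rate range. At the lower end, the same formula gives roughly 7ms.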

The Symptom

The fare estimation endpoint handles 14,000 requests per minute during peak. CPU on the fare service pods sits at 78%. Response times at p95 are 340ms. The team is planning to scale horizontally, adding four more pods. The monthly infrastructure bill is about to increase by $4,200.

The fare for a given origin-destination pair during non-surge periods changes at most once per minute. During surge, the surge multiplier updates every 10 seconds. Even in the worst case, the same fare is recomputed hundreds of times within its validity window.

The Cause

Every request bypasses all caching layers and hits the origin. The API returns responses with no Cache-Control headers. The CDN (Cloudflare, in this case) treats every response as uncacheable and forwards every request to the origin. The browser makes a fresh request every time the user opens the fare screen.

// BOTTLENECK: No cache headers, every request hits origin
@GetMapping("/api/v1/fare/estimate")
public Mono<FareEstimate> estimateFare(
        @RequestParam String originZone,
        @RequestParam String destZone) {
    return fareService.calculate(originZone, destZone);
}

The response headers look like this:

HTTP/1.1 200 OK
Content-Type: application/json
// No Cache-Control
// No ETag
// No Last-Modified

The CDN sees no caching instructions. It forwards everything. The origin processes everything.

Cache-Control Directives

Cache-Control is the primary mechanism for controlling HTTP caching. Each directive serves a specific purpose. Misunderstanding any one of them leads to either over-caching (serving stale data) or under-caching (wasting origin resources).

The Directives That Matter

| Directive | Where it applies | What it does |
|---|---|---|
| max-age=N | Browser + CDN | Response is fresh for N seconds from the time the origin generated it. Time already spent in an upstream cache counts against N |
| s-maxage=N | CDN only | Overrides max-age for shared caches (CDNs, proxies). Browsers ignore it |
| no-cache | Browser + CDN | May be stored, but must be revalidated with the origin before every reuse. Does NOT mean “don’t cache” |
| no-store | Browser + CDN | Do not cache at all. Not in browser, not in CDN, not on disk |
| private | Browser only | Only the browser may cache. CDN must not cache |
| public | Browser + CDN | Explicitly cacheable by any cache, including CDNs |
| immutable | Browser | Content will never change. Browser should not revalidate while the response is fresh, even on reload |
| stale-while-revalidate=N | CDN (varies) | After the response goes stale, serve it for up to N more seconds while revalidating in the background |
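The interaction between a freshness lifetime and stale-while-revalidate is easier to see as a timeline. The following is a minimal sketch, assuming the cache knows the response's age in seconds (a CDN typically reports this in the Age response header). The function name `cache_state` and its return labels are made up for this example.

```python
def cache_state(age_s: float, freshness_s: float, swr_s: float = 0) -> str:
    """Classify a cached response by its age.

    "fresh":                  serve without contacting the origin
    "stale-while-revalidate": serve the stale copy, refresh in the background
    "stale":                  revalidate or refetch before serving
    """
    if age_s < freshness_s:
        return "fresh"
    if age_s < freshness_s + swr_s:
        return "stale-while-revalidate"
    return "stale"


# A response cached at the CDN with s-maxage=60, stale-while-revalidate=30
for age in (45, 75, 95):
    print(age, cache_state(age, freshness_s=60, swr_s=30))
# 45 fresh
# 75 stale-while-revalidate
# 95 stale
```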

Ride-Hailing Endpoint Classification

Not every endpoint is cacheable. Real-time state must never be cached. Semi-static data can be cached aggressively.

| Endpoint | Cacheability | Headers | Reason |
|---|---|---|---|
| Fare estimate (non-surge) | CDN + Browser | s-maxage=60, max-age=30, stale-while-revalidate=30 | Changes at most once per minute |
| Fare estimate (surge active) | CDN only, short | s-maxage=10, max-age=0 | Surge multiplier changes every 10s |
| Driver availability zones | CDN only, short | s-maxage=10, stale-while-revalidate=5 | Zone aggregates update every 10s |
| Trip history | Browser only | private, max-age=300 | User-specific, sensitive |
| Real-time driver location | Not cacheable | no-store | Changes every second |
| Active trip status | Not cacheable | no-store | Real-time state |
| Surge pricing map tile | CDN + Browser | s-maxage=15, max-age=10 | Tile-based, updates on surge recalculation |
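To see how one header value gives a CDN and a browser different lifetimes, here is a rough sketch that parses a Cache-Control value and derives the lifetime each kind of cache would apply. It is a simplification, not a full implementation of the caching rules: it ignores heuristic freshness and qualified directives, and the helpers `parse_cache_control` and `ttl_seconds` are hypothetical names.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip()] = arg.strip().strip('"') or None
    return directives


def ttl_seconds(value: str, shared: bool):
    """Freshness lifetime for a shared cache (CDN) or a private cache (browser).

    Returns None when that cache must not store the response at all.
    """
    d = parse_cache_control(value)
    if "no-store" in d:
        return None
    if shared and "private" in d:
        return None
    if "no-cache" in d:
        return 0  # may be stored, but must be revalidated before every reuse
    if shared and d.get("s-maxage") is not None:
        return int(d["s-maxage"])  # s-maxage wins over max-age in shared caches
    if d.get("max-age") is not None:
        return int(d["max-age"])
    return 0  # simplification: no explicit lifetime is treated as stale


rows = {
    "fare estimate (non-surge)": "s-maxage=60, max-age=30, stale-while-revalidate=30",
    "trip history": "private, max-age=300",
    "real-time driver location": "no-store",
}
for endpoint, header in rows.items():
    print(endpoint, "CDN:", ttl_seconds(header, shared=True),
          "browser:", ttl_seconds(header, shared=False))
# fare estimate (non-surge) CDN: 60 browser: 30
# trip history CDN: None browser: 300
# real-time driver location CDN: None browser: None
```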

ETag and Last-Modified: Conditional Requests

Cache-Control tells caches how long content is fresh. ETags and Last-Modified tell caches how to check if content has changed once it goes stale.

The Conditional Request Flow

  1. Origin sends response with ETag: "a1b2c3" and Cache-Control: s-maxage=60
  2. CDN caches the response
  3. After 60 seconds, CDN receives a new request for the same resource
  4. CDN sends the request to the origin with If-None-Match: "a1b2c3"
  5. Origin checks: has the fare changed? If not, it returns 304 Not Modified with no body
  6. CDN serves the cached copy for another s-maxage period

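The 304 response is tiny: headers only, with no JSON body to serialize or send. The origin confirms “nothing changed” and the CDN does the rest. Whether the origin also skips the fare calculation and its database queries depends on how the ETag is derived. An ETag hashed from the finished estimate, as in the example below, still requires computing the estimate on every revalidation. Skipping that work requires an ETag the origin can produce cheaply, such as a pricing-version identifier.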
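The six steps above can be simulated in a few lines. This is a toy model under stated assumptions, not how any particular CDN is implemented: `Origin` and `Cdn` are hypothetical classes, time is passed in explicitly as seconds, and the origin derives its ETag from the current fare.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    status: int
    etag: str
    body: Optional[str] = None


class Origin:
    """Hypothetical origin serving one fare, identified by an ETag."""

    def __init__(self) -> None:
        self.fare_cents = 2350
        self.full_responses = 0
        self.not_modified = 0

    def get(self, if_none_match: Optional[str] = None) -> Response:
        current = f'"fare-{self.fare_cents}"'
        if if_none_match == current:
            self.not_modified += 1
            return Response(304, current)  # headers only, no body
        self.full_responses += 1
        return Response(200, current, f'{{"totalCents": {self.fare_cents}}}')


class Cdn:
    """Toy shared cache holding a single entry with an s-maxage lifetime."""

    def __init__(self, origin: Origin, s_maxage: float) -> None:
        self.origin = origin
        self.s_maxage = s_maxage
        self.cached: Optional[Response] = None
        self.stored_at = 0.0

    def get(self, now: float) -> str:
        if self.cached and now - self.stored_at < self.s_maxage:
            return self.cached.body  # fresh hit: origin not contacted
        validator = self.cached.etag if self.cached else None
        resp = self.origin.get(if_none_match=validator)
        if resp.status == 304:
            self.stored_at = now  # cached copy is fresh for another period
            return self.cached.body
        self.cached, self.stored_at = resp, now
        return resp.body


origin = Origin()
cdn = Cdn(origin, s_maxage=60)
cdn.get(0)    # miss: origin returns 200 with a body
cdn.get(30)   # fresh hit
cdn.get(61)   # stale: revalidated, origin returns 304
origin.fare_cents = 2600
cdn.get(125)  # stale again: ETag no longer matches, origin returns 200
print(origin.full_responses, origin.not_modified)  # 2 1
```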

ETag Generation

// SCALED: ETag from content hash
@GetMapping("/api/v1/fare/estimate")
public Mono<ResponseEntity<FareEstimate>> estimateFare(
        @RequestParam String originZone,
        @RequestParam String destZone,
        ServerHttpRequest request) {
    // Build the policy once so the 200 and the 304 carry the same Cache-Control
    CacheControl cacheControl = CacheControl.maxAge(Duration.ofSeconds(30))
        .sMaxAge(Duration.ofSeconds(60))
        .staleWhileRevalidate(Duration.ofSeconds(30));

    return fareService.calculate(originZone, destZone)
        .map(estimate -> {
            String etag = generateETag(estimate);

            // Check if the client (or CDN) already has the current version
            if (request.getHeaders().getIfNoneMatch().contains(etag)) {
                return ResponseEntity.status(HttpStatus.NOT_MODIFIED)
                    .eTag(etag)
                    .cacheControl(cacheControl)
                    .<FareEstimate>build();
            }

            return ResponseEntity.ok()
                .eTag(etag)
                .cacheControl(cacheControl)
                .body(estimate);
        });
}

private String generateETag(FareEstimate estimate) {
    String content = estimate.originZone() + ":"
        + estimate.destZone() + ":"
        + estimate.totalCents() + ":"
        + estimate.surgeMultiplier();
    return "\"" + Integer.toHexString(content.hashCode()) + "\"";
}
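The generateETag method above relies on String.hashCode(), which is a 32-bit hash, so two different fares can in principle produce the same ETag. The sketch below shows one possible alternative: a truncated SHA-256 digest over the same four fields. It also shows a weak comparison for If-None-Match, which matters because some intermediaries rewrite a strong ETag into a weak one (prefixed with W/). Both function names are hypothetical, and splitting the header on commas is a simplification that is only safe because these ETags never contain commas.

```python
import hashlib


def make_etag(origin_zone: str, dest_zone: str,
              total_cents: int, surge_multiplier: str) -> str:
    """Quoted ETag from a SHA-256 digest of the fields that define the fare."""
    # A separator that cannot appear in zone names avoids ambiguous joins.
    canonical = "\x1f".join(
        [origin_zone, dest_zone, str(total_cents), surge_multiplier])
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f'"{digest}"'


def if_none_match_matches(header: str, etag: str) -> bool:
    """Weak comparison: ignore the W/ prefix, accept lists and the * wildcard."""
    def opaque(tag: str) -> str:
        tag = tag.strip()
        return tag[2:] if tag.startswith("W/") else tag

    if header.strip() == "*":
        return True
    candidates = {opaque(t) for t in header.split(",") if t.strip()}
    return opaque(etag) in candidates


tag = make_etag("airport", "downtown", 2350, "1.0")
print(if_none_match_matches("W/" + tag, tag))  # True
print(if_none_match_matches('"stale"', tag))   # False
```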

The Vary Header

The Vary header tells caches: “This response depends on these request headers. Different values of these headers produce different responses. Cache them separately.”

Vary: Accept-Encoding is safe. There are only two or three common encodings (gzip, br, identity). The CDN stores two or three variants. Hit rates remain high.

Vary: Cookie is a disaster. Every user has a different cookie. The CDN creates a separate cache entry for every unique cookie value. Cache hit rate drops to near zero. The CDN becomes a very expensive proxy.

Vary: Authorization is the same disaster. Every user has a unique token. If your response varies by user, use Cache-Control: private instead and let only the browser cache.

// BOTTLENECK: Vary: Authorization on a public endpoint
@GetMapping("/api/v1/zones/availability")
public Mono<ResponseEntity<ZoneAvailability>> getAvailability(
        @RequestParam String zoneId) {
    return zoneService.getAvailability(zoneId)
        .map(avail -> ResponseEntity.ok()
            .varyBy("Authorization")  // Every user gets a separate cache entry
            .cacheControl(CacheControl.maxAge(Duration.ofSeconds(10)))
            .body(avail));
}

Zone availability is the same for every user. The Vary: Authorization header turns a single cacheable response into millions of cache entries, one per user token. The fix: remove Vary: Authorization entirely.

// SCALED: No Vary on public data, correct s-maxage
@GetMapping("/api/v1/zones/availability")
public Mono<ResponseEntity<ZoneAvailability>> getAvailability(
        @RequestParam String zoneId) {
    return zoneService.getAvailability(zoneId)
        .map(avail -> ResponseEntity.ok()
            .cacheControl(CacheControl.maxAge(Duration.ofSeconds(5))
                .sMaxAge(Duration.ofSeconds(10))
                .staleWhileRevalidate(Duration.ofSeconds(5)))
            .body(avail));
}
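The effect of Vary on cache entries can be illustrated with a toy cache key. This is a simplified model, assuming a cache that keys each entry on the URL plus the raw value of every request header named in Vary. Real CDNs differ in how they normalize or honor Vary, and the helper names here are made up.

```python
def cache_key(url: str, vary: list, request_headers: dict) -> tuple:
    """Cache key = URL plus the value of each request header named in Vary."""
    lowered = {k.lower(): v for k, v in request_headers.items()}
    return (url, tuple((h.lower(), lowered.get(h.lower(), "")) for h in vary))


def distinct_entries(url: str, vary: list, requests: list) -> int:
    """Number of separate cache entries the given requests would create."""
    return len({cache_key(url, vary, r) for r in requests})


# 1,000 riders, each with a unique token, using one of two encodings.
requests = [
    {"Authorization": f"Bearer token-{i}",
     "Accept-Encoding": "gzip" if i % 2 else "br"}
    for i in range(1000)
]
url = "/api/v1/zones/availability?zoneId=airport"

print(distinct_entries(url, ["Authorization"], requests))    # 1000
print(distinct_entries(url, ["Accept-Encoding"], requests))  # 2
print(distinct_entries(url, [], requests))                   # 1
```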

Spring WebFlux CacheControl Builder

Spring provides CacheControl as a fluent builder. Use it. Do not construct Cache-Control header strings manually.

// Common patterns for the ride-hailing platform

// Fare estimates: CDN caches 60s, browser 30s, serve stale for 30s during revalidation
CacheControl fareEstimate = CacheControl
    .maxAge(Duration.ofSeconds(30))
    .sMaxAge(Duration.ofSeconds(60))
    .staleWhileRevalidate(Duration.ofSeconds(30));

// Trip history: browser only, 5 minutes
CacheControl tripHistory = CacheControl
    .maxAge(Duration.ofSeconds(300))
    .cachePrivate();

// Real-time driver location: never cache
CacheControl driverLocation = CacheControl.noStore();

// Static zone boundaries: long-lived and immutable, because the URL contains a content hash
CacheControl zoneBoundaries = CacheControl
    .maxAge(Duration.ofDays(365))
    .cachePublic()
    .immutable();

The Baseline: Locust Test

The Locust test simulates peak traffic: riders requesting fare estimates, checking driver availability, and viewing trip history. First, against the origin with no caching. Then, with CDN caching enabled via correct headers.

import itertools
import time

from locust import HttpUser, task, between


class RideHailingUser(HttpUser):
    wait_time = between(0.5, 2)

    # Shared counter so each simulated rider keeps one stable, distinct token
    _ids = itertools.count(1)

    zones = [
        ("airport", "downtown"),
        ("downtown", "suburbs"),
        ("airport", "midtown"),
        ("suburbs", "airport"),
        ("midtown", "downtown"),
    ]

    def on_start(self):
        # Runs once per simulated user when it starts
        self.token = f"user-token-{next(self._ids)}"

    @task(5)
    def fare_estimate(self):
        # Rotate through the zone pairs, one pair per wall-clock second
        origin, dest = self.zones[int(time.time()) % len(self.zones)]
        self.client.get(
            f"/api/v1/fare/estimate?originZone={origin}&destZone={dest}",
            name="/api/v1/fare/estimate"
        )

    @task(3)
    def zone_availability(self):
        zone = self.zones[int(time.time()) % len(self.zones)][0]
        self.client.get(
            f"/api/v1/zones/availability?zoneId={zone}",
            name="/api/v1/zones/availability"
        )

    @task(1)
    def trip_history(self):
        self.client.get(
            "/api/v1/trips/history?limit=10",
            name="/api/v1/trips/history",
            headers={"Authorization": f"Bearer {self.token}"}
        )

The Fix

Apply correct Cache-Control headers to every endpoint. The CDN absorbs the cacheable traffic. The origin only handles cache misses and personalized requests.

// SCALED: Complete fare controller with caching
@RestController
@RequestMapping("/api/v1/fare")
public class FareController {

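    private final FareService fareService;
    private final SurgeService surgeService;

    // Constructor injection: the final fields must be assigned here
    public FareController(FareService fareService, SurgeService surgeService) {
        this.fareService = fareService;
        this.surgeService = surgeService;
    }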

    @GetMapping("/estimate")
    public Mono<ResponseEntity<FareEstimate>> estimateFare(
            @RequestParam String originZone,
            @RequestParam String destZone,
            ServerHttpRequest request) {

        return surgeService.isActive(originZone)
            .flatMap(surgeActive -> {
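                // Surge: browsers revalidate every time (max-age=0), the CDN may reuse for 10s.
                // no-cache would force the CDN to revalidate on every request as well.
                CacheControl cacheControl = surgeActive
                    ? CacheControl.maxAge(Duration.ZERO)
                        .sMaxAge(Duration.ofSeconds(10))
                    : CacheControl.maxAge(Duration.ofSeconds(30))
                        .sMaxAge(Duration.ofSeconds(60))
                        .staleWhileRevalidate(Duration.ofSeconds(30));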

                return fareService.calculate(originZone, destZone)
                    .map(estimate -> {
                        String etag = generateETag(estimate);

                        if (request.getHeaders()
                                .getIfNoneMatch().contains(etag)) {
                            return ResponseEntity
                                .status(HttpStatus.NOT_MODIFIED)
                                .eTag(etag)
                                .cacheControl(cacheControl)
                                .<FareEstimate>build();
                        }

                        return ResponseEntity.ok()
                            .eTag(etag)
                            .cacheControl(cacheControl)
                            .body(estimate);
                    });
            });
    }
}

The Proof

Locust results with 500 concurrent users over 5 minutes. Origin hit directly (no CDN) vs. origin behind Cloudflare with correct Cache-Control headers.

Before: No Cache Headers (Origin Direct)

| Metric | Fare Estimate | Zone Availability | Trip History |
|---|---|---|---|
| Requests/sec | 2,340 | 1,404 | 468 |
| p50 latency | 42ms | 38ms | 65ms |
| p95 latency | 340ms | 290ms | 410ms |
| p99 latency | 1,200ms | 980ms | 1,450ms |

Origin CPU: 78%
Origin requests/sec: 4,212 (total)

After: Correct Cache-Control Headers + CDN

| Metric | Fare Estimate | Zone Availability | Trip History |
|---|---|---|---|
| Requests/sec | 2,340 | 1,404 | 468 |
| p50 latency | 8ms | 6ms | 62ms |
| p95 latency | 22ms | 18ms | 390ms |
| p99 latency | 85ms | 64ms | 1,380ms |
| CDN cache hit rate | 87% | 82% | 0% (private) |

Origin CPU: 18%
Origin requests/sec: 648 (total)

The CDN absorbed 85% of the total traffic. Origin CPU dropped from 78% to 18%. The p95 for fare estimates dropped from 340ms to 22ms because CDN edge nodes are 5ms from the client, not 45ms.
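The 85% figure follows directly from the two origin totals in the tables. A minimal sketch of that arithmetic, with `offload_ratio` as a made-up helper name:

```python
def offload_ratio(origin_rps_before: float, origin_rps_after: float) -> float:
    """Fraction of requests that no longer reach the origin."""
    return 1.0 - origin_rps_after / origin_rps_before


# Per-endpoint request rates sum to the total the origin handled directly.
total_before = 2340 + 1404 + 468
print(total_before)                              # 4212
print(f"{offload_ratio(total_before, 648):.1%}")  # 84.6%
```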

Trip history shows no improvement because it is marked private. The CDN correctly does not cache it. This is the expected behavior.

The four pods the team planned to add are no longer needed. The infrastructure cost increase of $4,200/month is replaced by correct HTTP headers that cost nothing.

What This Chapter Does Not Cover

Application-level caching with Redis is covered in CH6. Frontend static asset caching is covered in CH11. This chapter addresses only the HTTP caching layer between clients, CDNs, and the origin.

The next two sections dig into the specifics: CH5-S1 covers the header mechanics and Spring WebFlux implementation in detail. CH5-S2 covers CDN behavior, the Vary header trap, and how to measure CDN effectiveness.