surviving the spike

CDN Behavior and the Vary Header Trap

11 min read Chapter 15 of 66

You configured perfect Cache-Control headers. You set s-maxage=60 on your fare estimates. You expected 85% of requests to hit the CDN cache. Your monitoring shows a 12% cache hit rate. Your origin is still burning.

The problem is not your headers. The problem is how the CDN interprets them, and one header in particular that can turn your cache into a graveyard: Vary.

CDN Cache Key Composition

A CDN decides whether to serve a cached response based on the cache key. Two requests with the same cache key get the same cached response. Two requests with different cache keys are treated as entirely separate resources.

The default cache key for most CDNs:

scheme + host + path + query string

So https://api.ridehail.com/v1/fare/estimate?originZone=airport&destZone=downtown and https://api.ridehail.com/v1/fare/estimate?originZone=downtown&destZone=airport are two different cache keys. This is correct. They return different fares.

But https://api.ridehail.com/v1/fare/estimate?originZone=airport&destZone=downtown and https://api.ridehail.com/v1/fare/estimate?destZone=downtown&originZone=airport are also two different cache keys on most CDNs. Same parameters, different order. Two cache entries for the same fare.

Query String Normalization

Cloudflare does not normalize query string order by default. CloudFront does not either. This means your cache hit rate is lower than it should be because clients send parameters in different orders.

// BOTTLENECK: Query parameter order creates duplicate cache entries
@GetMapping("/estimate")
public Mono<ResponseEntity<FareEstimate>> estimateFare(
        @RequestParam String originZone,
        @RequestParam String destZone) {
    // Two different cache keys for same fare:
    // /estimate?originZone=airport&destZone=downtown
    // /estimate?destZone=downtown&originZone=airport
    return fareService.calculate(originZone, destZone)
        .map(estimate -> ResponseEntity.ok()
            .cacheControl(fareCacheControl())
            .body(estimate));
}

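Two fixes. First, normalize at the CDN layer. Cloudflare offers query string sorting as a cache key setting (Cache Rules > Sort Query String). Enable it. Second, normalize at the client. Enforce a canonical parameter order in your API client SDK. For clients you do not control, the origin can redirect non-canonical URLs to the canonical form as a fallback: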

// SCALED: Canonical URL redirect for cache key normalization
@GetMapping("/estimate")
public Mono<ResponseEntity<FareEstimate>> estimateFare(
        @RequestParam String originZone,
        @RequestParam String destZone,
        ServerHttpRequest request) {

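    // Canonical form: parameter names in alphabetical order (destZone, then originZone).
    // Assumes zone IDs are URL-safe tokens that need no percent-encoding.
    String rawQuery = request.getURI().getRawQuery();
    String canonical = "destZone=" + destZone + "&originZone=" + originZone;

    // Redirect whenever the query differs from the canonical form. Comparing the
    // zone values themselves would leave some parameter orderings un-normalized.
    if (rawQuery != null && !rawQuery.equals(canonical)) {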
        // Client SDK should enforce order, but redirect as fallback
        return Mono.just(ResponseEntity
            .status(HttpStatus.TEMPORARY_REDIRECT)
            .header("Location", "/api/v1/fare/estimate?" + canonical)
            .cacheControl(CacheControl.noStore())
            .build());
    }

    return fareService.calculate(originZone, destZone)
        .map(estimate -> ResponseEntity.ok()
            .eTag(computeETag(estimate))
            .cacheControl(fareCacheControl())
            .body(estimate));
}
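
The client-side half of the fix can be sketched in a few lines. This is a minimal illustration, not the SDK's actual code: the function name canonicalize_url and the api.example.com host are placeholders, and it assumes that sorting parameters by name is the canonical order you chose.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonicalize_url(url: str) -> str:
    """Return the URL with its query parameters sorted by name, then value."""
    parts = urlsplit(url)
    # keep_blank_values=True so a parameter such as "flag=" is not silently dropped
    params = parse_qsl(parts.query, keep_blank_values=True)
    sorted_query = urlencode(sorted(params))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, sorted_query, parts.fragment)
    )


# Both orderings collapse to one URL, and therefore one cache key.
a = canonicalize_url(
    "https://api.example.com/v1/fare/estimate?originZone=airport&destZone=downtown"
)
b = canonicalize_url(
    "https://api.example.com/v1/fare/estimate?destZone=downtown&originZone=airport"
)
print(a)
print(a == b)
```

Calling a helper like this inside the SDK's request builder means every client produces the same URL for the same fare, so the redirect on the server is rarely exercised.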

The Vary Trap

The Vary response header extends the cache key. It tells caches: “The response depends on these request headers. Different values of these headers produce different responses.”

When the origin returns Vary: Accept-Encoding, the CDN adds the Accept-Encoding request header value to the cache key. There are only a few common values: “gzip”, “br”, “gzip, br”, and “identity”. The CDN stores 2-4 variants per URL. Cache hit rates remain high.

When the origin returns Vary: Cookie, the CDN adds the entire Cookie header value to the cache key. Every user has a unique session cookie. Every user gets a unique cache entry. A URL that previously had one cache entry now has one per user. The cache hit rate approaches zero.
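
The effect of Vary on the cache key can be illustrated with a toy model. The sketch below is not how any particular CDN builds its key (real CDNs also normalize Accept-Encoding); cache_key and count_entries are hypothetical helpers, and the 1,000 simulated users each carry a unique session cookie.

```python
def cache_key(url, request_headers, vary):
    """Base key (the URL) extended with the value of every header named in Vary."""
    varied = tuple((name.lower(), request_headers.get(name, "")) for name in vary)
    return (url, varied)


def count_entries(requests, vary):
    """Number of distinct cache entries a set of requests would create."""
    return len({cache_key(url, headers, vary) for url, headers in requests})


url = "/api/v1/zones/availability?zoneId=airport"
encodings = ["gzip", "br", "gzip, br"]
requests = [
    (url, {"Accept-Encoding": encodings[i % 3], "Cookie": f"session={i}"})
    for i in range(1000)
]

print(count_entries(requests, []))                   # no Vary: one entry
print(count_entries(requests, ["Accept-Encoding"]))  # one entry per encoding
print(count_entries(requests, ["Cookie"]))           # one entry per user
```

The same 1,000 requests produce 1, 3, or 1,000 entries depending only on which header the origin names in Vary.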

The Disaster Scenario

// BOTTLENECK: Vary: Cookie on a public endpoint
@GetMapping("/zones/availability")
public Mono<ResponseEntity<ZoneAvailability>> getAvailability(
        @RequestParam String zoneId) {

    return zoneService.getAvailability(zoneId)
        .map(avail -> ResponseEntity.ok()
            .varyBy("Cookie")  // Catastrophic for cache hit rate
            .cacheControl(CacheControl.sMaxAge(Duration.ofSeconds(10)))
            .body(avail));
}

Zone availability is identical for every user. Adding Vary: Cookie creates a unique cache entry for every unique cookie value. With 50,000 active users, each zone endpoint has 50,000 cache entries instead of one. The CDN allocates memory for all of them, evicts aggressively, and almost never serves a cache hit.

Metrics before and after removing Vary: Cookie from the zone availability endpoint:

| Metric | With Vary: Cookie | Without Vary |
|---|---|---|
| CDN cache hit rate | 3% | 89% |
| Origin requests/sec | 1,380 | 152 |
| p95 latency | 285ms | 14ms |
| CDN cache entries per zone | ~48,000 | 1 |

// SCALED: No Vary on public, user-independent data
@GetMapping("/zones/availability")
public Mono<ResponseEntity<ZoneAvailability>> getAvailability(
        @RequestParam String zoneId) {

    return zoneService.getAvailability(zoneId)
        .map(avail -> ResponseEntity.ok()
            .cacheControl(CacheControl.sMaxAge(Duration.ofSeconds(10))
                .staleWhileRevalidate(Duration.ofSeconds(5)))
            .body(avail));
}

Vary Header Decision Matrix

| Response depends on | Correct approach | Vary header? |
|---|---|---|
| Nothing (same for all users) | s-maxage, no Vary | No |
| Accept-Encoding only | Let CDN handle it | Vary: Accept-Encoding (safe) |
| User identity | Cache-Control: private | No (browser only) |
| Accept-Language | Vary, if you support few languages | Vary: Accept-Language (risky if many languages) |
| Cookie | Never | No. Redesign the endpoint |
| Authorization | Never at CDN | No. Use private |
| Custom header (X-Region) | Only if few distinct values | Vary: X-Region (safe if 5-10 regions) |

The rule: Vary is safe when the header has a small number of distinct values. It is destructive when the header has high cardinality (unique per user).
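
One way to apply this rule before shipping a Vary header is to count the distinct values of that header in a sample of real requests. The sketch below is an assumption-laden illustration: vary_is_safe is a hypothetical helper, the sample values are invented, and the threshold of 10 is a rule of thumb rather than a CDN limit.

```python
def vary_is_safe(observed_values, max_distinct=10):
    """True if the header takes fewer than max_distinct distinct values."""
    distinct = {value.strip() for value in observed_values}
    return len(distinct) < max_distinct


# Hypothetical samples pulled from access logs
region_values = ["us-east", "us-west", "eu-west", "ap-south", "us-east"] * 200
cookie_values = [f"session={i}" for i in range(1000)]

print(vary_is_safe(region_values))  # few distinct values
print(vary_is_safe(cookie_values))  # one value per user
```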

CDN Behavior Differences

CDNs do not all implement Cache-Control the same way. The differences can silently destroy your caching strategy.

Cloudflare

  • Respects s-maxage and max-age
  • Supports stale-while-revalidate: serves stale content and revalidates in the background
  • Does not cache responses with Set-Cookie by default (requires Cache Rules override)
  • Does not sort query strings by default (enable in Cache Rules)
  • Caches only a default list of static file extensions. API responses (application/json) are not cached unless a Cache Rule marks them as eligible for cache; once they are eligible, Cache-Control: public, s-maxage=N controls the edge TTL
  • Purge API calls are rate limited, with limits that depend on the plan. Allow for a propagation delay (up to about 30 seconds) before every edge reflects a purge

CloudFront

  • Respects s-maxage and max-age
  • Supports stale-while-revalidate and stale-if-error since May 2023. Before that, the directive was ignored and CloudFront always fetched from the origin once s-maxage expired, so verify the behavior on your own distribution
  • Requires explicit configuration to forward query strings (by default, CloudFront strips query strings from the cache key)
  • Supports cache policies for granular header/cookie forwarding
  • Purge (invalidation) is billed per path after the first 1,000 paths each month, and the number of invalidations in progress at once is capped. A wildcard counts as a single path but can evict far more than you intended

The stale-while-revalidate Gap

This directive matters. When the CDN honors stale-while-revalidate and a cached fare estimate expires after 60 seconds:

  1. Next request arrives
  2. The CDN serves the stale cached version immediately (latency: ~8ms)
  3. The CDN sends a background request to the origin
  4. Origin responds with fresh data
  5. The CDN updates the cache

The user saw 8ms latency. The cache stayed warm.

When the CDN ignores the directive, or the response does not include it, and the same cached fare estimate expires:

  1. Next request arrives
  2. The CDN sends the request to the origin and waits
  3. Origin responds (latency: ~45ms including network)
  4. The CDN caches the fresh response and serves it

The user saw 45ms latency. Every request that reaches that edge location while the refetch is in flight waits on the origin as well.

If your CDN does not honor stale-while-revalidate, compensate by increasing s-maxage or implementing origin-side caching (CH6 covers this with Redis).
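
The two expiry flows above can be compared with a toy simulation of one cached URL at one edge location. This is a sketch under simplifying assumptions, not a model of any CDN's internals: the background refresh is treated as instantaneous, the 8ms and 45ms latencies are the illustrative figures from the text, and simulate is a hypothetical helper.

```python
def simulate(arrivals, ttl, swr, edge_ms=8, origin_ms=45):
    """Return the latency each request sees for one cached URL at one edge."""
    latencies = []
    stored_at = None  # time the cached copy was fetched; None means empty cache
    for now in arrivals:
        age = None if stored_at is None else now - stored_at
        if age is not None and age <= ttl:
            latencies.append(edge_ms)    # fresh hit
        elif age is not None and age <= ttl + swr:
            latencies.append(edge_ms)    # stale hit, refreshed in the background
            stored_at = now              # refresh modelled as instantaneous
        else:
            latencies.append(origin_ms)  # empty or too stale: wait for origin
            stored_at = now
    return latencies


arrivals = range(0, 300, 5)  # one request every 5 seconds for 5 minutes
with_swr = simulate(arrivals, ttl=60, swr=30)
without_swr = simulate(arrivals, ttl=60, swr=0)

print("requests that waited on origin, with SWR:   ", with_swr.count(45))
print("requests that waited on origin, without SWR:", without_swr.count(45))
```

With the directive honored, only the very first request pays origin latency; without it, one request per TTL window does.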

Cache Purge Strategies

Cached content sometimes needs to be invalidated before it expires. Surge pricing activates, and riders must see the updated fare immediately, not in 60 seconds.

Strategy 1: Short TTL (Preferred)

Set s-maxage short enough that stale data is acceptable for the TTL duration. For surge pricing: s-maxage=10. The worst-case staleness is 10 seconds. For most ride-hailing scenarios, this is acceptable.

Strategy 2: API Purge on Data Change

When surge activates in a zone, purge the CDN cache for that zone’s fare estimate endpoints.

// SCALED: Purge CDN cache when surge state changes
@Component
public class SurgeEventHandler {

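    private final CdnPurgeClient cdnClient;

    // Constructor injection so Spring can supply the purge client
    public SurgeEventHandler(CdnPurgeClient cdnClient) {
        this.cdnClient = cdnClient;
    }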

    @EventListener
    public void onSurgeActivated(SurgeActivatedEvent event) {
        String zoneId = event.getZoneId();

        // Purge all fare estimate URLs for this zone
        cdnClient.purgeByPrefix(
            "/api/v1/fare/estimate?originZone=" + zoneId
        ).subscribe();

        cdnClient.purgeByPrefix(
            "/api/v1/fare/estimate?destZone=" + zoneId
        ).subscribe();

        // Purge zone availability
        cdnClient.purgeByUrl(
            "/api/v1/zones/availability?zoneId=" + zoneId
        ).subscribe();
    }
}

Cloudflare supports purge-by-prefix and purge-by-tag. CloudFront supports path-based invalidation with wildcards. Both have API rate limits, so do not purge on every data change. Reserve purges for material changes like surge activation.

Strategy 3: Cache Tags (Cloudflare Enterprise)

Add a Cache-Tag response header with logical tags. Purge by tag when data changes.

// SCALED: Cache tags for targeted purge
@GetMapping("/estimate")
public Mono<ResponseEntity<FareEstimate>> estimateFare(
        @RequestParam String originZone,
        @RequestParam String destZone) {

    return fareService.calculate(originZone, destZone)
        .map(estimate -> ResponseEntity.ok()
            .header("Cache-Tag",
                "fare",
                "zone-" + originZone,
                "zone-" + destZone)
            .cacheControl(fareCacheControl())
            .body(estimate));
}

// Purge all fares for a zone with a single API call:
// POST /purge {"tags": ["zone-downtown"]}
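
For reference, a tag purge against Cloudflare's v4 API is a single POST to the zone's purge_cache endpoint. The sketch below only builds the request so it can be inspected without network access; the zone ID and token are placeholders, build_tag_purge_request is a hypothetical helper, and the endpoint shape should be checked against the current Cloudflare API documentation before use.

```python
import json
import urllib.request


def build_tag_purge_request(zone_id, api_token, tags):
    """Build (but do not send) a purge-by-tag request for one Cloudflare zone."""
    body = json.dumps({"tags": tags}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


req = build_tag_purge_request(
    "ZONE_ID_PLACEHOLDER", "API_TOKEN_PLACEHOLDER", ["zone-downtown"]
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Sending is left to the caller, for example: urllib.request.urlopen(req)
```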

Measuring CDN Effectiveness with Locust

The only way to know your CDN is working is to measure it. Trust the metrics, not the configuration.

The Test Setup

Locust hits the CDN edge, not the origin directly. The origin logs every request it receives. The delta between Locust requests sent and origin requests received is the CDN’s contribution.

from locust import HttpUser, task, between, events
import time
import random

class CDNEffectivenessTest(HttpUser):
    """
    Target: CDN edge URL (e.g., https://api.ridehail.com)
    NOT the origin directly.
    """
    wait_time = between(0.2, 1)

    # Fixed set of zone pairs to maximize cache hits
    zone_pairs = [
        ("airport", "downtown"),
        ("downtown", "suburbs"),
        ("airport", "midtown"),
        ("suburbs", "airport"),
        ("midtown", "downtown"),
    ]

    @task(5)
    def fare_estimate(self):
        """High-frequency, highly cacheable endpoint."""
        origin, dest = random.choice(self.zone_pairs)
        with self.client.get(
            f"/api/v1/fare/estimate?originZone={origin}&destZone={dest}",
            name="/api/v1/fare/estimate",
            catch_response=True
        ) as response:
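            # Track CDN cache status from response header
            cache_status = response.headers.get("CF-Cache-Status",
                          response.headers.get("X-Cache", "UNKNOWN"))
            # Only mark non-error responses as successes so origin failures stay visible
            if response.ok:
                response.success()
            else:
                response.failure(f"HTTP {response.status_code}")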
            # Log for analysis
            events.request.fire(
                request_type="CDN",
                name=f"cache_{cache_status}",
                response_time=response.elapsed.total_seconds() * 1000,
                response_length=len(response.content),
                exception=None,
                context={}
            )

    @task(3)
    def zone_availability(self):
        """Medium-frequency, short-TTL cacheable endpoint."""
        zone = random.choice(self.zone_pairs)[0]
        self.client.get(
            f"/api/v1/zones/availability?zoneId={zone}",
            name="/api/v1/zones/availability"
        )

    @task(1)
    def driver_location(self):
        """Low-frequency, uncacheable endpoint (no-store)."""
        driver_id = f"driver-{random.randint(1, 100)}"
        self.client.get(
            f"/api/v1/drivers/{driver_id}/location",
            name="/api/v1/drivers/[id]/location"
        )

Measuring Cache Hit Rate

Cloudflare returns CF-Cache-Status in the response header. CloudFront returns X-Cache. Parse these in your Locust test or in your monitoring.

| CF-Cache-Status | Meaning |
|---|---|
| HIT | Served from CDN cache |
| MISS | Not in cache, fetched from origin |
| EXPIRED | Was in cache but expired, fetched from origin |
| STALE | Served stale while revalidating (stale-while-revalidate) |
| DYNAMIC | Not eligible for caching (no s-maxage, or no-store) |
| BYPASS | CDN bypassed due to configuration |

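Calculate cache hit rate: (HIT + STALE) / (HIT + MISS + EXPIRED + STALE) * 100

STALE counts as a hit because the user received a fast response from the edge. DYNAMIC and BYPASS responses are left out because they were never eligible for caching.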
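
The calculation can be sketched as a small helper that takes the collected CF-Cache-Status values and counts STALE as a hit, as described above. The function name and the sample distribution are illustrative assumptions, not measured data.

```python
from collections import Counter

HIT_STATUSES = {"HIT", "STALE"}
CACHEABLE_STATUSES = {"HIT", "STALE", "MISS", "EXPIRED"}


def cache_hit_rate(statuses):
    """Percentage of cache-eligible responses that were served from the edge."""
    counts = Counter(status.upper() for status in statuses)
    eligible = sum(counts[s] for s in CACHEABLE_STATUSES)
    if eligible == 0:
        return 0.0  # nothing cacheable was observed
    hits = sum(counts[s] for s in HIT_STATUSES)
    return 100.0 * hits / eligible


# Invented sample: DYNAMIC responses are ignored by the calculation
sample = (
    ["HIT"] * 80 + ["STALE"] * 5 + ["MISS"] * 10 + ["EXPIRED"] * 5 + ["DYNAMIC"] * 20
)
print(cache_hit_rate(sample))
```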

Before/After Comparison

500 concurrent users, 5-minute test, CDN edge endpoint.

Before: Misconfigured Headers

Origin returns no Cache-Control on fare estimates. Returns Vary: Cookie on zone availability. Returns max-age=300 (no private) on trip history (data leak risk).

| Metric | Value |
|---|---|
| Total Locust requests sent | 187,400 |
| Origin requests received | 174,200 |
| CDN cache hit rate | 7% |
| Fare estimate p50 | 48ms |
| Fare estimate p95 | 320ms |
| Fare estimate p99 | 1,180ms |
| Zone availability p50 | 41ms |
| Zone availability p95 | 295ms |
| Origin CPU (4 pods) | 82% |
| CDN cache entries | ~52,000 (inflated by Vary: Cookie) |

After: Correct Headers

Fare estimates: s-maxage=60, max-age=30, stale-while-revalidate=30. Zone availability: s-maxage=10, stale-while-revalidate=5, no Vary. Trip history: private, max-age=300. Driver location: no-store.

| Metric | Value |
|---|---|
| Total Locust requests sent | 187,400 |
| Origin requests received | 28,100 |
| CDN cache hit rate | 85% |
| Fare estimate p50 | 6ms |
| Fare estimate p95 | 18ms |
| Fare estimate p99 | 72ms |
| Zone availability p50 | 5ms |
| Zone availability p95 | 14ms |
| Origin CPU (4 pods) | 14% |
| CDN cache entries | 47 (5 zone pairs + variants) |

The origin went from processing 174,200 requests to 28,100. CPU dropped from 82% to 14%. The p95 for fare estimates dropped from 320ms to 18ms. CDN cache entries dropped from 52,000 (one per user due to Vary: Cookie) to 47 (one per unique URL with encoding variants).
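
The hit rates in the two tables follow directly from the request counts, using the delta between requests sent by Locust and requests received by the origin. A quick check, with cdn_offload_percent as a hypothetical helper name:

```python
def cdn_offload_percent(requests_sent, origin_requests):
    """Share of requests the CDN absorbed, as a percentage of requests sent."""
    return 100.0 * (requests_sent - origin_requests) / requests_sent


before = cdn_offload_percent(187_400, 174_200)
after = cdn_offload_percent(187_400, 28_100)
print(round(before), round(after))
```

This measures offload across all endpoints, including the uncacheable driver location calls, so it is a floor on the hit rate of the cacheable endpoints alone.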

Three of the four origin pods can be removed. Monthly savings: approximately $3,150 in compute costs. The change: correct HTTP headers and removing one Vary directive.

CDN Cache Decision Tree

For every endpoint in the ride-hailing platform, follow this decision process:

  1. Does the response change per user? Yes: Cache-Control: private, max-age=N. CDN does not cache. Stop.
  2. Does the response change in real time (< 5 seconds)? Yes: Cache-Control: no-store. Nothing caches. Stop.
  3. Does the response change frequently (5-60 seconds)? Yes: Cache-Control: s-maxage=N where N matches the update interval. Add stale-while-revalidate=N/2.
  4. Does the response change rarely (minutes to hours)? Yes: Cache-Control: s-maxage=N, max-age=M where M < N. Add ETag for conditional revalidation.
  5. Does the response never change (versioned by URL)? Yes: Cache-Control: public, max-age=31536000, immutable.
  6. Does the response depend on a request header? Only add Vary if that header has fewer than 10 distinct values. Otherwise, redesign the endpoint to move the variable into the URL path or query string.
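
Steps 1 to 5 of the tree can be sketched as a small function that returns a Cache-Control value. This is an illustration under assumptions, not the platform's actual classifier: the Endpoint fields, the private max-age default of 300, and the choice of M = N/2 in step 4 are all placeholders, and step 6 (Vary) is left as a separate review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    per_user: bool = False
    # Seconds between meaningful changes; None means versioned by URL (never changes)
    change_interval_s: Optional[float] = None


def cache_control_for(ep, private_max_age=300):
    """Map an endpoint's characteristics to a Cache-Control header value."""
    if ep.per_user:                       # step 1: browser cache only
        return f"private, max-age={private_max_age}"
    if ep.change_interval_s is None:      # step 5: immutable, versioned URL
        return "public, max-age=31536000, immutable"
    if ep.change_interval_s < 5:          # step 2: real-time data
        return "no-store"
    n = int(ep.change_interval_s)
    if n <= 60:                           # step 3: changes every 5-60 seconds
        return f"s-maxage={n}, stale-while-revalidate={n // 2}"
    return f"s-maxage={n}, max-age={n // 2}"  # step 4: M < N, here M = N/2


print(cache_control_for(Endpoint(change_interval_s=10)))   # zone availability
print(cache_control_for(Endpoint(per_user=True)))          # trip history
print(cache_control_for(Endpoint(change_interval_s=1)))    # driver location
```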

Every endpoint in this platform was classified using this tree. The result: 85% of traffic served from CDN edge nodes, 14% origin CPU, and a p95 that dropped by 94%.