CDN Behavior and the Vary Header Trap
You configured perfect Cache-Control headers. You set s-maxage=60 on your fare estimates. You expected 85% of requests to hit the CDN cache. Your monitoring shows a 12% cache hit rate. Your origin is still burning.
The problem is not your headers. The problem is how the CDN interprets them, and one header in particular that can turn your cache into a graveyard: Vary.
CDN Cache Key Composition
A CDN decides whether to serve a cached response based on the cache key. Two requests with the same cache key get the same cached response. Two requests with different cache keys are treated as entirely separate resources.
The default cache key for most CDNs:
scheme + host + path + query string
So https://api.ridhail.com/v1/fare/estimate?originZone=airport&destZone=downtown and https://api.ridhail.com/v1/fare/estimate?originZone=downtown&destZone=airport are two different cache keys. This is correct. They return different fares.
But https://api.ridhail.com/v1/fare/estimate?originZone=airport&destZone=downtown and https://api.ridhail.com/v1/fare/estimate?destZone=downtown&originZone=airport are also two different cache keys on most CDNs. Same parameters, different order. Two cache entries for the same fare.
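The effect is easy to reproduce with a small model of the default key. This is a minimal sketch, not any CDN's actual implementation; `default_cache_key` is a hypothetical helper that concatenates the four components listed above without normalizing anything.

```python
from urllib.parse import urlsplit


def default_cache_key(url: str) -> str:
    """Model the default key: scheme + host + path + raw query string."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}{parts.path}?{parts.query}"


a = "https://api.ridhail.com/v1/fare/estimate?originZone=airport&destZone=downtown"
b = "https://api.ridhail.com/v1/fare/estimate?destZone=downtown&originZone=airport"

# Same parameters, different order: the raw query strings differ,
# so the two requests map to two separate cache entries.
print(default_cache_key(a) == default_cache_key(b))  # False
```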
Query String Normalization
Cloudflare does not normalize query string order by default. CloudFront does not either. This means your cache hit rate is lower than it should be because clients send parameters in different orders.
// BOTTLENECK: Query parameter order creates duplicate cache entries
@GetMapping("/estimate")
public Mono<ResponseEntity<FareEstimate>> estimateFare(
        @RequestParam String originZone,
        @RequestParam String destZone) {
    // Two different cache keys for same fare:
    //   /estimate?originZone=airport&destZone=downtown
    //   /estimate?destZone=downtown&originZone=airport
    return fareService.calculate(originZone, destZone)
            .map(estimate -> ResponseEntity.ok()
                    .cacheControl(fareCacheControl())
                    .body(estimate));
}
Two fixes. First, normalize at the CDN layer. Cloudflare offers query string sorting as a feature (Cache Rules > Sort Query String). Enable it. Second, normalize at the client. Enforce a canonical parameter order in your API client SDK.
// SCALED: Canonical URL redirect for cache key normalization
@GetMapping("/estimate")
public Mono<ResponseEntity<FareEstimate>> estimateFare(
        @RequestParam String originZone,
        @RequestParam String destZone,
        ServerHttpRequest request) {
    // Canonical form: parameter names in alphabetical order
    // (destZone first, then originZone).
    // Assumes zone IDs are URL-safe; encode them if they are not.
    String rawQuery = request.getURI().getRawQuery();
    String canonical = "destZone=" + destZone + "&originZone=" + originZone;
    if (!canonical.equals(rawQuery)) {
        // Client SDK should enforce order, but redirect as fallback
        return Mono.just(ResponseEntity
                .status(HttpStatus.TEMPORARY_REDIRECT)
                .header("Location", "/api/v1/fare/estimate?" + canonical)
                .cacheControl(CacheControl.noStore())
                .build());
    }
    return fareService.calculate(originZone, destZone)
            .map(estimate -> ResponseEntity.ok()
                    .eTag(computeETag(estimate))
                    .cacheControl(fareCacheControl())
                    .body(estimate));
}
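On the client side, canonical ordering can be a one-line helper in the SDK. The sketch below is an assumption about how such a helper might look: `canonical_fare_url` is a hypothetical name, and sorting parameters by name is one possible convention, chosen here because it produces the same destZone-first order the redirect above treats as canonical.

```python
from urllib.parse import urlencode


def canonical_fare_url(origin_zone: str, dest_zone: str) -> str:
    """Build the fare-estimate URL with query parameters sorted by name."""
    params = {"originZone": origin_zone, "destZone": dest_zone}
    # sorted() orders the (name, value) pairs alphabetically by name,
    # so every caller emits the same query string for the same fare.
    query = urlencode(sorted(params.items()))
    return f"/api/v1/fare/estimate?{query}"


print(canonical_fare_url("airport", "downtown"))
# /api/v1/fare/estimate?destZone=downtown&originZone=airport
```

Because the URL is built in one place, callers cannot introduce a second parameter order, and the server-side redirect stays a rarely used fallback.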
The Vary Trap
The Vary response header extends the cache key. It tells caches: “The response depends on these request headers. Different values of these headers produce different responses.”
When the origin returns Vary: Accept-Encoding, the CDN adds the Accept-Encoding request header value to the cache key. There are only a few common values: "gzip", "br", "gzip, br", and "identity". The CDN stores 2-4 variants per URL. Cache hit rates remain high.
When the origin returns Vary: Cookie, the CDN adds the entire Cookie header value to the cache key. Every user has a unique session cookie. Every user gets a unique cache entry. A URL that previously had one cache entry now has one per user. The cache hit rate approaches zero.
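A toy model makes the difference concrete. This is a sketch under simplifying assumptions, not how any particular CDN stores variants: `cache_key` and `count_entries` are hypothetical helpers, and the 1,000 simulated users with three shared Accept-Encoding values are made up for illustration.

```python
def cache_key(url, request_headers, vary):
    """Model a cache key extended by the request headers named in Vary."""
    varied = tuple((name.lower(), request_headers.get(name, "")) for name in vary)
    return (url, varied)


def count_entries(url, sample_requests, vary):
    """Number of distinct cache entries a cache would hold for one URL."""
    return len({cache_key(url, headers, vary) for headers in sample_requests})


# 1,000 hypothetical users: each has a unique session cookie,
# but they share only three Accept-Encoding values.
encodings = ["gzip", "br", "gzip, br"]
sample_requests = [
    {"Cookie": f"session={i}", "Accept-Encoding": encodings[i % 3]}
    for i in range(1000)
]

url = "/api/v1/zones/availability?zoneId=airport"
print(count_entries(url, sample_requests, []))                   # 1
print(count_entries(url, sample_requests, ["Accept-Encoding"]))  # 3
print(count_entries(url, sample_requests, ["Cookie"]))           # 1000
```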
The Disaster Scenario
// BOTTLENECK: Vary: Cookie on a public endpoint
@GetMapping("/zones/availability")
public Mono<ResponseEntity<ZoneAvailability>> getAvailability(
        @RequestParam String zoneId) {
    return zoneService.getAvailability(zoneId)
            .map(avail -> ResponseEntity.ok()
                    .varyBy("Cookie") // Catastrophic for cache hit rate
                    // sMaxAge is an instance method, so start from CacheControl.empty()
                    .cacheControl(CacheControl.empty().sMaxAge(Duration.ofSeconds(10)))
                    .body(avail));
}
Zone availability is identical for every user. Adding Vary: Cookie creates a unique cache entry for every unique cookie value. With 50,000 active users, each zone endpoint has 50,000 cache entries instead of one. The CDN allocates memory for all of them, evicts aggressively, and almost never serves a cache hit.
Metrics before and after removing Vary: Cookie from the zone availability endpoint:
| Metric | With Vary: Cookie | Without Vary |
|---|---|---|
| CDN cache hit rate | 3% | 89% |
| Origin requests/sec | 1,380 | 152 |
| p95 latency | 285ms | 14ms |
| CDN cache entries per zone | ~48,000 | 1 |
// SCALED: No Vary on public, user-independent data
@GetMapping("/zones/availability")
public Mono<ResponseEntity<ZoneAvailability>> getAvailability(
        @RequestParam String zoneId) {
    return zoneService.getAvailability(zoneId)
            .map(avail -> ResponseEntity.ok()
                    // sMaxAge is an instance method, so start from CacheControl.empty()
                    .cacheControl(CacheControl.empty()
                            .sMaxAge(Duration.ofSeconds(10))
                            .staleWhileRevalidate(Duration.ofSeconds(5)))
                    .body(avail));
}
Vary Header Decision Matrix
| Response depends on | Correct approach | Vary header? |
|---|---|---|
| Nothing (same for all users) | s-maxage, no Vary | No |
| Accept-Encoding only | Let CDN handle it | Vary: Accept-Encoding (safe) |
| User identity | Cache-Control: private | No (browser only) |
| Accept-Language | Vary, if you support few languages | Vary: Accept-Language (risky if many languages) |
| Cookie | Never | No. Redesign the endpoint |
| Authorization | Never at CDN | No. Use private |
| Custom header (X-Region) | Only if few distinct values | Vary: X-Region (safe if 5-10 regions) |
The rule: Vary is safe when the header has a small number of distinct values. It is destructive when the header has high cardinality (unique per user).
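One way to apply the rule is to measure header cardinality on a sample of real requests before adding a header to Vary. The sketch below assumes you can export request headers as a list of dicts; `header_cardinality` is a hypothetical helper, and the threshold of 10 is taken from the matrix above, not from any CDN's documentation.

```python
from collections import defaultdict


def header_cardinality(request_log, max_distinct=10):
    """Return {header: (distinct_value_count, safe_for_vary)} for a request sample."""
    seen = defaultdict(set)
    for headers in request_log:
        for name, value in headers.items():
            seen[name.lower()].add(value)
    return {
        name: (len(values), len(values) <= max_distinct)
        for name, values in seen.items()
    }


# Hypothetical sample: three regions, one unique session cookie per request.
regions = ["us-east", "us-west", "eu-west"]
log = [
    {"X-Region": regions[i % 3], "Cookie": f"session={i}"}
    for i in range(50)
]

for name, (count, safe) in header_cardinality(log).items():
    print(f"{name}: {count} distinct values, safe for Vary: {safe}")
```

A header whose distinct-value count grows with the number of users in the sample is high cardinality, whatever the threshold.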
CDN Behavior Differences
CDNs do not all implement Cache-Control the same way. The differences can silently destroy your caching strategy.
Cloudflare
- Respects s-maxage and max-age
- Supports stale-while-revalidate: serves stale content and revalidates in the background
- Does not cache responses with Set-Cookie by default (requires Cache Rules override)
- Does not sort query strings by default (enable in Cache Rules)
- Caches only specific content types by default. API responses (application/json) require explicit cache rules or Cache-Control: public, s-maxage=N
- Free tier has no purge API rate limits but may delay propagation up to 30 seconds
CloudFront
- Respects s-maxage and max-age
- Does not support stale-while-revalidate as of 2025. The directive is ignored. CloudFront always fetches from origin when s-maxage expires
- Requires explicit configuration to forward query strings (by default, CloudFront strips query strings from the cache key)
- Supports cache policies for granular header/cookie forwarding
- Purge (invalidation) is limited to 1,000 paths per invalidation request, and wildcard invalidations are expensive
The stale-while-revalidate Gap
This difference matters. On Cloudflare, when a cached fare estimate expires after 60 seconds:
- Next request arrives
- Cloudflare serves the stale cached version immediately (latency: ~8ms)
- Cloudflare sends a background request to the origin
- Origin responds with fresh data
- Cloudflare updates the cache
The user saw 8ms latency. The cache stayed warm.
On CloudFront, when the same cached fare estimate expires:
- Next request arrives
- CloudFront sends the request to the origin and waits
- Origin responds (latency: ~45ms including network)
- CloudFront caches the fresh response and serves it
The user saw 45ms latency. This repeats at every expiry: the first request to reach that edge location after the TTL lapses, and any request that arrives while the origin fetch is still in flight, waits on the origin.
If you use CloudFront, compensate by increasing s-maxage or implementing origin-side caching (CH6 covers this with Redis).
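The two sequences can be compared with a small simulation of one cache entry. This is a sketch, not a model of either CDN's internals: `EdgeEntry` is a hypothetical class, the background refresh is treated as instantaneous, and the 8ms and 45ms figures are the latencies quoted above.

```python
from dataclasses import dataclass

EDGE_MS = 8     # edge latency quoted above
ORIGIN_MS = 45  # origin round trip quoted above


@dataclass
class EdgeEntry:
    stored_at: float
    ttl: float
    swr: float  # stale-while-revalidate window; 0 models a CDN that ignores it

    def serve(self, now: float):
        """Return (cache status, latency in ms), refreshing the entry if needed."""
        age = now - self.stored_at
        if age < self.ttl:
            return ("HIT", EDGE_MS)
        if age < self.ttl + self.swr:
            # Serve stale immediately; the background refresh is modeled as instant.
            self.stored_at = now
            return ("STALE", EDGE_MS)
        # Block on the origin, then store the fresh response.
        self.stored_at = now
        return ("EXPIRED", ORIGIN_MS)


with_swr = EdgeEntry(stored_at=0, ttl=60, swr=30)
without_swr = EdgeEntry(stored_at=0, ttl=60, swr=0)

print(with_swr.serve(now=61))     # ('STALE', 8)
print(without_swr.serve(now=61))  # ('EXPIRED', 45)
print(with_swr.serve(now=62))     # ('HIT', 8)
```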
Cache Purge Strategies
Cached content sometimes needs to be invalidated before it expires. Surge pricing activates, and riders must see the updated fare immediately, not in 60 seconds.
Strategy 1: Short TTL (Preferred)
Set s-maxage short enough that stale data is acceptable for the TTL duration. For surge pricing: s-maxage=10. The worst-case staleness is 10 seconds. For most ride-hailing scenarios, this is acceptable.
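A shorter TTL trades staleness for origin load, and the cost can be estimated before you commit to a number. The sketch below is back-of-the-envelope arithmetic under stated assumptions: each edge location refetches each cached URL at most once per TTL (no eviction, concurrent misses collapsed into one fetch). The function name and the counts of 25 URLs and 20 edge locations are hypothetical.

```python
def max_refresh_rps(distinct_urls: int, edge_locations: int, s_maxage_seconds: float) -> float:
    """Upper bound on origin refresh traffic, in requests per second."""
    return distinct_urls * edge_locations / s_maxage_seconds


# Hypothetical platform: 25 cacheable URLs served from 20 edge locations.
for ttl in (10, 60):
    print(f"s-maxage={ttl}: at most {max_refresh_rps(25, 20, ttl):.1f} origin req/s")
# s-maxage=10: at most 50.0 origin req/s
# s-maxage=60: at most 8.3 origin req/s
```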
Strategy 2: API Purge on Data Change
When surge activates in a zone, purge the CDN cache for that zone’s fare estimate endpoints.
// SCALED: Purge CDN cache when surge state changes
@Component
public class SurgeEventHandler {

    private final CdnPurgeClient cdnClient;

    public SurgeEventHandler(CdnPurgeClient cdnClient) {
        this.cdnClient = cdnClient;
    }

    @EventListener
    public void onSurgeActivated(SurgeActivatedEvent event) {
        String zoneId = event.getZoneId();

        // Purge all fare estimate URLs for this zone.
        // A prefix only matches URLs whose query string starts with that
        // parameter, so entries where the zone is the second parameter are
        // not purged. Cache tags (Strategy 3) avoid this gap.
        cdnClient.purgeByPrefix(
                "/api/v1/fare/estimate?originZone=" + zoneId
        ).subscribe();
        cdnClient.purgeByPrefix(
                "/api/v1/fare/estimate?destZone=" + zoneId
        ).subscribe();

        // Purge zone availability
        cdnClient.purgeByUrl(
                "/api/v1/zones/availability?zoneId=" + zoneId
        ).subscribe();
    }
}
Cloudflare supports purge-by-prefix and purge-by-tag. CloudFront supports path-based invalidation with wildcards. Both have API rate limits, so do not purge on every data change. Reserve purges for material changes like surge activation.
Strategy 3: Cache Tags (Cloudflare Enterprise)
Add a Cache-Tag response header with logical tags. Purge by tag when data changes.
// SCALED: Cache tags for targeted purge
@GetMapping("/estimate")
public Mono<ResponseEntity<FareEstimate>> estimateFare(
        @RequestParam String originZone,
        @RequestParam String destZone) {
    return fareService.calculate(originZone, destZone)
            .map(estimate -> ResponseEntity.ok()
                    // Send the tags as one comma-separated header value
                    .header("Cache-Tag", String.join(",",
                            "fare",
                            "zone-" + originZone,
                            "zone-" + destZone))
                    .cacheControl(fareCacheControl())
                    .body(estimate));
}
// Purge all fares for a zone with a single API call:
// POST /purge {"tags": ["zone-downtown"]}
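The purge call itself is a small JSON POST. The sketch below builds the request without sending it, assuming Cloudflare's v4 purge_cache endpoint and bearer-token authentication; confirm both against the current API reference before relying on them. `build_tag_purge` is a hypothetical helper and the credentials are placeholders. Note that zone_id here is the Cloudflare zone identifier for your domain, not a ride-hailing zone.

```python
import json
import urllib.request


def build_tag_purge(zone_id: str, api_token: str, tags: list) -> urllib.request.Request:
    """Build (but do not send) a purge-by-tag request."""
    body = json.dumps({"tags": tags}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder credentials; substitute real values from your account.
req = build_tag_purge("YOUR_ZONE_ID", "YOUR_API_TOKEN", ["zone-downtown"])
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req)
```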
Measuring CDN Effectiveness with Locust
The only way to know your CDN is working is to measure it. Trust the metrics, not the configuration.
The Test Setup
Locust hits the CDN edge, not the origin directly. The origin logs every request it receives. The delta between Locust requests sent and origin requests received is the CDN’s contribution.
from locust import HttpUser, task, between, events
import random


class CDNEffectivenessTest(HttpUser):
    """
    Target: CDN edge URL (e.g., https://api.ridehail.com)
    NOT the origin directly.
    """
    wait_time = between(0.2, 1)

    # Fixed set of zone pairs to maximize cache hits
    zone_pairs = [
        ("airport", "downtown"),
        ("downtown", "suburbs"),
        ("airport", "midtown"),
        ("suburbs", "airport"),
        ("midtown", "downtown"),
    ]

    @task(5)
    def fare_estimate(self):
        """High-frequency, highly cacheable endpoint."""
        origin, dest = random.choice(self.zone_pairs)
        with self.client.get(
            f"/api/v1/fare/estimate?originZone={origin}&destZone={dest}",
            name="/api/v1/fare/estimate",
            catch_response=True
        ) as response:
            # Track CDN cache status from response header
            cache_status = response.headers.get(
                "CF-Cache-Status",
                response.headers.get("X-Cache", "UNKNOWN"))
            response.success()
            # Log for analysis
            events.request.fire(
                request_type="CDN",
                name=f"cache_{cache_status}",
                response_time=response.elapsed.total_seconds() * 1000,
                response_length=len(response.content),
                exception=None,
                context={}
            )

    @task(3)
    def zone_availability(self):
        """Medium-frequency, short-TTL cacheable endpoint."""
        zone = random.choice(self.zone_pairs)[0]
        self.client.get(
            f"/api/v1/zones/availability?zoneId={zone}",
            name="/api/v1/zones/availability"
        )

    @task(1)
    def driver_location(self):
        """Low-frequency, uncacheable endpoint (no-store)."""
        driver_id = f"driver-{random.randint(1, 100)}"
        self.client.get(
            f"/api/v1/drivers/{driver_id}/location",
            name="/api/v1/drivers/[id]/location"
        )
Measuring Cache Hit Rate
Cloudflare returns CF-Cache-Status in the response header. CloudFront returns X-Cache. Parse these in your Locust test or in your monitoring.
| CF-Cache-Status | Meaning |
|---|---|
| HIT | Served from CDN cache |
| MISS | Not in cache, fetched from origin |
| EXPIRED | Was in cache but expired, fetched from origin |
| STALE | Served stale while revalidating (stale-while-revalidate) |
| DYNAMIC | Not eligible for caching (no s-maxage, or no-store) |
| BYPASS | CDN bypassed due to configuration |
Calculate cache hit rate: (HIT + STALE) / (HIT + MISS + EXPIRED + STALE) * 100
STALE counts as a hit because the user received a fast response. DYNAMIC and BYPASS are excluded because those responses were never eligible for caching.
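The calculation can be applied to any list of collected status values, for example the CF-Cache-Status headers gathered during a test run. This is a minimal sketch: `cache_hit_rate` is a hypothetical helper, the sample counts are made up, and STALE is counted as a hit, as recommended above.

```python
from collections import Counter


def cache_hit_rate(statuses) -> float:
    """Hit rate in percent over cache-eligible responses, counting STALE as a hit."""
    counts = Counter(status.upper() for status in statuses)
    served_from_cache = counts["HIT"] + counts["STALE"]
    eligible = served_from_cache + counts["MISS"] + counts["EXPIRED"]
    # DYNAMIC and BYPASS are ignored: those responses were never cacheable.
    return 100.0 * served_from_cache / eligible if eligible else 0.0


sample = (["HIT"] * 85 + ["STALE"] * 4 + ["MISS"] * 8
          + ["EXPIRED"] * 3 + ["DYNAMIC"] * 10)
print(cache_hit_rate(sample))  # 89.0
```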
Before/After Comparison
500 concurrent users, 5-minute test, CDN edge endpoint.
Before: Misconfigured Headers
Origin returns no Cache-Control on fare estimates. Returns Vary: Cookie on zone availability. Returns max-age=300 (no private) on trip history (data leak risk).
| Metric | Value |
|---|---|
| Total Locust requests sent | 187,400 |
| Origin requests received | 174,200 |
| CDN cache hit rate | 7% |
| Fare estimate p50 | 48ms |
| Fare estimate p95 | 320ms |
| Fare estimate p99 | 1,180ms |
| Zone availability p50 | 41ms |
| Zone availability p95 | 295ms |
| Origin CPU (4 pods) | 82% |
| CDN cache entries | ~52,000 (inflated by Vary: Cookie) |
After: Correct Headers
Fare estimates: s-maxage=60, max-age=30, stale-while-revalidate=30. Zone availability: s-maxage=10, stale-while-revalidate=5, no Vary. Trip history: private, max-age=300. Driver location: no-store.
| Metric | Value |
|---|---|
| Total Locust requests sent | 187,400 |
| Origin requests received | 28,100 |
| CDN cache hit rate | 85% |
| Fare estimate p50 | 6ms |
| Fare estimate p95 | 18ms |
| Fare estimate p99 | 72ms |
| Zone availability p50 | 5ms |
| Zone availability p95 | 14ms |
| Origin CPU (4 pods) | 14% |
| CDN cache entries | 47 (5 zone pairs + variants) |
The origin went from processing 174,200 requests to 28,100. CPU dropped from 82% to 14%. The p95 for fare estimates dropped from 320ms to 18ms. CDN cache entries dropped from 52,000 (one per user due to Vary: Cookie) to 47 (one per unique URL with encoding variants).
Three of the four origin pods can be removed. Monthly savings: approximately $3,150 in compute costs. The change: correct HTTP headers and removing one Vary directive.
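The headline percentages follow directly from the request counts in the two tables. The sketch below only redoes that arithmetic; `offload_pct` is a hypothetical helper name.

```python
def offload_pct(requests_sent: int, origin_requests: int) -> float:
    """Share of requests that never reached the origin, in percent."""
    return 100.0 * (1 - origin_requests / requests_sent)


before = offload_pct(187_400, 174_200)
after = offload_pct(187_400, 28_100)
p95_drop = 100.0 * (320 - 18) / 320  # fare estimate p95, in ms

print(f"before: {before:.1f}%")      # before: 7.0%
print(f"after: {after:.1f}%")        # after: 85.0%
print(f"p95 drop: {p95_drop:.1f}%")  # p95 drop: 94.4%
```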
CDN Cache Decision Tree
For every endpoint in the ride-hailing platform, follow this decision process:
- Does the response change per user? Yes: Cache-Control: private, max-age=N. CDN does not cache. Stop.
- Does the response change in real time (< 5 seconds)? Yes: Cache-Control: no-store. Nothing caches. Stop.
- Does the response change frequently (5-60 seconds)? Yes: Cache-Control: s-maxage=N where N matches the update interval. Add stale-while-revalidate=N/2.
- Does the response change rarely (minutes to hours)? Yes: Cache-Control: s-maxage=N, max-age=M where M < N. Add ETag for conditional revalidation.
- Does the response never change (versioned by URL)? Yes: Cache-Control: public, max-age=31536000, immutable.
- Does the response depend on a request header? Only add Vary if that header has fewer than 10 distinct values. Otherwise, redesign the endpoint to move the variable into the URL path or query string.
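The first five questions of the tree can be sketched as a single function. This is an illustration under assumptions, not the platform's actual classifier: `cache_control_for` and its parameters are hypothetical, the 300-second private max-age mirrors the trip history setting above, and max-age is set to half of s-maxage as one way to satisfy M < N. The Vary question is left out because it depends on header cardinality, not on update frequency.

```python
from typing import Optional


def cache_control_for(per_user: bool,
                      change_interval_s: Optional[int],
                      private_max_age: int = 300) -> str:
    """Pick a Cache-Control value; change_interval_s=None means 'never changes'."""
    if per_user:
        return f"private, max-age={private_max_age}"
    if change_interval_s is None:
        return "public, max-age=31536000, immutable"
    if change_interval_s < 5:
        return "no-store"
    n = change_interval_s
    if n <= 60:
        return f"s-maxage={n}, stale-while-revalidate={n // 2}"
    # Rarely changing data: browsers cache for less time than the CDN (M < N).
    return f"s-maxage={n}, max-age={n // 2}"


print(cache_control_for(per_user=True, change_interval_s=None))   # trip history
print(cache_control_for(per_user=False, change_interval_s=1))     # driver location
print(cache_control_for(per_user=False, change_interval_s=10))    # zone availability
print(cache_control_for(per_user=False, change_interval_s=3600))  # slow-changing data
```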
Every endpoint in this platform was classified using this tree. The result: 85% of traffic served from CDN edge nodes, 14% origin CPU, and a p95 that dropped by 94%.