Design a Content Delivery Network

A Content Delivery Network sits between users and origin servers, caching content at geographically distributed edge locations to reduce latency, absorb traffic spikes, and shield origin infrastructure. This chapter designs a CDN from scratch, the kind of system behind Cloudflare, Akamai, or AWS CloudFront, covering the multi-tier cache hierarchy, invalidation strategies, geographic routing, content optimization, and security.

Requirements

Functional Requirements

Cache and Serve Static Content: HTML, CSS, JS, images, video, fonts served from edge locations closest to the user.
Dynamic Content Acceleration: Optimize request routing and connection reuse for dynamic API responses.
Content Purge/Invalidation: Purge specific URLs or URL patterns from all edge caches within seconds.
Analytics: Real-time metrics on cache hit ratio, bandwidth, latency, and error rates per PoP.
Multi-Origin Support: Route to different origin servers based on path or hostname.
Custom Cache Rules: Per-path TTL overrides, cache key customization, and bypass rules.

Non-Functional Requirements

Metric	Target
Time to First Byte (TTFB)	< 50ms globally on cache hit
Availability	99.99% (about 52.6 minutes downtime/year)
Content storage	Petabytes across all PoPs
Concurrent users	Millions globally
Purge propagation	< 5 seconds to all PoPs
Cache hit ratio	> 95% for static content
TLS handshake	< 50ms (edge termination)

Capacity Estimation

Assume a CDN serving a large media platform:

Total requests: 10B requests/day = ~115K RPS average, ~1M RPS at peak.
Bandwidth: Average response size 100KB → 10B × 100KB = 1 petabyte/day total transfer (10^10 requests × 10^5 bytes = 10^15 bytes). With a 95% cache hit ratio, the origin serves only ~50TB/day.
Edge storage per PoP: Hot content (top 20% of URLs) covers 80% of requests. 1PB hot content replicated across 200+ PoPs, but each PoP caches only what it needs — typically 10-50TB per edge node.
PoP count: 200+ locations globally. Each PoP has 10-100 servers depending on regional traffic density.
Purge requests: ~10K purge requests/day — small volume but time-critical.
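
The arithmetic behind these estimates can be checked with a few lines of code. The sketch below assumes decimal units (1KB = 1,000 bytes) and uses a hypothetical helper class, CapacityEstimate, that is not part of any library.

```java
// Back-of-the-envelope capacity math in decimal units (1 KB = 1,000 bytes).
final class CapacityEstimate {
    static final long SECONDS_PER_DAY = 86_400L;

    /** Average requests per second for a given daily request count. */
    static long averageRps(long requestsPerDay) {
        return requestsPerDay / SECONDS_PER_DAY;
    }

    /** Total bytes transferred per day across all edges. */
    static long dailyTransferBytes(long requestsPerDay, long avgResponseBytes) {
        return requestsPerDay * avgResponseBytes;
    }

    /** Bytes per day that miss the cache and must come from the origin. */
    static long originBytesPerDay(long totalBytes, int hitRatioPercent) {
        return totalBytes * (100 - hitRatioPercent) / 100;
    }
}
```

With 10B requests/day and 100KB responses, averageRps gives 115,740, dailyTransferBytes gives 10^15 bytes (1 petabyte), and originBytesPerDay at a 95% hit ratio gives 5 × 10^13 bytes (50 terabytes).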

High-Level Design

The CDN uses a three-tier architecture: Edge PoPs close to users, Mid-Tier (Shield) caches that aggregate requests from multiple edges, and the Origin server.

  User (Tokyo)                    User (NYC)
       │                               │
       ▼                               ▼
 ┌───────────┐                  ┌───────────┐
 │ Edge PoP  │                  │ Edge PoP  │
 │  Tokyo    │                  │  New York │
 │ (L1+L2)  │                  │ (L1+L2)  │
 └─────┬─────┘                  └─────┬─────┘
       │ cache miss                   │ cache miss
       └──────────┐     ┌────────────┘
                  ▼     ▼
            ┌──────────────┐
            │  Mid-Tier    │
            │  Shield PoP  │
            │  (US-East)   │
            └──────┬───────┘
                   │ cache miss
                   ▼
            ┌──────────────┐
            │   Origin     │
            │   Server     │
            └──────────────┘

Request Flow:
1. DNS resolves to nearest Edge PoP IP
2. Edge checks L1 (memory) → L2 (disk) cache
3. Cache miss → forward to Shield PoP
4. Shield checks its cache
5. Shield miss → fetch from Origin
6. Response cached at Shield and Edge
7. Subsequent requests served from Edge cache

Why three tiers? Without the mid-tier shield, every edge PoP’s cache miss hits the origin directly. With 200 PoPs, a cache expiration event could trigger 200 simultaneous origin requests for the same URL — a cache stampede. The shield absorbs these, collapsing 200 origin requests into one.
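
To make the effect of the shield concrete, here is a minimal sketch that models each tier as a map in front of an upstream function. CacheTier and TieredLookupDemo are hypothetical names, the requests run sequentially, and nothing is ever evicted, so it only illustrates the fan-in; concurrent misses additionally need request coalescing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** One cache tier that falls back to the next tier (or the origin) on a miss. */
final class CacheTier {
    private final Map<String, String> store = new HashMap<>();
    private final Function<String, String> upstream;

    CacheTier(Function<String, String> upstream) {
        this.upstream = upstream;
    }

    /** Returns the cached body, fetching from upstream and caching it on a miss. */
    String get(String url) {
        return store.computeIfAbsent(url, upstream);
    }
}

final class TieredLookupDemo {
    /** Counts origin fetches when edgeCount cold edges request the same URL. */
    static int originFetches(int edgeCount, boolean useShield) {
        int[] originHits = {0};
        Function<String, String> origin = url -> {
            originHits[0]++;
            return "body-of-" + url;
        };
        CacheTier shield = new CacheTier(origin);
        // With a shield, every edge misses to the shared shield cache;
        // without one, every edge misses straight to the origin.
        Function<String, String> edgeUpstream = useShield ? shield::get : origin;
        for (int i = 0; i < edgeCount; i++) {
            CacheTier edge = new CacheTier(edgeUpstream);
            edge.get("/video/intro.mp4");
        }
        return originHits[0];
    }
}
```

Calling TieredLookupDemo.originFetches(200, false) counts 200 origin fetches, while originFetches(200, true) counts one.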

Deep Dive

Pull vs Push CDN

Two fundamental models determine how content reaches edge caches:

Pull CDN (Lazy Loading): Content is cached on first request. The edge receives a user request, checks its cache, and on a miss, fetches from the origin (or shield), caches the response, and serves it.

Cache miss penalty: first user experiences origin latency.
Cache naturally warms based on demand — popular content stays cached, unpopular content evicts.
No proactive bandwidth usage — only requested content is cached.
Best for: large catalogs with long-tail access patterns (e-commerce, news archives).

Push CDN (Proactive Distribution): The origin pushes content to edge PoPs before any user requests it. Typically triggered by a publish event.

Zero cache-miss penalty: content is already present when users arrive.
Wastes bandwidth and storage if pushed content is never requested.
Best for: predictable hot content — new movie releases on Netflix, game patch downloads, live event streams.

Hybrid Approach (Production Standard): Push pre-identified hot content (homepage assets, trending videos) to all edges. Pull everything else on demand. A popularity tracker identifies URLs exceeding a request threshold and proactively pushes them to edges that have not yet cached them.
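
The popularity tracker in the hybrid approach could look roughly like the sketch below. PopularityTracker and its threshold are illustrative assumptions; a production tracker would count over a sliding time window and also track which edges already hold each URL.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Counts requests per URL and reports the URLs that exceed a push threshold. */
final class PopularityTracker {
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();
    private final long pushThreshold;

    PopularityTracker(long pushThreshold) {
        this.pushThreshold = pushThreshold;
    }

    /** Called on every request; LongAdder keeps contention low under heavy load. */
    void recordRequest(String url) {
        counts.computeIfAbsent(url, k -> new LongAdder()).increment();
    }

    /** URLs whose request count exceeds the threshold, sorted for stable output. */
    List<String> urlsToPush() {
        return counts.entrySet().stream()
            .filter(e -> e.getValue().sum() > pushThreshold)
            .map(Map.Entry::getKey)
            .sorted()
            .toList();
    }
}
```

A scheduler would periodically call urlsToPush() and hand the result to whatever distribution mechanism pushes objects to the edges.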

Cache Architecture

Each edge PoP implements a two-level local cache:

L1 — In-Memory Cache: Fastest access (< 1ms). Limited capacity — typically 32-128GB per edge server. Stores the hottest content based on access frequency. Implemented as a concurrent LRU cache.

L2 — Disk Cache: SSD-backed, larger capacity (1-10TB per server). Slower than L1 (1-5ms) but orders of magnitude faster than a network round-trip to the shield. Content evicts from L1 to L2 before being purged entirely.
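
The demotion path from L1 to L2 can be sketched with an access-ordered LinkedHashMap. This is an illustration under simplifying assumptions: TwoLevelCache is a hypothetical class, a plain HashMap stands in for the SSD tier, L2 is unbounded, and a single lock guards everything.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class TwoLevelCache {
    private final Map<String, byte[]> l2 = new HashMap<>(); // stands in for the SSD tier
    private final LinkedHashMap<String, byte[]> l1;         // in-memory, LRU-ordered

    TwoLevelCache(int l1Capacity) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.l1 = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                if (size() > l1Capacity) {
                    l2.put(eldest.getKey(), eldest.getValue()); // demote instead of dropping
                    return true;
                }
                return false;
            }
        };
    }

    synchronized void put(String key, byte[] value) {
        l2.remove(key);     // avoid keeping an older copy on disk
        l1.put(key, value);
    }

    /** Returns "L1", "L2", or "MISS" to show which tier satisfied the lookup. */
    synchronized String lookupTier(String key) {
        if (l1.get(key) != null) return "L1";
        byte[] fromDisk = l2.remove(key);
        if (fromDisk == null) return "MISS";
        l1.put(key, fromDisk); // promote back into memory
        return "L2";
    }
}
```

With an L1 capacity of 2, inserting a, b, and c demotes a to L2; the next lookup of a is served from L2 and promotes it, which in turn demotes b.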

Cache Key Design: The cache key determines whether two requests map to the same cached response. A naive URL-only key causes problems when the same URL serves different content based on headers:

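import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public record CacheKey(
    String url,
    String acceptEncoding,  // gzip vs brotli → different response bytes
    String deviceType,      // mobile vs desktop for responsive images
    String acceptLanguage   // locale-specific content
) {
    // Raw header values are used here for brevity. In practice they should be
    // normalized first (e.g. to "br", "gzip", or "identity"), otherwise minor
    // header differences fragment the cache.
    public static CacheKey from(HttpRequest request) {
        return new CacheKey(
            request.uri().toString(),
            request.headers().firstValue("Accept-Encoding").orElse("identity"),
            classifyDevice(request.headers().firstValue("User-Agent").orElse("")),
            request.headers().firstValue("Accept-Language").orElse("en")
        );
    }

    // Coarse User-Agent sniffing; good enough to pick an image variant.
    private static String classifyDevice(String userAgent) {
        if (userAgent.contains("Mobile")) return "mobile";
        if (userAgent.contains("Tablet")) return "tablet";
        return "desktop";
    }

    // Hash all four components with SHA-256. A 32-bit hashCode() collides often
    // enough at CDN scale that one URL could be served another URL's bytes.
    public String toHashKey() {
        String joined = String.join("\n", url, acceptEncoding, deviceType, acceptLanguage);
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(joined.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}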

LRU Cache with ConcurrentHashMap: The L1 in-memory cache uses a ConcurrentHashMap combined with an access-ordered eviction policy:

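import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedDeque;

public class EdgeCache {
    private final ConcurrentHashMap<String, CacheEntry> store;
    private final int maxEntries;
    // Most-recently-used key at the head, least-recently-used at the tail.
    // remove(Object) on this deque is O(n), and store and accessOrder are not
    // updated atomically, so this is an approximate LRU for illustration only.
    private final ConcurrentLinkedDeque<String> accessOrder;

    public record CacheEntry(
        byte[] content,
        Map<String, String> headers,
        Instant cachedAt,
        Duration ttl,
        String etag
    ) {
        public boolean isExpired() {
            return Instant.now().isAfter(cachedAt.plus(ttl));
        }
    }

    public EdgeCache(int maxEntries) {
        this.maxEntries = maxEntries;
        this.store = new ConcurrentHashMap<>(maxEntries);
        this.accessOrder = new ConcurrentLinkedDeque<>();
    }

    public CacheEntry get(CacheKey key) {
        String hashKey = key.toHashKey();
        CacheEntry entry = store.get(hashKey);
        if (entry == null) return null;

        if (entry.isExpired()) {
            // Remove the key from the access order too, otherwise expired
            // keys accumulate in the deque forever.
            if (store.remove(hashKey, entry)) {
                accessOrder.remove(hashKey);
            }
            return null;
        }

        // Promote to most-recently-used
        accessOrder.remove(hashKey);
        accessOrder.addFirst(hashKey);
        return entry;
    }

    public void put(CacheKey key, CacheEntry entry) {
        String hashKey = key.toHashKey();
        // Overwriting an existing entry must not leave a duplicate key in the deque.
        if (store.put(hashKey, entry) != null) {
            accessOrder.remove(hashKey);
        }
        accessOrder.addFirst(hashKey);

        // Evict least-recently-used entries if over capacity
        while (store.size() > maxEntries) {
            String evictKey = accessOrder.pollLast();
            if (evictKey == null) break; // nothing left to evict; avoid spinning
            store.remove(evictKey);
        }
    }
}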

Cache Invalidation

Cache invalidation is the hardest problem in a CDN. Four strategies address different use cases:

1. TTL-Based Expiration: The origin sets Cache-Control: max-age=3600 headers. Edge caches honor the TTL and serve the cached copy as fresh until it expires. After the TTL, the next request triggers a revalidation with the origin using If-None-Match (ETag) or If-Modified-Since; a 304 Not Modified response lets the edge keep the cached body and restart its TTL.
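
As a rough sketch of the freshness check, the code below parses max-age and decides whether a cached entry needs revalidation. Freshness is a hypothetical helper; it deliberately ignores s-maxage, Expires, and heuristic freshness, and it treats a missing max-age as "revalidate", which is a conservative assumption rather than what the HTTP caching rules require.

```java
import java.time.Duration;
import java.util.Locale;
import java.util.Optional;

final class Freshness {
    /** Extracts max-age from a Cache-Control header value, if present and numeric. */
    static Optional<Duration> maxAge(String cacheControl) {
        for (String directive : cacheControl.split(",")) {
            String d = directive.trim().toLowerCase(Locale.ROOT);
            if (d.startsWith("max-age=")) {
                try {
                    long seconds = Long.parseLong(d.substring("max-age=".length()).trim());
                    return Optional.of(Duration.ofSeconds(seconds));
                } catch (NumberFormatException e) {
                    return Optional.empty(); // malformed value: treat as absent
                }
            }
        }
        return Optional.empty();
    }

    /** True when the entry has reached its TTL and must be revalidated before reuse. */
    static boolean needsRevalidation(Duration age, String cacheControl) {
        return maxAge(cacheControl)
            .map(ttl -> age.compareTo(ttl) >= 0)
            .orElse(true);
    }

    /** A 304 response means the validator still matches, so the cached body is reused. */
    static boolean canReuseCachedBody(int revalidationStatus) {
        return revalidationStatus == 304;
    }
}
```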

2. Versioned URLs: Static assets embed a version or content hash in the URL: /static/app-v2.3.1.js or /static/app.a3f9c2.js. The content behind such a URL never changes, so the TTL can be set to 1 year. New deployments produce new URLs, and old URLs eventually evict from cache through LRU. This is the most reliable invalidation strategy because it sidesteps invalidation entirely.
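
A build step might derive the hashed filename along these lines. AssetFingerprint is a hypothetical helper, and the choice of SHA-256 truncated to 8 hex characters is an assumption for illustration; real bundlers pick their own hash and length.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class AssetFingerprint {
    /** Turns "/static/app.js" into "/static/app.<8 hex chars>.js" based on the content. */
    static String versionedPath(String path, byte[] content) {
        String hash = sha256Hex(content).substring(0, 8);
        int dot = path.lastIndexOf('.');
        if (dot <= path.lastIndexOf('/')) {
            return path + "." + hash; // no file extension in the last path segment
        }
        return path.substring(0, dot) + "." + hash + path.substring(dot);
    }

    private static String sha256Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Because the name depends only on the bytes, an unchanged file keeps its URL across deployments and stays cached, while any change yields a new URL that no cache has seen.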

3. Purge API: For content that cannot use versioned URLs (e.g., news articles, product pages), an explicit purge API broadcasts invalidation to all PoPs:

import java.time.Instant;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public sealed interface InvalidationEvent
    permits PurgeByUrl, PurgeByPrefix, PurgeAll {}

public record PurgeByUrl(String url) implements InvalidationEvent {}
public record PurgeByPrefix(String urlPrefix) implements InvalidationEvent {}
public record PurgeAll(String hostname) implements InvalidationEvent {}

// EdgePoP and its cache() accessor are assumed to be defined elsewhere.
public class CacheInvalidationProcessor {
    private final List<EdgePoP> pops;
    // Virtual threads require Java 21 or later.
    private final ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();

    public CacheInvalidationProcessor(List<EdgePoP> pops) {
        this.pops = List.copyOf(pops);
    }

    // Fans the event out to every PoP in parallel and completes when all have finished.
    public CompletableFuture<InvalidationResult> process(InvalidationEvent event) {
        List<CompletableFuture<Void>> futures = pops.stream()
            .map(pop -> CompletableFuture.runAsync(
                () -> invalidateAt(pop, event), executor))
            .toList();

        return CompletableFuture.allOf(futures.toArray(CompletableFuture[]::new))
            .thenApply(ignored -> new InvalidationResult(
                event, pops.size(), Instant.now()));
    }

    // The sealed hierarchy makes this switch exhaustive without a default branch.
    private void invalidateAt(EdgePoP pop, InvalidationEvent event) {
        switch (event) {
            case PurgeByUrl p    -> pop.cache().remove(p.url());
            case PurgeByPrefix p -> pop.cache().removeByPrefix(p.urlPrefix());
            case PurgeAll p      -> pop.cache().clearHost(p.hostname());
        }
    }
}

public record InvalidationResult(
    InvalidationEvent event,
    int popsInvalidated,
    Instant completedAt
) {}

Virtual threads enable broadcasting to 200+ PoPs concurrently — each invalidation RPC runs in its own virtual thread, completing all PoPs in parallel within seconds.

4. Stale-While-Revalidate: The Cache-Control: stale-while-revalidate=60 directive tells the edge to serve stale content immediately while asynchronously fetching a fresh copy from the origin. The user gets instant response; the cache refreshes in the background. This eliminates revalidation latency for the requesting user at the cost of serving briefly stale content.
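
The decision an edge makes under this directive can be written as a small pure function. SwrPolicy and its Action names are hypothetical; the sketch assumes the entry's age, max-age, and stale-while-revalidate window are already known.

```java
import java.time.Duration;

final class SwrPolicy {
    enum Action { SERVE_FRESH, SERVE_STALE_AND_REVALIDATE, FETCH_BEFORE_SERVING }

    /** Decides how to answer a request for an entry of the given age. */
    static Action decide(Duration age, Duration maxAge, Duration staleWhileRevalidate) {
        if (age.compareTo(maxAge) < 0) {
            return Action.SERVE_FRESH;
        }
        // Past max-age but still inside the stale-while-revalidate window:
        // answer from cache now and refresh in the background.
        if (age.compareTo(maxAge.plus(staleWhileRevalidate)) < 0) {
            return Action.SERVE_STALE_AND_REVALIDATE;
        }
        // Too stale: the request has to wait for the upstream fetch.
        return Action.FETCH_BEFORE_SERVING;
    }
}
```

For example, with max-age=3600 and stale-while-revalidate=60, an entry aged 3,630 seconds is served stale while a background refresh runs, and an entry aged 3,700 seconds blocks on the upstream fetch.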

Geographic Routing

Users must reach the nearest edge PoP for minimal latency. Four routing mechanisms exist:

DNS-Based Routing (GeoDNS): The authoritative DNS server maps the IP address of the client’s resolver to a geographic region and returns the IP of the nearest edge PoP. Accuracy depends on the IP-to-location database and on how close the resolver is to the actual user. This is a widely used approach (used by Akamai, for example).

Anycast Routing: All edge PoPs advertise the same IP address via BGP. ISP routers deliver packets to the topologically nearest PoP according to BGP path selection, which usually, but not always, coincides with the geographically nearest one. No per-region DNS logic is needed because the network handles routing automatically. Anycast is also resilient to PoP failures: if a PoP goes down, its route is withdrawn and traffic reroutes to the next nearest PoP within seconds. Cloudflare is a well-known example of an anycast-based CDN.

HTTP Redirect: The first request hits a central router that measures client latency or geolocation and responds with a 302 redirect to the nearest edge URL. Adds one round-trip of latency on first request. Rarely used as a primary mechanism — more common as a fallback.

Latency-Based Routing: Active probes measure RTT between clients and PoPs. A routing service maintains a latency map and directs traffic to the PoP with the lowest measured RTT, not necessarily the geographically closest one. Accounts for network congestion that pure geographic routing misses.
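
Once a latency map exists, the routing decision itself is a minimum search. The sketch below assumes a hypothetical LatencyRouter fed with per-PoP RTT measurements in milliseconds for one client network; collecting and aging those measurements is the hard part and is not shown.

```java
import java.util.Map;
import java.util.Optional;

final class LatencyRouter {
    /** Picks the PoP with the lowest measured RTT; empty if there are no measurements. */
    static Optional<String> pickPop(Map<String, Double> rttMillisByPop) {
        return rttMillisByPop.entrySet().stream()
            .min(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey);
    }
}
```

If probes for a Tokyo client report 12.0ms to a Tokyo PoP but 9.5ms to an Osaka PoP because of congestion, the router would pick Osaka even though Tokyo is geographically closer.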

Method	Latency Overhead	Failover Speed	Accuracy
GeoDNS	None beyond the DNS lookup	Bounded by DNS TTL (60-300s)	IP-geo database accuracy
Anycast	None	Seconds (BGP convergence)	Network topology-based
HTTP Redirect	+1 RTT	Immediate	Measured latency
Latency-Based	None after initial probe	Probe interval	Highest — real measurements

Content Optimization

Edge PoPs do more than cache — they transform content to reduce bytes on the wire:

Compression: Brotli compression for static assets (pre-compressed at build time, 15-20% smaller than gzip). Gzip for dynamic content (faster compression, acceptable ratio). The Accept-Encoding header in the cache key ensures compressed and uncompressed variants are cached separately.
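
Raw Accept-Encoding values vary widely between browsers, so a cache key usually wants a normalized form. The following sketch collapses the header to one of three variants; EncodingNegotiator is a hypothetical helper, and preferring Brotli over gzip whenever the client accepts both is an assumed server-side policy, not something the header dictates.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class EncodingNegotiator {
    /** Collapses a raw Accept-Encoding header to "br", "gzip", or "identity". */
    static String normalize(String acceptEncoding) {
        Set<String> accepted = new HashSet<>();
        for (String token : acceptEncoding.toLowerCase(Locale.ROOT).split(",")) {
            String[] parts = token.trim().split(";");
            String name = parts[0].trim();
            // An explicit q=0 means the client refuses this encoding.
            if (!name.isEmpty() && qValue(parts) > 0.0) {
                accepted.add(name);
            }
        }
        if (accepted.contains("br")) return "br";
        if (accepted.contains("gzip")) return "gzip";
        return "identity";
    }

    /** Reads the q parameter of one header token, defaulting to 1.0. */
    private static double qValue(String[] parts) {
        for (int i = 1; i < parts.length; i++) {
            String p = parts[i].trim();
            if (p.startsWith("q=")) {
                try {
                    return Double.parseDouble(p.substring(2).trim());
                } catch (NumberFormatException e) {
                    return 1.0; // unparsable weight: treat as acceptable
                }
            }
        }
        return 1.0;
    }
}
```

With this normalization, "gzip, deflate, br" and "br, gzip" map to the same cached variant instead of two.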

Image Optimization: On-the-fly format conversion (JPEG → WebP/AVIF based on Accept header), resizing to device viewport width, and quality adjustment. An image transformation service at the edge processes these requests and caches the result. For a 2MB source JPEG, WebP output at mobile resolution could be 50KB — a 97% reduction.

HTTP/2 and HTTP/3: HTTP/2 multiplexing serves multiple assets over a single TCP connection, eliminating head-of-line blocking at the HTTP layer. HTTP/3 runs over QUIC, which is built on UDP and delivers each stream independently, so it removes transport-level head-of-line blocking as well. This matters most for mobile users on lossy networks, where a single dropped packet would otherwise stall every HTTP/2 stream sharing the TCP connection.

Connection Reuse: Edge PoPs maintain persistent connections to shield/origin servers. A user’s TLS handshake terminates at the edge, but the edge-to-origin connection is pre-established and reused across thousands of requests, amortizing connection setup cost.

Security

TLS Termination at Edge: The edge PoP terminates the user’s TLS connection, decrypts the request, serves from cache if possible, and re-encrypts only for cache misses forwarded to the origin. This reduces end-to-end latency because the TLS handshake completes with a nearby server (< 10ms RTT) rather than a distant origin (potentially 200ms+ RTT).

Signed URLs for Private Content: Premium or gated content uses signed URLs with HMAC authentication and expiration timestamps. The edge validates the signature before serving:

import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.time.Duration;
import java.time.Instant;
import java.util.Arrays;
import java.util.Base64;
import java.util.Map;
import java.util.stream.Collectors;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class SignedUrlGenerator {
    private static final String ALGORITHM = "HmacSHA256";
    private final SecretKeySpec signingKey;

    public SignedUrlGenerator(String secret) {
        this.signingKey = new SecretKeySpec(
            secret.getBytes(StandardCharsets.UTF_8), ALGORITHM);
    }

    // Assumes baseUrl carries no query string of its own.
    public String generateSignedUrl(String baseUrl, Duration validFor) {
        long expires = Instant.now().plus(validFor).getEpochSecond();
        String signature = sign(baseUrl + "|" + expires);
        return baseUrl + "?expires=" + expires + "&sig=" + signature;
    }

    public boolean validateSignedUrl(String url) {
        int queryStart = url.indexOf('?');
        if (queryStart < 0) return false; // no query string, so not a signed URL

        // Parse expires and sig from query params
        String baseUrl = url.substring(0, queryStart);
        Map<String, String> params = parseQueryParams(url.substring(queryStart + 1));
        String expiresParam = params.get("expires");
        String providedSig = params.get("sig");
        if (expiresParam == null || providedSig == null) return false;

        long expires;
        try {
            expires = Long.parseLong(expiresParam);
        } catch (NumberFormatException e) {
            return false; // malformed expiry
        }

        // Check expiration
        if (Instant.now().getEpochSecond() > expires) {
            return false; // URL expired
        }

        // Recompute signature and compare in constant time
        String expectedSig = sign(baseUrl + "|" + expires);
        return MessageDigest.isEqual(
            expectedSig.getBytes(StandardCharsets.UTF_8),
            providedSig.getBytes(StandardCharsets.UTF_8));
    }

    // HMAC-SHA256 over the data, encoded as URL-safe Base64 without padding.
    private String sign(String dataToSign) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(signingKey);
            byte[] signatureBytes = mac.doFinal(
                dataToSign.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder()
                .withoutPadding().encodeToString(signatureBytes);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Failed to compute HMAC signature", e);
        }
    }

    // Skips parameters without '=' and keeps the first value of a repeated name.
    private Map<String, String> parseQueryParams(String query) {
        return Arrays.stream(query.split("&"))
            .map(p -> p.split("=", 2))
            .filter(a -> a.length == 2)
            .collect(Collectors.toMap(a -> a[0], a -> a[1], (first, second) -> first));
    }
}

MessageDigest.isEqual performs constant-time comparison, preventing timing attacks where an attacker measures response time differences to guess signature bytes.

DDoS Protection: Edge PoPs implement rate limiting per IP and per ASN (Autonomous System Number). IP reputation scoring downgrades traffic from known-bad sources. During an attack, the edge absorbs traffic without forwarding to the origin — the CDN acts as a distributed firewall.
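
Per-IP rate limiting is commonly built on a token bucket. The sketch below is one minimal way to do it, not a description of any particular CDN's limiter: PerIpRateLimiter is a hypothetical class, the clock is passed in as a parameter to keep the logic testable, and idle buckets are never cleaned up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class PerIpRateLimiter {
    private static final class Bucket {
        double tokens;
        long lastRefillNanos;

        Bucket(double tokens, long nowNanos) {
            this.tokens = tokens;
            this.lastRefillNanos = nowNanos;
        }
    }

    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();
    private final double capacity;        // maximum burst size
    private final double refillPerSecond; // sustained requests per second

    PerIpRateLimiter(double capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
    }

    /** Returns true if the request is allowed at time nowNanos. */
    boolean tryAcquire(String clientIp, long nowNanos) {
        Bucket bucket = buckets.computeIfAbsent(clientIp, ip -> new Bucket(capacity, nowNanos));
        synchronized (bucket) {
            // Refill in proportion to elapsed time, capped at the bucket capacity.
            double elapsedSeconds = (nowNanos - bucket.lastRefillNanos) / 1e9;
            bucket.tokens = Math.min(capacity, bucket.tokens + elapsedSeconds * refillPerSecond);
            bucket.lastRefillNanos = nowNanos;
            if (bucket.tokens >= 1.0) {
                bucket.tokens -= 1.0;
                return true;
            }
            return false;
        }
    }
}
```

In production the caller would pass System.nanoTime(), and the same structure keyed by ASN instead of IP gives the per-ASN limit.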

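Origin Shielding: The mid-tier cache layer prevents attack traffic from reaching the origin. When an attacker bypasses edge caching (e.g., by randomizing query parameters), every edge miss lands on the shield rather than on the origin. The shield can collapse these requests and serve cached responses only if its cache key ignores or normalizes the unrecognized parameters; otherwise each random URL is a distinct key and still reaches the origin, where admission control has to cap the request rate.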
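
One way to make randomized query strings collapse onto a single cached object is to build the cache key from an allow-list of parameters. QueryNormalizer below is a hypothetical sketch of that idea; the allow-list would come from per-path cache rules, and URL fragments and percent-encoding are not handled.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

final class QueryNormalizer {
    /** Keeps only allow-listed query parameters, sorted, so junk parameters share one key. */
    static String normalize(String url, Set<String> allowedParams) {
        int queryStart = url.indexOf('?');
        if (queryStart < 0) return url;

        String path = url.substring(0, queryStart);
        String kept = Arrays.stream(url.substring(queryStart + 1).split("&"))
            .filter(pair -> allowedParams.contains(pair.split("=", 2)[0]))
            .sorted()
            .collect(Collectors.joining("&"));
        return kept.isEmpty() ? path : path + "?" + kept;
    }
}
```

With an allow-list of w and h, both /img/cat.jpg?cachebust=83921&w=300 and /img/cat.jpg?w=300&x=17 normalize to /img/cat.jpg?w=300.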

Bottlenecks & Scaling

Bottleneck	Solution
Cache miss storms	Request coalescing: when multiple concurrent requests miss on the same cache key, only one request fetches from the origin. Other requests wait for the first one to complete and receive the same response. This collapses N origin requests into 1.
Origin overload	Origin shield (mid-tier) absorbs most cache misses. Admission control at the shield rate-limits origin requests. If the origin is unhealthy, serve stale content with a `stale-if-error` directive.
Stale content	Aggressive TTL tuning per content type: 1 year for versioned assets, 5 min for product pages, 30s for trending content. Event-driven purge via webhooks: CMS publish event → purge API → edge invalidation in < 5 seconds.
Edge capacity limits	Horizontal scaling: add servers to existing PoPs for capacity, add new PoPs for geographic coverage. Auto-scale based on request rate and cache hit ratio — low hit ratio indicates insufficient cache capacity.
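TLS overhead	TLS 1.3 session resumption with 0-RTT early data lets returning visitors send their request in the first flight, removing the handshake round-trip; because early data can be replayed, restrict it to idempotent requests. OCSP stapling removes the certificate validation round-trip. Share session ticket keys across edge servers in a PoP so any server can resume a session.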
Purge propagation delay	Fan-out purge via a dedicated pub/sub channel (Kafka or Redis Streams). Each PoP subscribes and processes invalidation events independently. Target: 200+ PoPs invalidated within 5 seconds.
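
Request coalescing, the first row of the table, can be sketched with a map of in-flight fetches. RequestCoalescer is a hypothetical class and originFetch is an assumed asynchronous fetch function supplied by the caller; writing the result into the cache is left out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

final class RequestCoalescer {
    private final ConcurrentMap<String, CompletableFuture<byte[]>> inFlight =
        new ConcurrentHashMap<>();
    private final Function<String, CompletableFuture<byte[]>> originFetch;

    RequestCoalescer(Function<String, CompletableFuture<byte[]>> originFetch) {
        this.originFetch = originFetch;
    }

    /** Concurrent callers for the same key share one upstream fetch. */
    CompletableFuture<byte[]> fetch(String cacheKey) {
        CompletableFuture<byte[]> mine = new CompletableFuture<>();
        CompletableFuture<byte[]> leader = inFlight.putIfAbsent(cacheKey, mine);
        if (leader != null) {
            return leader; // follower: wait on the fetch that is already running
        }
        try {
            originFetch.apply(cacheKey).whenComplete((body, error) -> {
                // Clear the slot first so a later miss starts a new fetch.
                inFlight.remove(cacheKey, mine);
                if (error != null) {
                    mine.completeExceptionally(error);
                } else {
                    mine.complete(body);
                }
            });
        } catch (RuntimeException e) {
            inFlight.remove(cacheKey, mine);
            mine.completeExceptionally(e);
        }
        return mine;
    }
}
```

The first caller for a key becomes the leader and triggers the fetch; everyone who arrives before it completes receives the same future, so N concurrent misses cost one upstream request.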

Interviewer Tips

Start with the three-tier architecture: Sketch Edge → Shield → Origin immediately. This shows you understand cache hierarchy and origin protection — the two most critical CDN concepts.
Quantify the cache hit ratio impact: A 95% hit ratio means the origin handles only 5% of traffic. Raising the hit ratio by 4 percentage points to 99% shrinks that to 1%, an 80% cut in origin load. These numbers demonstrate the business value of caching.
Versioned URLs over purge APIs: When discussing invalidation, lead with versioned URLs as the preferred strategy. Purge APIs are a fallback for content that cannot be versioned. This shows pragmatism.
Anycast is the modern answer: If asked “How do users reach the nearest PoP?”, Anycast is the strongest answer because it requires no application-layer logic — the network routes automatically. Mention BGP convergence time for failover.
Request coalescing is the advanced topic: Most candidates miss this. When an interviewer asks “What happens if a popular cached item expires?”, the answer is request coalescing — not “the origin handles the load.” This distinguishes senior-level answers.
Security is a common follow-up: Have signed URLs and TLS termination ready. Interviewers often ask “How do you serve paid content?” or “Where does TLS terminate?” Edge TLS termination with re-encryption to origin is the standard pattern.
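
The hit-ratio tip above comes down to miss-rate arithmetic, which is easy to get wrong under interview pressure. A tiny sketch, using a hypothetical HitRatioImpact helper with hit ratios given in percent:

```java
final class HitRatioImpact {
    /** Fraction by which origin load drops when the hit ratio rises. */
    static double originLoadReduction(double oldHitPercent, double newHitPercent) {
        double oldMiss = 100.0 - oldHitPercent; // share of traffic reaching the origin before
        double newMiss = 100.0 - newHitPercent; // share of traffic reaching the origin after
        return (oldMiss - newMiss) / oldMiss;
    }
}
```

Going from 95% to 99% takes the miss rate from 5% to 1%, so originLoadReduction(95, 99) is 0.8: the origin sees 80% less traffic.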