Redis Caching Patterns and the Stampede Problem

The main chapter introduced the stampede problem: a popular article’s cache entry expires, and hundreds of concurrent requests hit the database simultaneously. Caffeine’s buildAsync solves this for local caches because only one thread loads the value. But the content platform runs 8 application nodes. Each node has its own Caffeine instance. When a Redis entry expires, all 8 nodes independently detect the miss and query the database. That is 8 concurrent loads instead of 1, but under high concurrency it can be 8 times the per-node stampede size.

This section covers Redis caching patterns that prevent stampedes at the distributed layer.

Cache-Aside: The Default Pattern and Its Problems

Cache-aside is the standard Redis caching pattern:

// SLOW: Basic cache-aside with stampede vulnerability
public class RedisArticleCache {
    private final StringRedisTemplate redis;
    private final ObjectMapper mapper;
    private final ArticleRepository repository;

    public Article getArticle(String id) throws Exception {
        String key = "article:" + id;
        String cached = redis.opsForValue().get(key);
        if (cached != null) {
            return mapper.readValue(cached, Article.class);
        }
        // Every thread reaching this point queries the database
        Article article = repository.findById(id);
        String json = mapper.writeValueAsString(article);
        redis.opsForValue().set(key, json, Duration.ofMinutes(5));
        return article;
    }
}

This implementation has the same stampede vulnerability as the naive ConcurrentHashMap from the main chapter, but worse: the stampede spans multiple JVM processes. When the Redis entry for a viral article expires, each of the 8 nodes may have dozens of threads hitting the database concurrently. At 500 req/s for a viral article across 8 nodes, that is up to 500 database queries in a 100 ms window.

Stampede Mitigation Strategy 1: Distributed Locking

The most straightforward fix: use a Redis lock to ensure only one thread across all nodes loads the value.

// FAST: Cache-aside with distributed lock
public class LockingRedisCache {
    private final StringRedisTemplate redis;
    private final ObjectMapper mapper;
    private final ArticleRepository repository;
    private static final Duration LOCK_TIMEOUT = Duration.ofSeconds(5);
    private static final Duration CACHE_TTL = Duration.ofMinutes(5);

    public Article getArticle(String id) throws Exception {
        String key = "article:" + id;
        String cached = redis.opsForValue().get(key);
        if (cached != null) {
            return mapper.readValue(cached, Article.class);
        }

        String lockKey = "lock:article:" + id;
        // Try to acquire lock (SET NX EX)
        Boolean acquired = redis.opsForValue()
            .setIfAbsent(lockKey, "1", LOCK_TIMEOUT);

        if (Boolean.TRUE.equals(acquired)) {
            try {
                // Double-check: another thread may have populated while waiting
                cached = redis.opsForValue().get(key);
                if (cached != null) {
                    return mapper.readValue(cached, Article.class);
                }
                // Winner: load from database
                Article article = repository.findById(id);
                String json = mapper.writeValueAsString(article);
                redis.opsForValue().set(key, json, CACHE_TTL);
                return article;
            } finally {
                redis.delete(lockKey);
            }
        } else {
            // Loser: wait and retry
            return waitForCachePopulation(key, id);
        }
    }

    private Article waitForCachePopulation(String key, String id)
            throws Exception {
        for (int attempt = 0; attempt < 10; attempt++) {
            Thread.sleep(50); // Back off
            String cached = redis.opsForValue().get(key);
            if (cached != null) {
                return mapper.readValue(cached, Article.class);
            }
        }
        // Fallback: load from database (lock holder may have failed)
        return repository.findById(id);
    }
}

The locking pattern reduces 500 concurrent database queries to 1. But it has trade-offs:

Added latency: Lock losers wait 50 ms per retry attempt. In the worst case, they wait 500 ms.
Lock failure: If the lock holder crashes before populating the cache, other threads wait until the lock times out (5 seconds), then one acquires and loads.
Redis round trips: Two additional Redis operations per miss (lock acquire + lock release).

For the content platform, the trade-off is acceptable. The article database query takes 5-10 ms. Waiting 50 ms for a lock is better than 500 queries saturating the connection pool and causing cascading failures.

Stampede Mitigation Strategy 2: Probabilistic Early Expiration

The XFetch algorithm avoids stampedes without locks. The idea: instead of waiting for the TTL to expire, each request probabilistically decides to refresh the entry before it expires. The closer the entry is to expiration, the higher the probability of a refresh. This spreads refreshes over time, making it statistically unlikely that two threads refresh simultaneously.

// FAST: Probabilistic early expiration (XFetch)
public class XFetchRedisCache {
    private final StringRedisTemplate redis;
    private final ObjectMapper mapper;
    private final ArticleRepository repository;
    private final ThreadLocalRandom random = ThreadLocalRandom.current();

    private static final Duration CACHE_TTL = Duration.ofMinutes(5);
    private static final double BETA = 1.0; // Controls refresh eagerness

    public Article getArticle(String id) throws Exception {
        String key = "article:" + id;
        String metaKey = "meta:" + id;

        String cached = redis.opsForValue().get(key);
        if (cached != null) {
            // Check if we should proactively refresh
            String meta = redis.opsForValue().get(metaKey);
            if (meta != null && shouldRefresh(meta)) {
                // Refresh asynchronously; return stale value
                CompletableFuture.runAsync(() -> refreshEntry(id, key, metaKey));
            }
            return mapper.readValue(cached, Article.class);
        }

        return loadAndCache(id, key, metaKey);
    }

    private boolean shouldRefresh(String meta) {
        String[] parts = meta.split(":");
        long expiresAt = Long.parseLong(parts[0]);
        long computeTime = Long.parseLong(parts[1]); // ms to load

        long now = System.currentTimeMillis();
        long remaining = expiresAt - now;

        if (remaining <= 0) return true;

        // XFetch formula: refresh probability increases as TTL decreases
        // P(refresh) = exp(-remaining / (beta * computeTime))
        double probability = Math.exp(
            -remaining / (BETA * Math.max(computeTime, 1)));
        return random.nextDouble() < probability;
    }

    private Article loadAndCache(String id, String key, String metaKey)
            throws Exception {
        long start = System.currentTimeMillis();
        Article article = repository.findById(id);
        long computeTime = System.currentTimeMillis() - start;

        String json = mapper.writeValueAsString(article);
        long expiresAt = System.currentTimeMillis() + CACHE_TTL.toMillis();

        redis.opsForValue().set(key, json, CACHE_TTL);
        redis.opsForValue().set(metaKey,
            expiresAt + ":" + computeTime, CACHE_TTL);

        return article;
    }

    private void refreshEntry(String id, String key, String metaKey) {
        try {
            loadAndCache(id, key, metaKey);
        } catch (Exception e) {
            // Refresh failure is not critical; stale value is still served
        }
    }
}

The XFetch formula:

$$P(\text{refresh}) = e^{-\frac{\text{remaining TTL}}{\beta \times \text{compute time}}}$$

When the remaining TTL is large (entry is fresh), the probability is near 0. When the remaining TTL approaches the compute time, the probability rises. When the remaining TTL is 0, the probability is 1.

The $\beta$ parameter controls eagerness:

$\beta = 0.5$: Conservative. Refreshes start close to expiration.
$\beta = 1.0$: Balanced. Refreshes start at roughly $2 \times \text{compute time}$ before expiration.
$\beta = 2.0$: Aggressive. Refreshes start well before expiration, consuming more database capacity.

For the content platform, $\beta = 1.0$ with a 5-minute TTL and 10 ms compute time means refresh probability exceeds 10% only in the last 23 ms before expiration. The expected number of refreshes before expiration under 500 req/s:

$$E[\text{refreshes}] = 500 \times 0.023 \times 0.5 \approx 5.75$$

Approximately 6 requests will trigger a refresh in the last 23 ms, but since the first one populates the cache, the subsequent 5 see the fresh value and do not trigger additional database queries. In practice, XFetch reduces stampede size from 500 to 1-2 concurrent loads.

The trade-off: XFetch uses extra Redis storage (metadata per key) and extra reads (metadata check on every hit). The additional Redis overhead is about 100 bytes per key and 1 extra GET per access. For the content platform, this is negligible compared to the stampede prevention benefit.

Stampede Mitigation Strategy 3: TTL Jitter

Even without a single hot key, synchronized TTLs cause stampedes. If 10,000 cache entries are all written within a 1-second window (after a cache flush or deployment), they all expire within a 1-second window 5 minutes later. The result: 10,000 simultaneous cache misses.

// SLOW: Fixed TTL causes synchronized expiration
redis.opsForValue().set(key, json, Duration.ofMinutes(5));

// FAST: Jittered TTL spreads expiration over a window
private Duration jitteredTtl(Duration baseTtl) {
    // Add random jitter: 80% to 120% of base TTL
    long baseMs = baseTtl.toMillis();
    long jitter = ThreadLocalRandom.current()
        .nextLong(-baseMs / 5, baseMs / 5);
    return Duration.ofMillis(baseMs + jitter);
}

redis.opsForValue().set(key, json, jitteredTtl(Duration.ofMinutes(5)));
// TTL ranges from 4 minutes to 6 minutes

For the content platform, after a deployment that flushes all caches, TTL jitter spreads the 10,000 re-population queries across a 2-minute window instead of a 1-second spike. At 10,000 queries spread over 120 seconds, that is 83 queries/second. Without jitter, it is 10,000 queries/second for 1 second. The database can handle 83/s. It cannot handle 10,000/s.

Redis Pipeline Optimization

Every Redis command is a network round trip. For cache population that writes multiple keys (article data + metadata + tag indexes), individual commands accumulate latency:

// SLOW: 4 round trips for a single article cache update
public void cacheArticle(Article article) throws Exception {
    String key = "article:" + article.id();
    String json = mapper.writeValueAsString(article);

    redis.opsForValue().set(key, json, Duration.ofMinutes(5));       // RT 1
    redis.opsForSet().add("tag:java", article.id());                 // RT 2
    redis.opsForSet().add("tag:performance", article.id());          // RT 3
    redis.opsForValue().set("meta:" + article.id(),
        System.currentTimeMillis() + ":10", Duration.ofMinutes(5));  // RT 4
}

// FAST: 1 round trip with pipeline
public void cacheArticlePipelined(Article article) throws Exception {
    String key = "article:" + article.id();
    String json = mapper.writeValueAsString(article);
    Duration ttl = jitteredTtl(Duration.ofMinutes(5));

    redis.executePipelined((RedisCallback<Object>) connection -> {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        byte[] jsonBytes = json.getBytes(StandardCharsets.UTF_8);

        connection.stringCommands()
            .setEx(keyBytes, ttl.getSeconds(), jsonBytes);
        for (String tag : article.categories()) {
            connection.setCommands()
                .sAdd(("tag:" + tag).getBytes(StandardCharsets.UTF_8),
                    article.id().getBytes(StandardCharsets.UTF_8));
        }
        connection.stringCommands()
            .setEx(("meta:" + article.id()).getBytes(StandardCharsets.UTF_8),
                ttl.getSeconds(),
                (System.currentTimeMillis() + ":10")
                    .getBytes(StandardCharsets.UTF_8));
        return null;
    });
}

Approach	Latency (1 article)	Latency (100 articles)
Individual commands	2.4 ms	240 ms
Pipelined	0.8 ms	12 ms

Pipelining sends all commands in a single network batch and reads all responses together. The improvement grows with command count: 4 commands save 1.6 ms; 400 commands (100 articles with 4 commands each) save 228 ms.

For the content platform’s search indexer, which updates 200 article cache entries after a reindexing run, pipelining reduces the cache update from 480 ms to 24 ms.

Cache Invalidation: The Two Hard Problems

Cache invalidation is the second hardest problem in computer science (after naming things). The content platform has three invalidation strategies, each with different consistency guarantees:

TTL-Based Expiration (Eventual)

The simplest approach: let entries expire naturally. The maximum staleness equals the TTL. For the content platform’s article content (which changes rarely), a 5-minute TTL means users see updates within 5 minutes.

Event-Driven Invalidation (Near-Real-Time)

When an article is updated, publish an event that invalidates the cache entry on all nodes:

public class ArticleUpdateHandler {
    private final StringRedisTemplate redis;
    private final RedisMessageListenerContainer listenerContainer;

    public void onArticleUpdated(String articleId) {
        // Delete from Redis
        redis.delete("article:" + articleId);
        // Notify all nodes to invalidate local caches
        redis.convertAndSend("cache-invalidation",
            "article:" + articleId);
    }

    @PostConstruct
    public void subscribeToInvalidation() {
        listenerContainer.addMessageListener(
            (message, pattern) -> {
                String key = new String(message.getBody());
                String id = key.substring("article:".length());
                localCache.synchronous().invalidate(id);
            },
            new ChannelTopic("cache-invalidation")
        );
    }
}

Event-driven invalidation reduces staleness from 5 minutes to milliseconds. But it adds complexity: the pub/sub channel must be reliable, and missed messages cause stale data until TTL expires. Redis pub/sub is fire-and-forget; if a node is disconnected during the event, it misses the invalidation.

Write-Through (Immediate)

Update the cache as part of the write operation:

@Transactional
public Article updateArticle(String id, ArticleUpdate update) {
    Article updated = repository.save(update.toEntity(id));

    // Write-through: update cache in the same logical operation
    String json = mapper.writeValueAsString(updated);
    redis.opsForValue().set("article:" + id, json,
        Duration.ofMinutes(5));

    return updated;
}

Write-through provides immediate consistency for the writing node. But it does not help other nodes that have the old value in their local Caffeine caches. And it couples the write path to Redis availability: if Redis is down, the write fails (or requires fallback logic).

For the content platform, the team uses TTL-based expiration for article content (5-minute staleness acceptable), event-driven invalidation for article metadata displayed in lists (title, thumbnail), and write-through for view counts that update frequently and are displayed prominently.

Monitoring Redis Cache Performance

Redis exposes metrics through the INFO command. The metrics that matter for caching:

# Memory
used_memory: 524,288,000          # 500 MB
maxmemory: 1,073,741,824          # 1 GB limit
maxmemory_policy: allkeys-lfu     # Eviction policy

# Stats
keyspace_hits: 14,523,891         # Total cache hits
keyspace_misses: 832,451          # Total cache misses
evicted_keys: 12,340              # Keys evicted due to memory limit

# Hit rate = hits / (hits + misses) = 94.6%

The content platform monitors these metrics through Prometheus with a Redis exporter and alerts on:

Hit rate below 85%: cache sizing or workload problem
Eviction rate above 100/s: cache undersized for workload
Memory usage above 80%: approaching eviction threshold
Connected clients above 80% of maxclients: connection pool pressure

A healthy cache shows high hit rates, low eviction rates, and stable memory usage well below the maximum. When the metrics deviate, it is a signal that the caching strategy needs revisiting, not that caching itself is wrong.