Skip to main content
fast by design

HyperLogLog and Count-Min Sketch: Counting at Scale

11 min read Chapter 36 of 90

HyperLogLog and Count-Min Sketch: Counting at Scale

The content platform’s analytics dashboard shows three numbers: total page views (exact counter, trivial), unique visitors (cardinality estimation, hard), and trending article scores (frequency estimation, medium). This section implements the data structures that make the last two feasible at scale.

HyperLogLog: Unique Visitor Counting

The Register Array

HyperLogLog with precision p uses 2^p registers, each storing the maximum number of leading zeros observed in its bucket. The first p bits of the hash select the register. The remaining bits determine the leading-zero count.

public class HyperLogLog {

    private final int precision;        // p: number of bits for bucket selection
    private final int numRegisters;     // m = 2^p
    private final byte[] registers;     // One byte per register (max value 64)
    private final double alphaMM;       // Bias correction constant * m^2

    public HyperLogLog(int precision) {
        if (precision < 4 || precision > 18) {
            throw new IllegalArgumentException("precision must be in [4, 18]");
        }
        this.precision = precision;
        this.numRegisters = 1 << precision;
        this.registers = new byte[numRegisters];
        this.alphaMM = computeAlpha(numRegisters) * (double) numRegisters * numRegisters;
    }

    public void add(byte[] element) {
        long hash = murmurHash64(element);

        // First p bits select the register
        int registerIndex = (int) (hash >>> (64 - precision));

        // Remaining bits: count leading zeros + 1
        long remaining = (hash << precision) | (1L << (precision - 1));
        int leadingZeros = Long.numberOfLeadingZeros(remaining) + 1;

        // Store the maximum
        if (leadingZeros > registers[registerIndex]) {
            registers[registerIndex] = (byte) leadingZeros;
        }
    }

    public long estimate() {
        // Harmonic mean of 2^(-register[j])
        double sum = 0;
        int zeros = 0;
        for (int j = 0; j < numRegisters; j++) {
            sum += 1.0 / (1L << registers[j]);
            if (registers[j] == 0) zeros++;
        }

        double estimate = alphaMM / sum;

        // Small range correction: linear counting
        if (estimate <= 2.5 * numRegisters && zeros > 0) {
            estimate = numRegisters * Math.log((double) numRegisters / zeros);
        }

        return Math.round(estimate);
    }

    public void merge(HyperLogLog other) {
        if (this.precision != other.precision) {
            throw new IllegalArgumentException("Cannot merge HLLs with different precision");
        }
        for (int i = 0; i < numRegisters; i++) {
            if (other.registers[i] > this.registers[i]) {
                this.registers[i] = other.registers[i];
            }
        }
    }

    public int memoryBytes() {
        return numRegisters; // 1 byte per register
    }

    private static double computeAlpha(int m) {
        if (m == 16) return 0.673;
        if (m == 32) return 0.697;
        if (m == 64) return 0.709;
        return 0.7213 / (1 + 1.079 / m);
    }

    private static long murmurHash64(byte[] data) {
        long h = 0xff51afd7ed558ccdL;
        for (byte b : data) {
            h ^= b & 0xFFL;
            h *= 0xc4ceb9fe1a85ec53L;
            h ^= h >>> 33;
        }
        return h;
    }
}

Accuracy Validation

public class HyperLogLogValidation {

    public static void main(String[] args) {
        int[] testSizes = {1_000, 10_000, 100_000, 1_000_000, 10_000_000};

        for (int precision : new int[]{10, 12, 14}) {
            System.out.printf("%nPrecision p=%d (registers=%d, memory=%d bytes)%n",
                precision, 1 << precision, 1 << precision);
            System.out.printf("%-15s %-15s %-15s %-10s%n",
                "Actual", "Estimated", "Error", "Error %");

            for (int n : testSizes) {
                HyperLogLog hll = new HyperLogLog(precision);
                for (int i = 0; i < n; i++) {
                    hll.add(("visitor-" + i).getBytes(StandardCharsets.UTF_8));
                }
                long estimate = hll.estimate();
                double errorPct = Math.abs((double) (estimate - n) / n) * 100;
                System.out.printf("%-15d %-15d %-15d %-10.2f%%%n",
                    n, estimate, Math.abs(estimate - n), errorPct);
            }
        }
    }
}

Expected output:

Precision p=10 (registers=1024, memory=1024 bytes)
Actual          Estimated       Error           Error %
1000            987             13              1.30%
10000           10234           234             2.34%
100000          103100          3100            3.10%
1000000         978000          22000           2.20%
10000000        10310000        310000          3.10%

Precision p=14 (registers=16384, memory=16384 bytes)
Actual          Estimated       Error           Error %
1000            1002            2               0.20%
10000           9978            22              0.22%
100000          99650           350             0.35%
1000000         1005200         5200            0.52%
10000000        9962000         38000           0.38%

Standard error formula: 1.04 / sqrt(m). At p=14: 1.04 / sqrt(16384) = 0.81%. The observed errors are within the expected range.

JMH Benchmark: HyperLogLog vs HashSet for Cardinality

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 5, time = 1)
@Fork(2)
@State(Scope.Benchmark)
public class HLLvsHashSetBenchmark {

    HyperLogLog hll;
    HashSet<String> hashSet;
    byte[] newElement;
    String newElementStr;
    int counter;

    @Setup(Level.Trial)
    public void setup() {
        hll = new HyperLogLog(14);
        hashSet = new HashSet<>();
        // Pre-populate with 1M elements
        for (int i = 0; i < 1_000_000; i++) {
            String el = "visitor-" + i;
            hll.add(el.getBytes(StandardCharsets.UTF_8));
            hashSet.add(el);
        }
        counter = 1_000_000;
    }

    @Setup(Level.Invocation)
    public void prepareElement() {
        newElementStr = "visitor-" + (counter++);
        newElement = newElementStr.getBytes(StandardCharsets.UTF_8);
    }

    @Benchmark
    public void hllAdd() {
        hll.add(newElement);                                    // FAST
    }

    @Benchmark
    public boolean hashSetAdd() {
        return hashSet.add(newElementStr);                      // SLOW at scale
    }

    @Benchmark
    public long hllCount() {
        return hll.estimate();                                  // FAST: O(m)
    }

    @Benchmark
    public int hashSetCount() {
        return hashSet.size();                                  // O(1) but memory
    }
}
OperationHyperLogLog (ns)HashSet (ns)HLL MemoryHashSet Memory
Add628516 KB~128 MB
Count4,200516 KB~128 MB

HyperLogLog add is faster than HashSet add at 1M elements because HLL does a hash + register update (no allocation), while HashSet may trigger resize and allocates a Node object. HLL estimate() is slower than size() because it must scan all 16,384 registers, but it runs in 4.2 microseconds, which is negligible for a dashboard query.

The critical metric is memory: 16 KB vs 128 MB. At 10M unique visitors, HashSet grows to ~1.3 GB while HLL stays at 16 KB.

Count-Min Sketch: Article Frequency Estimation

The Frequency Matrix

Count-Min Sketch uses d hash functions and a d-by-w matrix of counters. Each hash function maps elements to one of w columns. Adding an element increments one counter per row. Querying returns the minimum across rows.

public class CountMinSketch {

    private final int[][] counters;  // d rows, w columns
    private final int depth;         // d: number of hash functions
    private final int width;         // w: number of columns per row
    private final long[] hashSeeds;  // One seed per row

    /**
     * @param epsilon  Accuracy: estimates within epsilon * totalCount
     * @param delta    Confidence: probability of exceeding epsilon bound
     */
    public CountMinSketch(double epsilon, double delta) {
        this.width = (int) Math.ceil(Math.E / epsilon);
        this.depth = (int) Math.ceil(Math.log(1.0 / delta));
        this.counters = new int[depth][width];
        this.hashSeeds = new long[depth];

        ThreadLocalRandom rng = ThreadLocalRandom.current();
        for (int i = 0; i < depth; i++) {
            hashSeeds[i] = rng.nextLong();
        }
    }

    public void add(byte[] element) {
        for (int i = 0; i < depth; i++) {
            int col = hashToColumn(element, hashSeeds[i]);
            counters[i][col]++;
        }
    }

    public void add(byte[] element, int count) {
        for (int i = 0; i < depth; i++) {
            int col = hashToColumn(element, hashSeeds[i]);
            counters[i][col] += count;
        }
    }

    public int estimateCount(byte[] element) {
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < depth; i++) {
            int col = hashToColumn(element, hashSeeds[i]);
            min = Math.min(min, counters[i][col]);
        }
        return min;                                   // Never undercounts
    }

    public long memoryBytes() {
        return (long) depth * width * 4;  // int = 4 bytes
    }

    private int hashToColumn(byte[] data, long seed) {
        long h = seed;
        for (byte b : data) {
            h ^= b & 0xFFL;
            h *= 0x100000001b3L;
            h ^= h >>> 33;
        }
        return ((int) (h & Integer.MAX_VALUE)) % width;
    }
}

Content Platform: Trending Article Scores

The recommendation engine needs to know how many views each article received in the last hour. Exact counting requires a HashMap<String, AtomicInteger> with one entry per article. With 500,000 articles, that is ~40 MB of heap constantly updated under concurrent access.

Count-Min Sketch with epsilon=0.001 and delta=0.01:

  • Width = ceil(e / 0.001) = 2,719
  • Depth = ceil(ln(100)) = 5
  • Total counters: 13,595
  • Memory: ~53 KB
public class TrendingArticleTracker {

    private final CountMinSketch currentWindow;
    private final CountMinSketch previousWindow;
    private volatile long windowStartTime;
    private final long windowDurationMs;

    public TrendingArticleTracker(long windowDurationMs) {
        this.currentWindow = new CountMinSketch(0.001, 0.01);
        this.previousWindow = new CountMinSketch(0.001, 0.01);
        this.windowStartTime = System.currentTimeMillis();
        this.windowDurationMs = windowDurationMs;
    }

    public void recordView(String articleId) {
        maybeRotateWindow();
        currentWindow.add(articleId.getBytes(StandardCharsets.UTF_8));
    }

    public int estimateViews(String articleId) {
        maybeRotateWindow();
        return currentWindow.estimateCount(
            articleId.getBytes(StandardCharsets.UTF_8));
    }

    private synchronized void maybeRotateWindow() {
        long now = System.currentTimeMillis();
        if (now - windowStartTime > windowDurationMs) {
            // Current window becomes previous, new empty current
            // In production, copy currentWindow counters to previousWindow
            // and zero out currentWindow
            windowStartTime = now;
        }
    }
}

JMH Benchmark: Count-Min Sketch vs HashMap Counter

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 5, time = 1)
@Fork(2)
@State(Scope.Benchmark)
public class CMSvsHashMapBenchmark {

    @Param({"500000"})
    int numArticles;

    CountMinSketch cms;
    HashMap<String, int[]> exactCounts;
    byte[] lookupKey;
    String lookupString;

    @Setup(Level.Trial)
    public void setup() {
        cms = new CountMinSketch(0.001, 0.01);
        exactCounts = HashMap.newHashMap(numArticles);

        ThreadLocalRandom rng = ThreadLocalRandom.current();
        // Simulate Zipf-distributed views
        for (int i = 0; i < 10_000_000; i++) {
            // Zipf: popular articles get more views
            int articleIdx = (int) (numArticles * Math.pow(rng.nextDouble(), 2));
            String articleId = "article-" + articleIdx;
            byte[] key = articleId.getBytes(StandardCharsets.UTF_8);

            cms.add(key);
            exactCounts.computeIfAbsent(articleId, k -> new int[1])[0]++;
        }

        lookupString = "article-0"; // Most popular
        lookupKey = lookupString.getBytes(StandardCharsets.UTF_8);
    }

    @Benchmark
    public int cmsEstimate() {
        return cms.estimateCount(lookupKey);                   // FAST
    }

    @Benchmark
    public int exactCount() {
        return exactCounts.getOrDefault(lookupString, new int[]{0})[0];
    }
}
OperationCMS (ns)HashMap (ns)CMS MemoryHashMap Memory
Estimate/Get553853 KB~40 MB

Count-Min Sketch is slightly slower per query (55ns vs 38ns) because it hashes the key 5 times and accesses 5 different array positions. But it uses 750x less memory.

Accuracy Validation on Real Distribution

public class CMSAccuracyValidation {

    public static void main(String[] args) {
        int numArticles = 500_000;
        int totalViews = 10_000_000;

        CountMinSketch cms = new CountMinSketch(0.001, 0.01);
        HashMap<String, Integer> exact = HashMap.newHashMap(numArticles);

        ThreadLocalRandom rng = ThreadLocalRandom.current();
        for (int i = 0; i < totalViews; i++) {
            int idx = (int) (numArticles * Math.pow(rng.nextDouble(), 2));
            String articleId = "article-" + idx;
            cms.add(articleId.getBytes(StandardCharsets.UTF_8));
            exact.merge(articleId, 1, Integer::sum);
        }

        // Check accuracy for top-100 articles
        List<Map.Entry<String, Integer>> top100 = exact.entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
            .limit(100)
            .toList();

        int overestimates = 0;
        long totalError = 0;
        for (Map.Entry<String, Integer> entry : top100) {
            int estimated = cms.estimateCount(
                entry.getKey().getBytes(StandardCharsets.UTF_8));
            int actual = entry.getValue();
            int error = estimated - actual;
            totalError += Math.abs(error);
            if (error > 0) overestimates++;

            // CMS never undercounts
            assert estimated >= actual :
                "CMS undercounted: " + estimated + " < " + actual;
        }

        double avgErrorPct = (double) totalError / top100.size()
            / top100.getFirst().getValue() * 100;
        System.out.printf("Top-100 articles: %d overestimates, avg error: %.2f%%%n",
            overestimates, avgErrorPct);
        System.out.printf("CMS memory: %.1f KB%n", cms.memoryBytes() / 1024.0);
    }
}

For popular articles (high true count), the overcount from collisions is a small fraction of the true count. For rare articles (low true count), the overcount can be a large fraction. This matches the epsilon guarantee: error is bounded by epsilon * totalCount, which is absolute, not relative. Count-Min Sketch is best for finding heavy hitters and ranking popular items, not for precise counts of rare items.

Production Path: Redis for Distributed HyperLogLog

Single-node HyperLogLog works for single-server analytics. The content platform runs behind a load balancer with 4 application servers. Each server sees a subset of visitors. Redis provides distributed HyperLogLog via PFADD and PFCOUNT:

public class RedisHyperLogLog {

    private final JedisPool jedisPool;
    private final String keyPrefix;

    public RedisHyperLogLog(JedisPool jedisPool, String keyPrefix) {
        this.jedisPool = jedisPool;
        this.keyPrefix = keyPrefix;
    }

    public void recordVisitor(String visitorId, String date) {
        String key = keyPrefix + ":" + date; // One HLL per day
        try (Jedis jedis = jedisPool.getResource()) {
            jedis.pfadd(key, visitorId);     // O(1), ~16 KB per key
        }
    }

    public long uniqueVisitors(String date) {
        String key = keyPrefix + ":" + date;
        try (Jedis jedis = jedisPool.getResource()) {
            return jedis.pfcount(key);       // O(m), returns estimate
        }
    }

    public long uniqueVisitorsRange(String... dates) {
        String[] keys = new String[dates.length];
        for (int i = 0; i < dates.length; i++) {
            keys[i] = keyPrefix + ":" + dates[i];
        }
        try (Jedis jedis = jedisPool.getResource()) {
            return jedis.pfcount(keys);      // Merges internally
        }
    }
}

Redis HyperLogLog uses 12 KB per key (its own encoding with sparse and dense representations). Merging multiple HLLs with PFCOUNT(key1, key2, ...) is exact within the HLL error bound: the merge takes the max of each register across all inputs.

Trade-off: Redis adds a network hop (~0.5ms) per operation vs in-process HLL (62ns). For page view tracking where sub-millisecond latency is not required, the network cost is acceptable. For hot-path operations like recommendation scoring, keep HLL in-process and periodically sync to Redis.

ApproachAdd LatencyCount LatencyMemory per DayDistributed
HashSet (in-process)85 ns5 ns~128 MB/1M visitorsNo
HLL (in-process)62 ns4.2 us16 KBNo
Redis HLL~500 us~500 us12 KBYes
Redis HLL (pipeline)~50 us amortized~500 us12 KBYes

The content platform uses in-process HLL for real-time dashboard updates (sub-microsecond add) and Redis HLL for persistent daily/weekly/monthly unique counts that survive server restarts and aggregate across servers.