Metric Aggregations and the HyperLogLog Cardinality Estimator

The Symptom

The product manager asks “how many unique users searched this week?” The developer writes a terms aggregation on user_id with size: 1000000. The query takes 45 seconds and comes close to exhausting the coordinating node's heap. The answer is 237,000 unique users, but computing it nearly crashes the cluster.

The Internals

The cardinality aggregation uses HyperLogLog++ (HLL++), a probabilistic algorithm that estimates the number of distinct values in a field without storing all values in memory. HLL++ uses a fixed amount of memory (determined by the precision_threshold parameter) regardless of the actual cardinality.

The precision_threshold controls the trade-off between accuracy and memory:

precision_threshold	Memory (about threshold × 8 bytes)	Error Rate (approximate)
100	~0.8KB	~6%
1,000	~8KB	~2%
10,000	~80KB	~0.5%
40,000 (max)	~320KB	~0.25%

For cardinalities below the precision_threshold, counts are expected to be close to exact. Above it, accuracy degrades gradually. The error rates shown are typical values that depend on the data, not guaranteed bounds.
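To make the fixed-memory idea concrete, here is a minimal sketch of plain HyperLogLog. It is not the HLL++ variant Elasticsearch uses: it has no sparse encoding and no empirical bias correction. The class name ToyHyperLogLog, the SplitMix64-style hash, and the choice of 2^14 registers are assumptions made for illustration only.

```java
/**
 * Toy HyperLogLog (plain HLL, not HLL++).
 * Memory is fixed at 2^p one-byte registers no matter how many values are added.
 */
final class ToyHyperLogLog {
    private final int p;            // number of hash bits used as the register index
    private final byte[] registers; // m = 2^p registers, allocated once

    ToyHyperLogLog(int p) {
        if (p < 7 || p > 18) {
            throw new IllegalArgumentException("p must be in [7, 18]");
        }
        this.p = p;
        this.registers = new byte[1 << p];
    }

    /** SplitMix64 finalizer, an assumed stand-in for a real 64-bit hash function. */
    static long hash(long x) {
        x += 0x9E3779B97F4A7C15L;
        x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
        x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
        return x ^ (x >>> 31);
    }

    void add(long value) {
        long h = hash(value);
        int index = (int) (h >>> (64 - p));  // top p bits select a register
        long rest = h << p;                   // remaining 64 - p bits
        // rank = 1-based position of the first 1-bit in the remaining bits
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - p + 1);
        if (rank > registers[index]) {
            registers[index] = (byte) rank;   // keep the maximum rank seen
        }
    }

    double estimate() {
        int m = registers.length;
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m);   // bias constant, valid for m >= 128
        double raw = alpha * m * m / sum;             // harmonic-mean estimate
        if (raw <= 2.5 * m && zeros > 0) {
            return m * Math.log((double) m / zeros);  // linear counting for small sets
        }
        return raw;
    }

    int memoryBytes() {
        return registers.length;
    }

    public static void main(String[] args) {
        ToyHyperLogLog hll = new ToyHyperLogLog(14);  // 16,384 registers
        for (long userId = 0; userId < 237_000; userId++) {
            hll.add(userId);
            hll.add(userId);  // duplicates never change the registers
        }
        System.out.printf("estimate=%.0f, memory=%d bytes%n",
                hll.estimate(), hll.memoryBytes());
    }
}
```

The register array is allocated once in the constructor and never grows, which is the property that keeps the cardinality aggregation cheap. Adding the same value twice leaves the registers unchanged, so only distinct values move the estimate.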

The Implementation

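// HARDENED: Search analytics aggregation for the documentation platform
// Computes unique users, unique queries, and search volume in one request
// Assumes tenantId (a String) is in scope. The range(...).gte(JsonData) form matches
// Java API Client releases whose RangeQuery takes JsonData bounds; newer releases
// may require a typed range variant instead.

import co.elastic.clients.elasticsearch.core.SearchRequest;
import co.elastic.clients.json.JsonData;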

SearchRequest analyticsQuery = SearchRequest.of(s -> s
    .index("search-logs")
    .size(0)  // No hits needed, only aggregations
    .query(q -> q
        .bool(b -> b
            .filter(f -> f.range(r -> r
                .field("timestamp")
                .gte(JsonData.of("now-7d"))
            ))
            .filter(f -> f.term(t -> t
                .field("tenant_id").value(tenantId)))
        )
    )
    .aggregations("unique_users", a -> a
        .cardinality(c -> c
            .field("user_id")
            .precisionThreshold(10000)
        )
    )
    .aggregations("unique_queries", a -> a
        .cardinality(c -> c
            .field("query_text.raw")
            .precisionThreshold(10000)
        )
    )
    // Count documents via a field every log entry has; _id is restricted from aggregations
    .aggregations("total_searches", a -> a
        .valueCount(vc -> vc.field("timestamp"))
    )
    .aggregations("result_count_stats", a -> a
        .stats(st -> st.field("result_count"))
    )
    .aggregations("latency_percentiles", a -> a
        .percentiles(p -> p
            .field("latency_ms")
            .percents(50.0, 90.0, 95.0, 99.0)
        )
    )
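    // The filter aggregation returns a doc_count, not a ratio:
    // zero-result rate = zero_result_rate.doc_count / total_searches
    .aggregations("zero_result_rate", a -> a
        .filter(f -> f.term(t -> t.field("result_count").value(0)))
    )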
);

Zero-Result Query Analysis

// Queries that return zero results indicate search quality gaps

SearchRequest zeroResultQueries = SearchRequest.of(s -> s
    .index("search-logs")
    .size(0)
    .query(q -> q
        .bool(b -> b
            .filter(f -> f.term(t -> t.field("result_count").value(0)))
            .filter(f -> f.range(r -> r.field("timestamp").gte(JsonData.of("now-7d"))))
            .filter(f -> f.term(t -> t.field("tenant_id").value(tenantId)))
        )
    )
    .aggregations("top_zero_result_queries", a -> a
        .terms(t -> t
            .field("query_text.raw")
            .size(20)
            .minDocCount(3)  // Only show queries that failed multiple times
        )
    )
);

Zero-result queries are the most actionable search analytics signal. Each represents a user need that the documentation does not satisfy (missing content), a vocabulary mismatch (the content exists but uses different terms), or a search configuration problem (the analyzer or query structure prevents matching).
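As a rough illustration of what the terms aggregation above computes, the following sketch does the same top-N count with a minimum frequency over an in-memory list of zero-result query strings. The class and method names are hypothetical. It is exact on a single list, whereas the real aggregation merges per-shard results and can be approximate.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TopZeroResultQueries {
    /**
     * Returns up to `size` queries that occur at least `minDocCount` times,
     * most frequent first (ties broken alphabetically).
     */
    static List<Map.Entry<String, Long>> top(List<String> zeroResultQueries,
                                             int size, long minDocCount) {
        // One counter per distinct query string (the "bucket")
        Map<String, Long> counts = new HashMap<>();
        for (String query : zeroResultQueries) {
            counts.merge(query, 1L, Long::sum);
        }

        // Drop buckets below the minimum count, mirroring min_doc_count
        List<Map.Entry<String, Long>> buckets = new ArrayList<>(counts.entrySet());
        buckets.removeIf(e -> e.getValue() < minDocCount);

        // Sort by count descending, then by key for a stable order
        buckets.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        // Keep only the first `size` buckets, mirroring size
        return new ArrayList<>(buckets.subList(0, Math.min(size, buckets.size())));
    }
}
```

Calling top(queries, 20, 3) mirrors the size and minDocCount settings in the request: a query that failed only once or twice is dropped before the list is cut to 20 entries.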

The Measurement

Weekly search analytics for the documentation platform:

Metric	Value	Health Indicator
Unique users	23,400	Baseline: growing
Unique queries	8,900	Query diversity
Total searches	145,000	Volume
Avg results per query	12.3	Recall
Zero-result rate	8.2%	5% to 15% is normal, > 15% is problematic
p50 latency	18ms	Good
p99 latency	85ms	Acceptable

A zero-result rate above 15% indicates systemic search quality problems. Between 5% and 15% is normal for documentation search, because some queries are genuinely not covered. Below 5% is usually healthy, but a very low rate can also mean the matching is too loose (for example, aggressive fuzzy matching), returning marginally relevant results instead of admitting that no good match exists.
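The thresholds above can be turned into a small check for a dashboard or alert. This is a sketch only: the class, enum, and method names are hypothetical, and the two counts are assumed to have been read already from the filter aggregation's document count and the total search count.

```java
final class ZeroResultHealth {
    enum Band { LOW, NORMAL, PROBLEMATIC }

    /** Fraction of searches that returned no results; 0.0 when there were no searches. */
    static double rate(long zeroResultCount, long totalSearches) {
        if (zeroResultCount < 0 || totalSearches < 0 || zeroResultCount > totalSearches) {
            throw new IllegalArgumentException("counts must satisfy 0 <= zero <= total");
        }
        if (totalSearches == 0) {
            return 0.0;  // avoid dividing by zero on an empty window
        }
        return (double) zeroResultCount / totalSearches;
    }

    /** Maps a rate in [0, 1] onto the 5% and 15% thresholds from the text. */
    static Band classify(double zeroResultRate) {
        if (Double.isNaN(zeroResultRate) || zeroResultRate < 0.0 || zeroResultRate > 1.0) {
            throw new IllegalArgumentException("rate must be in [0, 1]");
        }
        if (zeroResultRate > 0.15) {
            return Band.PROBLEMATIC;  // systemic search quality problem
        }
        if (zeroResultRate < 0.05) {
            return Band.LOW;          // worth checking whether matching is too loose
        }
        return Band.NORMAL;           // expected range for documentation search
    }
}
```

For example, 11,890 zero-result searches out of 145,000 would give the 8.2% shown in the table, which falls in the NORMAL band.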

The Decision Rule

Use the cardinality aggregation with precision_threshold: 10000 for any “count unique” analytics query. Never use a terms aggregation with a large size to count unique values: it builds and ships one bucket per distinct value, so its memory grows with cardinality, while HLL++ stays within a fixed budget set by precision_threshold.

Track zero-result rate as the primary search quality metric in production. It is cheaper to compute than NDCG (no relevance judgments required) and directly actionable (each zero-result query is a specific improvement opportunity).

Log every search query with its result count, latency, and user ID. This log becomes the source data for search analytics, zero-result analysis, and query test set expansion (add frequently-failing queries to the test set as new entries).
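One possible shape for such a log entry is sketched below. The field names follow those used in the queries in this section (timestamp, tenant_id, user_id, query_text, result_count, latency_ms). The class itself and its hand-rolled JSON serialization are assumptions for illustration; a production system would more likely use a JSON library.

```java
final class SearchLogEntry {
    private final String timestamp;  // ISO-8601 string, e.g. Instant.now().toString()
    private final String tenantId;
    private final String userId;
    private final String queryText;
    private final int resultCount;
    private final long latencyMs;

    SearchLogEntry(String timestamp, String tenantId, String userId,
                   String queryText, int resultCount, long latencyMs) {
        this.timestamp = timestamp;
        this.tenantId = tenantId;
        this.userId = userId;
        this.queryText = queryText;
        this.resultCount = resultCount;
        this.latencyMs = latencyMs;
    }

    /** Serializes the entry as one JSON object, ready to index into the log index. */
    String toJson() {
        return "{"
                + "\"timestamp\":\"" + escape(timestamp) + "\","
                + "\"tenant_id\":\"" + escape(tenantId) + "\","
                + "\"user_id\":\"" + escape(userId) + "\","
                + "\"query_text\":\"" + escape(queryText) + "\","
                + "\"result_count\":" + resultCount + ","
                + "\"latency_ms\":" + latencyMs
                + "}";
    }

    /** Escapes quotes, backslashes, and control characters for a JSON string value. */
    private static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }
}
```

Recording result_count on every entry, including zero, is what makes both the zero-result filter and the top failing queries report possible from the same index.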