Skip to main content
search at depth

Adaptive Replica Selection and Shard Routing

4 min read Chapter 20 of 60

Adaptive Replica Selection and Shard Routing

The Symptom

A three-node cluster has one data node consistently running at 85% CPU while the other two run at 40%. All three nodes host the same number of shard copies. The queries per second are evenly distributed across coordinating nodes. The imbalance is caused by the shard routing strategy sending disproportionate traffic to the overloaded node.

The Internals

When a search query targets a shard, OpenSearch must decide which copy of that shard to send the query to: the primary or one of the replicas. This decision is made by the adaptive replica selection (ARS) algorithm.

ARS maintains a running estimate of each shard copy’s response time, queue depth, and service time. When a query arrives, ARS selects the copy with the lowest estimated latency. This is more sophisticated than simple round-robin, which ignores that different nodes may have different loads, different hardware, or different segment configurations.

The ARS estimate considers:

  • Outstanding searches: how many queries are currently in-flight to that shard copy
  • Response time EWMA: exponentially weighted moving average of recent response times
  • Service time EWMA: the time the shard spent executing the query (excluding queue wait)

A node with 10 outstanding queries and 50ms service time is a worse choice than a node with 2 outstanding queries and 60ms service time, even though the second node is slightly slower per-query.

The Implementation

Search Preference

The preference parameter overrides ARS for specific use cases:

// Sticky session preference: same user always hits the same shard copy.
// Useful for leveraging the filesystem cache and node query cache.
SearchRequest request = SearchRequest.of(s -> s
    .index("docs-v1")
    .preference("user_session_" + sessionId)
    .query(q -> q.match(m -> m.field("body").query(userQuery)))
);
// Local shard preference: prefer shards on the coordinating node.
// Eliminates network hop for queries that land on a data node.
SearchRequest request = SearchRequest.of(s -> s
    .index("docs-v1")
    .preference("_local")
    .query(q -> q.match(m -> m.field("body").query(userQuery)))
);

The session-based preference has a secondary benefit for the filter cache. When the same user repeatedly searches within their tenant, the tenant_id filter bitset is cached on the same node’s segment cache. Rotating across replicas on each request means each node builds its own filter cache for that tenant independently.

Routing for Read Performance

Custom routing at index time (covered in chapter 4) affects read performance directly. When all documents for tenant acme are routed to shard 2, a search with routing("acme") queries only shard 2 instead of all shards.

// HARDENED: Combined routing and filter for multi-tenant search
// Routing limits shard fan-out. Filter ensures data isolation.

SearchRequest request = SearchRequest.of(s -> s
    .index("docs-v1")
    .routing(tenantId)  // Query only the shard containing this tenant's data
    .query(q -> q
        .bool(b -> b
            .filter(f -> f.term(t -> t.field("tenant_id").value(tenantId)))
            .must(mu -> mu.match(m -> m.field("body").query(userQuery)))
        )
    )
);

The combination of routing and filter provides defense in depth. Routing is a performance optimization. The filter is a security boundary. If routing fails (wrong routing value, hash collision), the filter still prevents cross-tenant data exposure.

The Measurement

Impact of routing on search latency for the documentation platform:

ConfigurationShards QueriedQuery Latency (p50)Query Latency (p99)
No routing, 5 shards518ms65ms
With routing, 5 shards16ms22ms
No routing, 10 shards1028ms95ms
With routing, 10 shards17ms24ms

Routing reduces query latency by 60-75% by eliminating fan-out to shards that contain no relevant data.

The Decision Rule

Leave adaptive replica selection enabled (it is on by default) and do not override it with preference unless you have a specific reason. Session-based preference is justified when the same user performs many searches within a session and the filter cache benefit is measurable. _local preference is justified only in specific topologies where coordinating and data node roles overlap.

Always combine routing with a filter clause for multi-tenant search. Routing alone is not a security mechanism; it is a performance hint. The filter clause enforces data isolation.