ISM Policy Design and Rollover Strategies
The Symptom
The team deploys a single ISM policy for all tenants, with rollover driven by a 30-day index age. Tenant A writes 50,000 documents per day. Tenant B writes 200 documents per day. After 30 days, the rollover triggers for both. Tenant A’s index has 1.5 million documents in a 75GB shard, well above the target. Tenant B’s index has 6,000 documents in a 300MB shard, far below the minimum for efficient search. Both roll over at the same moment: one too late, one too early.
The Internals
ISM policies evaluate conditions periodically (default: 5 minutes). When a transition condition is met, the policy initiates the configured actions. Actions execute in order, and the policy waits for each to complete before starting the next.
Rollover conditions support three criteria:
min_index_age: time since index creation
min_doc_count: number of documents in the index
min_size: total primary shard size
When multiple conditions are specified, rollover triggers when any condition is met. This is an OR operation, not AND. A large tenant hits the doc count threshold before the age threshold. A small tenant hits the age threshold before the doc count threshold.
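The OR semantics can be sketched in a few lines of plain Java. This is an illustrative model only, not the plugin's code: the `Conditions` and `IndexStats` records and the `shouldRollover` method are hypothetical names, and the thresholds in `main` are example values.

```java
import java.time.Duration;

class RolloverConditionSketch {

    /** Hypothetical holder for the three thresholds; a null field means "not configured". */
    record Conditions(Duration minIndexAge, Long minDocCount, Long minSizeBytes) {}

    /** Hypothetical snapshot of an index at evaluation time. */
    record IndexStats(Duration age, long docCount, long primarySizeBytes) {}

    /** Returns true as soon as any configured condition is satisfied (OR, not AND). */
    static boolean shouldRollover(Conditions c, IndexStats s) {
        boolean ageMet = c.minIndexAge() != null && s.age().compareTo(c.minIndexAge()) >= 0;
        boolean docsMet = c.minDocCount() != null && s.docCount() >= c.minDocCount();
        boolean sizeMet = c.minSizeBytes() != null && s.primarySizeBytes() >= c.minSizeBytes();
        return ageMet || docsMet || sizeMet;
    }

    public static void main(String[] args) {
        long gb = 1_000_000_000L;
        Conditions policy = new Conditions(Duration.ofDays(30), 500_000L, 40 * gb);

        // Large tenant on day 10: the doc-count threshold is met long before the age threshold.
        IndexStats large = new IndexStats(Duration.ofDays(10), 500_000, 25 * gb);
        // Small tenant on day 10: no threshold met; only the age threshold will ever fire.
        IndexStats small = new IndexStats(Duration.ofDays(10), 2_000, 100_000_000L);

        System.out.println("large tenant rolls over: " + shouldRollover(policy, large));
        System.out.println("small tenant rolls over: " + shouldRollover(policy, small));
    }
}
```

Because any single condition is enough, the age threshold acts as an upper bound on index lifetime for small tenants, while the doc-count and size thresholds act as an upper bound on index size for large ones.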
The Implementation
ISM Policy Management via REST
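import java.io.IOException;

// Assumes the Apache HttpClient 4 based low-level REST client;
// newer client versions use a different EntityUtils package.
import org.apache.http.util.EntityUtils;
import org.opensearch.client.Request;
import org.opensearch.client.Response;
import org.opensearch.client.RestClient;

public class ISMPolicyManager {
    private final RestClient restClient;

    public ISMPolicyManager(RestClient restClient) {
        this.restClient = restClient;
    }

    // Creates a new policy. performRequest already throws ResponseException for
    // 4xx/5xx responses; the explicit check rejects any success code other than 201 Created.
    public void createPolicy(String policyId, String policyJson)
            throws IOException {
        Request request = new Request("PUT",
            "/_plugins/_ism/policies/" + policyId);
        request.setJsonEntity(policyJson);
        Response response = restClient.performRequest(request);
        if (response.getStatusLine().getStatusCode() != 201) {
            // ISMException is an application-defined unchecked exception (not shown).
            throw new ISMException(
                "Failed to create ISM policy: " + response.getStatusLine());
        }
    }

    // Returns the raw explain response (current state, action, and failure info) as JSON.
    public String getPolicyStatus(String indexName) throws IOException {
        Request request = new Request("GET",
            "/_plugins/_ism/explain/" + indexName);
        Response response = restClient.performRequest(request);
        return EntityUtils.toString(response.getEntity());
    }

    // Retries a failed managed index, restarting it from the "hot" state.
    public void retryFailedPolicy(String indexName) throws IOException {
        Request request = new Request("POST",
            "/_plugins/_ism/retry/" + indexName);
        request.setJsonEntity("{\"state\": \"hot\"}");
        restClient.performRequest(request);
    }
}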
Tenant-Sized Rollover Policies
// HARDENED: Assign rollover policies based on tenant write volume
// Each policy defines the "warm" state that its transition targets; a transition
// to an undefined state fails policy validation. The warm state has no actions
// here and serves as a placeholder for warm-tier actions.
public String buildPolicyForTier(String tier) {
    return switch (tier) {
        case "high-volume" -> """
            {
              "policy": {
                "description": "High-volume tenant lifecycle",
                "default_state": "hot",
                "states": [
                  {
                    "name": "hot",
                    "actions": [
                      {
                        "rollover": {
                          "min_doc_count": 500000,
                          "min_size": "40gb",
                          "min_index_age": "7d"
                        }
                      }
                    ],
                    "transitions": [
                      {"state_name": "warm", "conditions": {"min_index_age": "14d"}}
                    ]
                  },
                  {
                    "name": "warm",
                    "actions": [],
                    "transitions": []
                  }
                ]
              }
            }
            """;
        case "low-volume" -> """
            {
              "policy": {
                "description": "Low-volume tenant lifecycle",
                "default_state": "hot",
                "states": [
                  {
                    "name": "hot",
                    "actions": [
                      {
                        "rollover": {
                          "min_index_age": "90d"
                        }
                      }
                    ],
                    "transitions": [
                      {"state_name": "warm", "conditions": {"min_index_age": "180d"}}
                    ]
                  },
                  {
                    "name": "warm",
                    "actions": [],
                    "transitions": []
                  }
                ]
              }
            }
            """;
        default -> throw new IllegalArgumentException("Unknown tier: " + tier);
    };
}
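A rollover action also needs to know which alias to roll. ISM reads it from the index setting plugins.index_state_management.rollover_alias, and the first backing index must carry that alias as its write index. The sketch below only builds the request body for creating that first index. The class and method names are hypothetical, the alias naming scheme is an assumption, and attaching the policy itself (for example through an ism_template in the policy or the _plugins/_ism/add endpoint) is left out.

```java
class RolloverBootstrapSketch {

    /** Name of the first backing index; the numeric suffix lets rollover increment it. */
    static String firstIndexName(String alias) {
        return alias + "-000001";
    }

    /**
     * Body for PUT /<firstIndexName>: records the rollover alias for ISM and marks
     * this index as the alias's write index.
     * Assumes the alias contains no characters that need JSON escaping.
     */
    static String bootstrapBody(String alias) {
        return """
            {
              "settings": {
                "plugins.index_state_management.rollover_alias": "%s"
              },
              "aliases": {
                "%s": { "is_write_index": true }
              }
            }
            """.formatted(alias, alias);
    }

    public static void main(String[] args) {
        String alias = "tenant-a-logs"; // hypothetical per-tenant alias
        System.out.println("PUT /" + firstIndexName(alias));
        System.out.println(bootstrapBody(alias));
    }
}
```

Without the alias setting, the rollover action fails on its first evaluation and the index stays in the hot state until someone intervenes, which is one of the failure modes the health monitor is meant to catch.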
ISM Health Monitor
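import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

import com.fasterxml.jackson.databind.JsonNode;

// Assumes the record and method below live in a class that holds
// a RestClient restClient and a Jackson ObjectMapper objectMapper.
public record ISMStatus(
    String indexName,
    String currentState,
    String failedReason,
    long retryCount
) {}

public List<ISMStatus> getUnhealthyPolicies() throws IOException {
    Request request = new Request("GET", "/_plugins/_ism/explain/*");
    Response response = restClient.performRequest(request);
    JsonNode root = objectMapper.readTree(
        EntityUtils.toString(response.getEntity()));
    List<ISMStatus> unhealthy = new ArrayList<>();
    // Each managed index is a top-level field. Non-object top-level fields
    // resolve to missing nodes below and are skipped.
    root.fields().forEachRemaining(entry -> {
        JsonNode status = entry.getValue();
        String message = status.path("info").path("message").asText("");
        // Prefer the explicit failure flags; fall back to a case-insensitive
        // match on the message, which often starts with "Failed".
        boolean failed = status.path("action").path("failed").asBoolean(false)
            || status.path("retry_info").path("failed").asBoolean(false)
            || message.toLowerCase(Locale.ROOT).contains("failed");
        if (failed) {
            unhealthy.add(new ISMStatus(
                entry.getKey(),
                status.path("state").path("name").asText(),
                message,
                // retry_info.failed is a boolean; the retry count is consumed_retries.
                status.path("retry_info").path("consumed_retries").asLong()
            ));
        }
    });
    return unhealthy;
}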
The Measurement
Shard size distribution after 90 days with uniform vs tenant-tier policies:
| Policy Type | p10 Shard Size | p50 Shard Size | p90 Shard Size |
|---|---|---|---|
| Uniform (30d rollover) | 180MB | 12GB | 85GB |
| Tenant-tiered | 2GB | 18GB | 42GB |
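Tenant-tiered policies bring the median and p90 shard sizes inside the 5-50GB target range. The p10 of 2GB is still below it, but roughly ten times larger than under the uniform policy. Uniform policies create both undersized shards (small tenants rolled over too early) and oversized shards (large tenants rolled over too late).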
The Decision Rule
Classify tenants into write-volume tiers (high, medium, low) and assign corresponding ISM policies. The classification can be automated based on the trailing 30-day write rate.
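A minimal sketch of that classification, assuming the per-day document counts are already available from your own metrics. The class name, method names, and both thresholds are illustrative assumptions, not values from the measurements above. The returned strings match the tier names used by buildPolicyForTier, which would need a "medium-volume" case added before the middle tier can be used.

```java
import java.util.List;

class TenantTierSketch {

    // Assumed cut-offs in documents per day; tune them against observed shard sizes.
    static final long HIGH_VOLUME_MIN_DOCS_PER_DAY = 10_000;
    static final long MEDIUM_VOLUME_MIN_DOCS_PER_DAY = 1_000;

    /** Average daily write rate over the trailing window (one entry per day). */
    static double trailingDailyRate(List<Long> dailyDocCounts) {
        return dailyDocCounts.stream()
            .mapToLong(Long::longValue)
            .average()
            .orElse(0.0); // no history yet: treat as zero writes
    }

    /** Maps the trailing write rate to a tier name. */
    static String classify(List<Long> dailyDocCounts) {
        double rate = trailingDailyRate(dailyDocCounts);
        if (rate >= HIGH_VOLUME_MIN_DOCS_PER_DAY) {
            return "high-volume";
        }
        if (rate >= MEDIUM_VOLUME_MIN_DOCS_PER_DAY) {
            return "medium-volume";
        }
        return "low-volume";
    }
}
```

With these example thresholds, Tenant A (50,000 documents per day) lands in the high-volume tier and Tenant B (200 per day) in the low-volume tier. A tenant with no history defaults to low-volume, so a new tenant that turns out to be large should be reclassified once data accumulates.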
Monitor ISM policy execution daily. Failed policies leave indices stranded in the wrong tier, consuming hot-tier resources for data that should be on warm hardware. The _plugins/_ism/explain endpoint is the primary diagnostic tool.
Set rollover conditions to achieve shard sizes between 10GB and 40GB. Shards below 5GB pay the fixed per-shard overhead for very little data. Shards above 50GB slow recovery and rebalancing operations.
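The thresholds can be derived from a tenant's write rate with simple arithmetic. The sketch below assumes a single primary shard per index and decimal gigabytes, and takes the average document size from the opening example (75GB across 1.5 million documents, about 50,000 bytes each). The class and method names are hypothetical.

```java
class RolloverSizingSketch {

    static final long GB = 1_000_000_000L; // decimal gigabytes, for round numbers

    /** Documents that fit in one primary shard of the target size. */
    static long docsForTargetShard(long targetShardBytes, long avgDocBytes) {
        return targetShardBytes / avgDocBytes;
    }

    /** Whole days (rounded up) until a single-shard index reaches the target size. */
    static long daysToTarget(long targetShardBytes, long avgDocBytes, long docsPerDay) {
        long docs = docsForTargetShard(targetShardBytes, avgDocBytes);
        return (docs + docsPerDay - 1) / docsPerDay;
    }

    public static void main(String[] args) {
        long target = 25 * GB;      // midpoint of the 10GB to 40GB range
        long avgDocBytes = 50_000;  // 75GB / 1.5 million documents

        // 500,000 documents: a candidate min_doc_count for this document size
        System.out.println(docsForTargetShard(target, avgDocBytes));
        // Tenant A, 50,000 docs/day: 10 days to reach the target
        System.out.println(daysToTarget(target, avgDocBytes, 50_000));
        // Tenant B, 200 docs/day: 2,500 days to reach the target
        System.out.println(daysToTarget(target, avgDocBytes, 200));
    }
}
```

Tenant B would take years to fill a 25GB shard, so a size-based condition alone never fires for it. For tenants that small, the age condition is the only practical trigger, and the real choice is how long an index may stay open before it rolls over regardless of size.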