search at depth

Mapping Versioning and Migration Strategy

Chapter 15 of 60


The Symptom

The documentation platform needs to add a difficulty_level keyword field to the mapping. The developer runs PUT docs-v1/_mapping with the new field, and the request succeeds. In the next sprint, the team needs to change the body field's analyzer from standard to the custom code_analyzer. The mapping update API rejects this change because a field's index-time analyzer cannot be modified once the field exists. The only path forward is a full reindex into a new index with the corrected mapping.

The team has no process for this. They create docs-v2 manually, reindex with the _reindex API, update all application code to point to docs-v2, and spend the next four hours debugging why the reindex missed 12,000 documents.

The Internals

OpenSearch index mappings accept additive changes and reject destructive ones. The mapping API allows adding new fields (and updating a small set of parameters, such as search_analyzer), but it does not allow changing the type or index-time analyzer of an existing field. This is a Lucene constraint: the inverted index, doc values, and stored fields for existing documents were built with the original field configuration, and changing that configuration retroactively would require rewriting every segment.
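To see why an analyzer change cannot be applied in place, the toy model below builds a tiny inverted index with one analyzer and then queries it with another. Both analyzers are hypothetical simplifications: standardLike splits on non-alphanumerics and lowercases (roughly what the standard analyzer does to "getUserId"), and codeLike additionally splits camelCase, standing in for the chapter's code_analyzer, whose real definition is not shown here. This is a sketch of the concept, not how Lucene stores postings.

```java
import java.util.*;
import java.util.function.Function;

public class AnalyzerChangeDemo {

    // Hypothetical stand-in for the standard analyzer:
    // split on non-alphanumerics, then lowercase.
    static List<String> standardLike(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^A-Za-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    // Hypothetical stand-in for code_analyzer: also split camelCase boundaries.
    static List<String> codeLike(String text) {
        return standardLike(text.replaceAll("([a-z0-9])([A-Z])", "$1 $2"));
    }

    // Build a toy inverted index (term -> doc ids) at "index time".
    static Map<String, Set<Integer>> buildIndex(List<String> docs,
                                                Function<String, List<String>> analyzer) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyzer.apply(docs.get(id))) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return postings;
    }

    // Analyze the query and union the postings of its terms.
    static Set<Integer> search(Map<String, Set<Integer>> index, String query,
                               Function<String, List<String>> analyzer) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : analyzer.apply(query)) {
            hits.addAll(index.getOrDefault(term, Set.of()));
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("Call getUserId() to read the id", "User settings page");

        // Postings built with the old analyzer store "getuserid" as one token,
        // so analyzing only the query differently cannot find doc 0.
        Map<String, Set<Integer>> oldIndex = buildIndex(docs, AnalyzerChangeDemo::standardLike);
        System.out.println(search(oldIndex, "user", AnalyzerChangeDemo::codeLike)); // [1]

        // Rebuilding the postings with the new analyzer (a reindex) finds both docs.
        Map<String, Set<Integer>> newIndex = buildIndex(docs, AnalyzerChangeDemo::codeLike);
        System.out.println(search(newIndex, "user", AnalyzerChangeDemo::codeLike)); // [0, 1]
    }
}
```

The existing postings, not the query, decide what can match, which is why the fix is a reindex rather than a mapping update.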

A mapping versioning strategy treats this constraint as a feature, not a limitation. By naming indices with version suffixes and accessing them through aliases, mapping changes become deployment operations rather than emergencies.

The Implementation

Index Naming Convention

docs-v1    → initial mapping
docs-v2    → added code_analyzer, changed body field analysis
docs-v3    → added vector field for semantic search
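Deriving the next index name from the current one avoids hand-typed names during a migration. The helper below is a hypothetical sketch that assumes every index follows the "<base>-v<N>" pattern shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IndexNames {
    // Matches "<base>-v<N>", e.g. "docs-v2" -> base "docs", version 2.
    private static final Pattern VERSIONED = Pattern.compile("^(.+)-v(\\d+)$");

    /** Returns the next versioned index name, e.g. "docs-v2" -> "docs-v3". */
    static String next(String indexName) {
        Matcher m = VERSIONED.matcher(indexName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a versioned index name: " + indexName);
        }
        int version = Integer.parseInt(m.group(2));
        return m.group(1) + "-v" + (version + 1);
    }

    public static void main(String[] args) {
        System.out.println(next("docs-v1")); // docs-v2
        System.out.println(next("docs-v9")); // docs-v10
    }
}
```

Parsing the number, rather than comparing strings, keeps "docs-v10" correctly after "docs-v9".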

Alias-Based Access

// HARDENED: Application code always uses the alias, never the index name.
// Index swaps are transparent to the application.

private static final String DOCS_READ_ALIAS = "docs-read";
private static final String DOCS_WRITE_ALIAS = "docs-write";

// Initial setup: create the first versioned index (mapping omitted here)
client.indices().create(c -> c.index("docs-v1"));

// Point both aliases at docs-v1 in a single request. Marking docs-v1 as the
// write index keeps indexing unambiguous if the write alias ever spans more
// than one index during a migration.
client.indices().updateAliases(a -> a
    .actions(act -> act.add(add -> add
        .index("docs-v1")
        .alias(DOCS_READ_ALIAS)
    ))
    .actions(act -> act.add(add -> add
        .index("docs-v1")
        .alias(DOCS_WRITE_ALIAS)
        .isWriteIndex(true)
    ))
);

// All search operations use the read alias
SearchRequest searchRequest = SearchRequest.of(s -> s
    .index(DOCS_READ_ALIAS)
    .query(q -> q.match(m -> m.field("body").query(FieldValue.of(userQuery))))
);
SearchResponse<DocPage> searchResponse = client.search(searchRequest, DocPage.class);

// All index operations use the write alias
IndexRequest<DocPage> indexRequest = IndexRequest.of(r -> r
    .index(DOCS_WRITE_ALIAS)
    .id(docId)
    .document(page)
);
client.index(indexRequest);
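In OpenSearch the swap itself is a single POST _aliases request containing remove and add actions, which the cluster applies atomically. The in-memory model below is not the OpenSearch API. It only illustrates the property the pattern relies on: a reader resolving the alias sees either the old index or the new one, never neither. The class and method names are hypothetical, and ConcurrentHashMap.replace(key, oldValue, newValue) stands in for the atomic alias update.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AliasTable {
    // alias name -> concrete index name (one index per alias in this model)
    private final ConcurrentMap<String, String> aliases = new ConcurrentHashMap<>();

    void add(String alias, String index) {
        aliases.put(alias, index);
    }

    /** Resolve an alias the way a search or index call would. */
    String resolve(String alias) {
        String index = aliases.get(alias);
        if (index == null) throw new IllegalStateException("Unknown alias: " + alias);
        return index;
    }

    /**
     * Atomically repoint alias from oldIndex to newIndex (compare-and-set).
     * Returns false, changing nothing, if the alias no longer points at oldIndex.
     */
    boolean swap(String alias, String oldIndex, String newIndex) {
        return aliases.replace(alias, oldIndex, newIndex);
    }

    public static void main(String[] args) {
        AliasTable table = new AliasTable();
        table.add("docs-read", "docs-v1");
        table.add("docs-write", "docs-v1");

        // After reindexing and verifying docs-v2, move both aliases.
        System.out.println(table.swap("docs-read", "docs-v1", "docs-v2"));  // true
        System.out.println(table.swap("docs-write", "docs-v1", "docs-v2")); // true
        System.out.println(table.resolve("docs-read"));                     // docs-v2

        // A stale swap attempt is rejected instead of clobbering the current target.
        System.out.println(table.swap("docs-read", "docs-v1", "docs-v3"));  // false
    }
}
```

One difference from the real cluster: this model moves each alias separately, whereas a single _aliases request can move both docs-read and docs-write in one atomic step.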

Mapping Migration Test

@Test
void mappingMigrationPreservesSearchBehavior() throws Exception {
    // Create v1 index with original mapping
    createDocsV1Index(client);

    // Index representative documents and make them visible to count and search
    indexTestDocuments(client, "docs-v1", testDocuments());
    client.indices().refresh(r -> r.index("docs-v1"));

    // Run representative queries and capture results
    List<SearchResult> v1Results = runQueryTestSet(client, "docs-v1");

    // Create v2 index with new mapping
    createDocsV2Index(client);

    // Reindex from v1 to v2. Failures (e.g. mapping conflicts) are reported
    // in the response body rather than thrown, so check them explicitly.
    ReindexResponse reindex = client.reindex(r -> r
        .source(src -> src.index("docs-v1"))
        .dest(dst -> dst.index("docs-v2"))
    );
    assertThat(reindex.failures()).isEmpty();
    client.indices().refresh(r -> r.index("docs-v2"));

    // Verify document count matches
    long v1Count = client.count(c -> c.index("docs-v1")).count();
    long v2Count = client.count(c -> c.index("docs-v2")).count();
    assertThat(v2Count).isEqualTo(v1Count);

    // Run the same queries against v2
    List<SearchResult> v2Results = runQueryTestSet(client, "docs-v2");
    assertThat(v2Results).hasSameSizeAs(v1Results);

    // Compare result sets, not order: ranking may change with analyzer changes,
    // but every document v1 returned should still be returned by v2
    for (int i = 0; i < v1Results.size(); i++) {
        assertThat(v2Results.get(i).documentIds())
            .as("Query '%s' result set changed", v1Results.get(i).query())
            .containsAll(v1Results.get(i).documentIds());
    }
}

This test catches two categories of migration failures: documents lost during reindex (count mismatch) and search behavior regressions (result set changes). Running it in CI before deploying a mapping change prevents the manual debugging session that follows a botched reindex.
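An assertion loop stops at the first changed query. When reviewing an intentional analyzer change, it can help to collect every lost document per query instead. The sketch below assumes a hypothetical QueryResult record that mirrors the query() and documentIds() accessors used by SearchResult in the test (its real shape is not shown). It matches queries by text rather than list position.

```java
import java.util.*;

public class ResultSetDiff {
    /** Hypothetical mirror of SearchResult: a query and the IDs it returned. */
    record QueryResult(String query, List<String> documentIds) {}

    /**
     * For each query, returns the IDs that v1 returned but v2 did not.
     * Queries with no lost IDs are omitted.
     */
    static Map<String, Set<String>> lostDocuments(List<QueryResult> v1, List<QueryResult> v2) {
        Map<String, List<String>> v2ByQuery = new HashMap<>();
        for (QueryResult r : v2) v2ByQuery.put(r.query(), r.documentIds());

        Map<String, Set<String>> lost = new TreeMap<>();
        for (QueryResult r : v1) {
            Set<String> missing = new TreeSet<>(r.documentIds());
            missing.removeAll(v2ByQuery.getOrDefault(r.query(), List.of()));
            if (!missing.isEmpty()) lost.put(r.query(), missing);
        }
        return lost;
    }

    public static void main(String[] args) {
        List<QueryResult> v1 = List.of(
            new QueryResult("reindex api", List.of("d1", "d2")),
            new QueryResult("alias swap", List.of("d3")));
        List<QueryResult> v2 = List.of(
            new QueryResult("alias swap", List.of("d3", "d4")),
            new QueryResult("reindex api", List.of("d2")));
        System.out.println(lostDocuments(v1, v2)); // {reindex api=[d1]}
    }
}
```

New documents appearing in v2 (d4 above) are not flagged, matching the containsAll semantics of the test.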

Mapping Compatibility Check

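import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import org.opensearch.client.opensearch._types.mapping.Property;

public class MappingValidator {

    /**
     * Validate that a new mapping is compatible with the existing one.
     * Returns a list of incompatible changes that require reindexing.
     * Checks field types, text analyzers, and removed fields, recursing into
     * object fields. Other field parameters are not compared.
     */
    public List<String> validateCompatibility(
            Map<String, Property> currentMapping,
            Map<String, Property> newMapping) {

        List<String> incompatibilities = new ArrayList<>();
        compare("", currentMapping, newMapping, incompatibilities);
        return incompatibilities;
    }

    private void compare(String prefix,
                         Map<String, Property> currentMapping,
                         Map<String, Property> newMapping,
                         List<String> incompatibilities) {

        for (Map.Entry<String, Property> entry : newMapping.entrySet()) {
            String fieldName = prefix + entry.getKey();
            Property newProp = entry.getValue();
            Property currentProp = currentMapping.get(entry.getKey());

            if (currentProp == null) {
                continue;  // New field, always compatible
            }

            if (currentProp._kind() != newProp._kind()) {
                incompatibilities.add(
                    "Field '%s': type change from %s to %s requires reindex"
                        .formatted(fieldName, currentProp._kind(), newProp._kind()));
            } else if (currentProp.isText()
                    && !Objects.equals(currentProp.text().analyzer(), newProp.text().analyzer())) {
                // Index-time analysis is baked into existing segments.
                // null means the default analyzer; null vs an explicit "standard"
                // is flagged conservatively.
                incompatibilities.add(
                    "Field '%s': analyzer change from %s to %s requires reindex"
                        .formatted(fieldName, currentProp.text().analyzer(),
                                   newProp.text().analyzer()));
            } else if (currentProp.isObject()) {
                // Check sub-fields of object properties
                compare(fieldName + ".",
                        currentProp.object().properties(),
                        newProp.object().properties(),
                        incompatibilities);
            }
        }

        for (String name : currentMapping.keySet()) {
            if (!newMapping.containsKey(name)) {
                incompatibilities.add(
                    "Field '%s': removal requires reindex (cannot remove fields from mapping)"
                        .formatted(prefix + name));
            }
        }
    }
}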

The Measurement

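Track mapping versions and migration history. Mapping versions count every change, while the index suffix increments only when a change requires a reindex: mapping v2 was applied in place to docs-v1, so mapping v3 lives in docs-v2 and mapping v4 in docs-v3.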

| Mapping version | Change | Reindex required | Migration time (1M docs) | Downtime |
|---|---|---|---|---|
| v1 | Initial mapping | N/A | N/A | N/A |
| v2 | Added difficulty_level keyword | No | 0 (PUT _mapping) | 0 |
| v3 | Changed body analyzer | Yes | 12 minutes | 0 (alias swap) |
| v4 | Added embedding knn_vector | Yes | 45 minutes (includes embedding generation) | 0 (alias swap) |

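Reindex time grows linearly with document count and shrinks as bulk indexing throughput rises. With the alias-based access pattern, downtime for readers is zero because the alias swap is atomic. Writes need separate handling: _reindex copies the source as it existed when the reindex started, so documents written to the old index while it runs must be replayed (or ingestion paused) before the swap.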
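The linear relationship turns the measured row into a planning estimate. The sketch below derives throughput from the v3 row (1M documents in 12 minutes) and projects it to a larger corpus. It assumes throughput stays constant as the index grows, which real clusters only approximate, and it should not be applied to the v4 row, whose time includes embedding generation.

```java
public class ReindexEstimate {
    /**
     * Estimated reindex duration in minutes, assuming duration scales
     * linearly with document count at a fixed throughput (docs per second).
     */
    static double estimateMinutes(long docCount, double docsPerSecond) {
        return docCount / docsPerSecond / 60.0;
    }

    public static void main(String[] args) {
        // Throughput implied by the v3 row: 1M docs in 12 minutes.
        double docsPerSecond = 1_000_000 / (12 * 60.0);
        System.out.printf("%.0f docs/s%n", docsPerSecond);                          // 1389 docs/s
        System.out.printf("%.0f min%n", estimateMinutes(5_000_000, docsPerSecond)); // 60 min
    }
}
```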

The Decision Rule

Always access indices through aliases. Never reference index names directly in application code. This makes reindexing a deployment operation, not a code change.

Version-suffix index names. When a mapping change requires reindexing, create the new versioned index, reindex into it, verify with the migration test, and atomically swap the alias. The old index can be deleted after verification.

Test mapping migrations with Testcontainers in CI. The test should verify document count preservation and search behavior stability. A mapping migration that changes search results should be intentional and measured with the relevance evaluation framework (chapter 9), not discovered in production.
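The create, reindex, verify, swap sequence from the second rule can be encoded so that the aliases can only move after verification passes. The Steps interface below is hypothetical: in practice each method would wrap an OpenSearch client call (create index, _reindex, the migration test's checks, a single _aliases update). A recording fake shows the call order. Deleting the old index is deliberately left out of migrate() so it stays a separate, later decision.

```java
import java.util.ArrayList;
import java.util.List;

public class MigrationRunbook {
    /** Hypothetical hooks; real implementations would call the OpenSearch client. */
    interface Steps {
        void createIndex(String index);
        void reindex(String from, String to);
        boolean verify(String from, String to);    // e.g. count and query-set checks
        void swapAliases(String from, String to);  // one atomic alias update
    }

    /** Runs the steps in order; aliases move only if verification passes. */
    static boolean migrate(Steps steps, String from, String to) {
        steps.createIndex(to);
        steps.reindex(from, to);
        if (!steps.verify(from, to)) {
            return false;  // aliases still point at the old index
        }
        steps.swapAliases(from, to);
        return true;
    }

    /** Recording fake used to show the call order. */
    static class Recorder implements Steps {
        final List<String> calls = new ArrayList<>();
        private final boolean verifyResult;

        Recorder(boolean verifyResult) { this.verifyResult = verifyResult; }

        public void createIndex(String index) { calls.add("create " + index); }
        public void reindex(String from, String to) { calls.add("reindex " + from + "->" + to); }
        public boolean verify(String from, String to) { calls.add("verify"); return verifyResult; }
        public void swapAliases(String from, String to) { calls.add("swap " + from + "->" + to); }
    }

    public static void main(String[] args) {
        Recorder ok = new Recorder(true);
        migrate(ok, "docs-v1", "docs-v2");
        System.out.println(ok.calls);
        // [create docs-v2, reindex docs-v1->docs-v2, verify, swap docs-v1->docs-v2]

        Recorder failed = new Recorder(false);
        migrate(failed, "docs-v1", "docs-v2");
        System.out.println(failed.calls);
        // [create docs-v2, reindex docs-v1->docs-v2, verify]
    }
}
```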