Apache Iceberg v4: Redesigning Metadata for Streaming and AI Workloads

Apache Iceberg v4: The Current State, the Proposals, and Why They Matter

The Apache Iceberg community gathered at the Iceberg Summit 2026 in San Francisco. Over 70 sessions focused on spec changes to address operational pain points for users running Iceberg in production at scale.

Why This Matters

Iceberg’s original design optimized for large, slow-moving analytical tables. Modern streaming pipelines that commit every few seconds expose severe write amplification: under v3, even a tiny write creates several new metadata files (metadata.json, a manifest list, and one or more manifests). The result is object storage throttling and high commit latency, which makes the batch-oriented metadata structure a poor fit for real-time AI and streaming workloads.
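To put rough numbers on that write amplification, the sketch below counts metadata objects written per day. It assumes a steady writer and three new metadata objects per commit (one metadata.json, one manifest list, one manifest). The function name and the commit intervals are illustrative assumptions, not measurements from any deployment.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def metadata_objects_per_day(commit_interval_s: int, objects_per_commit: int = 3) -> int:
    """Estimate metadata objects written per day by a steady writer.

    objects_per_commit defaults to 3 (metadata.json + manifest list + manifest),
    which is an assumed lower bound for a small append.
    """
    commits_per_day = SECONDS_PER_DAY // commit_interval_s
    return commits_per_day * objects_per_commit


# A streaming job committing every 5 seconds vs. an hourly batch job.
streaming = metadata_objects_per_day(5)
hourly_batch = metadata_objects_per_day(3600)
print(streaming, hourly_batch)  # 51840 72
```

Under these assumptions the streaming writer produces several hundred times more metadata objects per day than the hourly batch job, even though each commit carries very little data.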

Key Insights

Adaptive Metadata Trees (v4 Proposal): Replaces the manifest list with a Root Manifest and allows small writes to be inlined for low-latency commits, which is essential for Flink jobs committing every five seconds.
Columnar Metadata Transition: Moves metadata from Avro (row-based) to Parquet (columnar), enabling engines to prune metadata columns during query planning rather than deserializing entire records.
Typed Column Statistics: Replaces generic maps with structured representations of stats to support extensible metrics, specifically opening the door for approximate nearest neighbor search in vector databases.
Relocatable Tables: Introduces relative paths instead of absolute URIs, eliminating the need for expensive metadata rewrites when replicating tables across regions or buckets.
Convergence Proposal: Databricks proposed that Delta Lake 5.0 adopt the Iceberg v4 adaptive metadata tree as its native foundation to eliminate translation layers like UniForm.
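The typed statistics idea can be sketched in a few lines. Current Iceberg manifests store per-file statistics as separate maps keyed by field ID (value_counts, null_value_counts, lower_bounds, upper_bounds). The ColumnStats record and the helper functions below are hypothetical, not the proposed v4 schema, and bounds are simplified to plain Python values rather than serialized bytes.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical typed per-column statistics record (illustration only)."""
    field_id: int
    value_count: Optional[int] = None
    null_count: Optional[int] = None
    lower_bound: Any = None
    upper_bound: Any = None


def to_typed_stats(
    value_counts: Dict[int, int],
    null_value_counts: Dict[int, int],
    lower_bounds: Dict[int, Any],
    upper_bounds: Dict[int, Any],
) -> Dict[int, ColumnStats]:
    """Fold the four map-style stats into one record per field ID."""
    field_ids = (
        set(value_counts) | set(null_value_counts)
        | set(lower_bounds) | set(upper_bounds)
    )
    return {
        fid: ColumnStats(
            field_id=fid,
            value_count=value_counts.get(fid),
            null_count=null_value_counts.get(fid),
            lower_bound=lower_bounds.get(fid),
            upper_bound=upper_bounds.get(fid),
        )
        for fid in sorted(field_ids)
    }


def may_contain(stats: ColumnStats, value: Any) -> bool:
    """Return False only when the bounds prove the value is absent."""
    if stats.lower_bound is None or stats.upper_bound is None:
        return True  # no bounds recorded, so the file cannot be skipped
    return stats.lower_bound <= value <= stats.upper_bound


stats = to_typed_stats(
    value_counts={1: 100, 2: 100},
    null_value_counts={1: 0, 2: 7},
    lower_bounds={1: 10},
    upper_bounds={1: 99},
)
print(may_contain(stats[1], 150))  # False: 150 is outside [10, 99]
print(may_contain(stats[2], 150))  # True: field 2 has no bounds
```

A structured record like this leaves room to add further metrics as new fields, which is the extensibility the proposal is after.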

Working Examples

The proposed v4 metadata tree hierarchy:

Root Manifest -> Data Manifests / Delete Manifests / Files
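The hierarchy above can be illustrated with a toy model of inlining. This is a minimal sketch, not the proposed format: the class names, the inline_limit threshold, and the spill rule are all assumptions made for illustration, and the count covers only objects in this toy tree.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Manifest:
    """A child manifest: a separate metadata object listing data files."""
    files: List[str]


@dataclass
class RootManifest:
    """Hypothetical root that holds small writes inline and points to child manifests."""
    inline_limit: int = 4  # assumed threshold, not taken from the proposal
    inline_files: List[str] = field(default_factory=list)
    manifests: List[Manifest] = field(default_factory=list)

    def commit(self, new_files: List[str]) -> int:
        """Apply a write and return how many tree objects it creates."""
        self.inline_files.extend(new_files)
        objects_written = 1  # every commit writes a new root
        if len(self.inline_files) > self.inline_limit:
            # Too many inline entries: spill them into one child manifest.
            self.manifests.append(Manifest(self.inline_files))
            self.inline_files = []
            objects_written += 1
        return objects_written

    def all_files(self) -> List[str]:
        """List every data file reachable from the root."""
        listed = [f for m in self.manifests for f in m.files]
        return listed + self.inline_files


root = RootManifest()
writes = [root.commit([f"data-{i}.parquet"]) for i in range(6)]
print(writes)                 # [1, 1, 1, 1, 2, 1]
print(len(root.manifests))    # 1
print(len(root.all_files()))  # 6
```

In this model most small commits write a single object, and a child manifest is only created once the inline entries exceed the threshold.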

Practical Applications

AI Feature Tables: Using column families to update a small subset of features without rewriting all 200+ columns in a wide table.
Pitfall (full row rewrites in wide tables): Touching 5% of data while rewriting 100% of files leads to prohibitive cloud storage costs.
Disaster Recovery: Moving table roots between regions using relative paths to maintain internal file relationships without rewriting metadata.
Pitfall (absolute URI referencing): Hardcoding bucket/region paths makes replication a slow project rather than a routine operation.
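The disaster recovery case comes down to how file locations are stored. The sketch below shows the general idea of keeping paths relative to a table root and resolving them at read time. The helper names and the bucket URIs are hypothetical, and this is not the path-resolution rule from the v4 proposal.

```python
def relativize(table_root: str, absolute_uri: str) -> str:
    """Strip the table root from an absolute URI, leaving a relative path."""
    root = table_root.rstrip("/") + "/"
    if not absolute_uri.startswith(root):
        raise ValueError(f"{absolute_uri} is not under {table_root}")
    return absolute_uri[len(root):]


def resolve(table_root: str, relative_path: str) -> str:
    """Join a table root URI and a path stored relative to it."""
    return table_root.rstrip("/") + "/" + relative_path.lstrip("/")


# Hypothetical locations for the primary table and its replica.
old_root = "s3://bucket-us-east/warehouse/events"
new_root = "s3://bucket-eu-west/replica/events"

stored = relativize(old_root, old_root + "/data/part-0001.parquet")
print(stored)                     # data/part-0001.parquet
print(resolve(new_root, stored))  # s3://bucket-eu-west/replica/events/data/part-0001.parquet
```

Because the stored value never mentions a bucket or region, copying the files to a new root is enough; only the root changes, and no metadata entry needs to be rewritten.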

References:

https://dev.to/alexmercedcoder/apache-iceberg-v4-the-current-state-the-proposals-and-why-they-matter-3e07
