From Transformers to Associative Memory: How Titans and MIRAS Rethink Long-Context Modeling
These articles are AI-generated summaries. Please check the original sources for full details.
Titans and MIRAS: Rethinking Long-Context Modeling
Google Research proposes Titans and MIRAS, two approaches to giving sequence models usable long-term memory while keeping training parallelizable and inference cost near-linear in sequence length. Titans is a concrete architecture that adds a deep neural memory module to a Transformer; MIRAS is a general framework that views sequence models as online optimization over an associative memory.
Standard Transformers struggle with long sequences because the computational cost of attention grows quadratically with context length, which limits practical use even with optimizations like FlashAttention. Efficient linear models (like Mamba-2) avoid that cost by compressing history into a fixed-size state, but they can lose information over very long sequences. Titans and MIRAS aim to bridge the gap between that efficiency and the strong in-context learning of Transformers.
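To make the scaling gap concrete, here is a small back-of-the-envelope sketch. It only counts operations under simplifying assumptions (one attention head, full non-causal attention, one state update per token for a fixed-size memory); the function names are hypothetical and the counts are not measurements of any real model.

```python
def attention_score_count(n_tokens: int) -> int:
    """Query-key scores computed by full self-attention for one head."""
    return n_tokens * n_tokens


def recurrent_update_count(n_tokens: int) -> int:
    """State updates performed by a fixed-size recurrent memory."""
    return n_tokens


for n in (1_000, 10_000, 100_000):
    scores = attention_score_count(n)
    updates = recurrent_update_count(n)
    # The ratio grows with n, which is the quadratic-vs-linear gap.
    print(f"{n:>7,} tokens: {scores:>15,} scores vs {updates:>7,} updates")
```

Doubling the context doubles the number of state updates but quadruples the number of attention scores.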
Key Insights
- Quadratic Scaling Problem: The cost of the Transformer attention mechanism scales quadratically with context length, which makes long sequences expensive to process.
- Associative Memory Framework: MIRAS reframes sequence models as associative memories defined by four design choices: memory structure, attentional bias, retention, and optimization algorithm.
- Test-Time Learning: Titans uses a neural long-term memory module that keeps learning at test time via gradient descent, selectively storing surprising tokens (those the memory predicts poorly).
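The test-time learning idea in the last bullet can be sketched in plain Python. This is a minimal sketch, not the Titans implementation: it assumes a linear memory (a single matrix) instead of a deep network, treats the gradient norm of a squared-error loss as the surprise signal, and uses a momentum term and a decay factor as stand-ins for accumulated surprise and forgetting. The function names and the `lr`, `momentum`, and `decay` parameters are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def memory_step(memory, momentum_state, key, value,
                lr=0.1, momentum=0.9, decay=0.0):
    """One test-time update of a linear associative memory.

    Returns the new memory, the new momentum state, and the surprise
    (gradient norm) measured before the update.
    """
    # Prediction error of the current memory on this (key, value) pair.
    error = [p - v for p, v in zip(matvec(memory, key), value)]
    # Gradient of ||memory @ key - value||^2 with respect to the memory.
    grad = [[2.0 * e * k for k in key] for e in error]
    # Assumption: surprise is the size of that gradient.
    surprise = sum(g * g for row in grad for g in row) ** 0.5
    # Momentum carries earlier updates forward; the new gradient adds to it.
    momentum_state = [
        [momentum * s - lr * g for s, g in zip(s_row, g_row)]
        for s_row, g_row in zip(momentum_state, grad)
    ]
    # Decay shrinks old content (forgetting) before the update is applied.
    memory = [
        [(1.0 - decay) * m + s for m, s in zip(m_row, s_row)]
        for m_row, s_row in zip(memory, momentum_state)
    ]
    return memory, momentum_state, surprise


# Example: a 1x2 memory sees the same association twice.
memory = [[0.0, 0.0]]
state = [[0.0, 0.0]]
key, value = [1.0, 0.0], [1.0]

memory, state, first_surprise = memory_step(memory, state, key, value)
memory, state, second_surprise = memory_step(memory, state, key, value)
print(first_surprise, second_surprise)
```

Seeing the same association a second time produces a smaller gradient, so the second surprise value is lower than the first and the memory changes less in response.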
Working Example
# Simplified illustration of an L2 associative memory loss (a sketch, not the Titans implementation)
import torch


def associative_memory_loss(memory, key, value):
    """
    Calculates the squared L2 loss between the memory's output for the key and the value.
    """
    return ((memory(key) - value) ** 2).sum()


# Example usage
memory = torch.nn.Linear(10, 5)  # Simple linear memory: 10-dim keys to 5-dim values
key = torch.randn(1, 10)
value = torch.randn(1, 5)
loss = associative_memory_loss(memory, key, value)
print(f"Associative Memory Loss: {loss.item()}")
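In MIRAS terms, the L2 loss in the example above is one possible attentional bias, and other objectives can be swapped in. The sketch below illustrates that idea in plain Python under stated assumptions: the memory is a plain vector that maps a key to a scalar, a Huber-style gradient stands in for a more outlier-robust objective, and a scalar `retention` factor stands in for the retention component. All names here are hypothetical and chosen for illustration only.

```python
def l2_grad(error):
    """Gradient of 0.5 * error**2 with respect to the error."""
    return error


def huber_grad(error, delta=1.0):
    """Gradient of the Huber loss: the error itself near zero, clipped to +/-delta beyond."""
    return max(-delta, min(delta, error))


def update(memory, key, value, bias_grad, lr=0.5, retention=1.0):
    """One gradient step on a vector memory that maps a key to a scalar value."""
    prediction = sum(m * k for m, k in zip(memory, key))
    g = bias_grad(prediction - value)
    # retention < 1 shrinks old content; the gradient term writes the new pair.
    return [retention * m - lr * g * k for m, k in zip(memory, key)]


key, outlier_value = [1.0, 0.0], 100.0
l2_memory = update([0.0, 0.0], key, outlier_value, l2_grad)
huber_memory = update([0.0, 0.0], key, outlier_value, huber_grad)
print(l2_memory)     # large write driven by the outlier
print(huber_memory)  # bounded write
```

With the L2 gradient, a single outlier value dominates the update, while the clipped gradient limits how much any one token can write into the memory.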
Practical Applications
- Genomic Modeling: Models built on Titans or MIRAS could process long DNA sequences for tasks such as gene prediction and analysis.
- Long-Form Document Understanding: Such systems could analyze long legal documents or research papers while retaining details from early in the text.