Zep's Temporal KG Memory Hits 94.8% Accuracy on DMR, Outperforming Vector RAG
These articles are AI-generated summaries. Please check the original sources for full details.
Comparing Memory Systems for LLM Agents: Vector, Graph, and Event Logs
Zep’s temporal knowledge graph memory system achieved 94.8% accuracy on the Deep Memory Retrieval (DMR) benchmark in 2025, outperforming vector-based RAG systems. The result suggests that structured memory, which stores entities, relations, and timestamps explicitly, plays an important role in multi-agent workflows.
Why This Matters
Vector memory systems retrieve quickly, in sublinear time, but struggle with temporal and relational tasks. Benchmarks like DMR show that they degrade on long-horizon queries, which leads to lost constraints, semantic drift, and context dilution in multi-agent planning. These failures can surface as invalid tool calls, compliance violations, or system instability. Graph memory systems, by contrast, encode explicit temporal and relational structure, which improves accuracy and is reported to reduce latency by up to 90% in complex scenarios.
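To make "explicit temporal structure" concrete, here is a minimal sketch of a fact store in which every edge carries a validity interval. It is not Zep's or Graphiti's API: the `Fact` and `TemporalGraph` names, the integer timestamps, and the single-valued-relation rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    # Hypothetical edge: (subject) -[relation]-> (obj), valid over a time interval.
    subject: str
    relation: str
    obj: str
    valid_from: int
    valid_to: Optional[int] = None  # None means the fact is still valid


class TemporalGraph:
    def __init__(self):
        self.facts = []

    def assert_fact(self, subject, relation, obj, at):
        # Assumption: each (subject, relation) pair holds one value at a time,
        # so a new assertion closes the interval of the previous one.
        for f in self.facts:
            if f.subject == subject and f.relation == relation and f.valid_to is None:
                f.valid_to = at
        self.facts.append(Fact(subject, relation, obj, at))

    def query(self, subject, relation, at):
        # Return the value that was valid at time `at`, or None if unknown.
        for f in self.facts:
            if (f.subject == subject and f.relation == relation
                    and f.valid_from <= at
                    and (f.valid_to is None or at < f.valid_to)):
                return f.obj
        return None


g = TemporalGraph()
g.assert_fact("service-a", "deployed_in", "us-east", at=10)
g.assert_fact("service-a", "deployed_in", "eu-west", at=20)

print(g.query("service-a", "deployed_in", at=15))
print(g.query("service-a", "deployed_in", at=25))
```

Because superseded facts are closed rather than overwritten, a query about an earlier point in time still resolves to the value that held then, which is the kind of question a similarity search over text chunks has no direct way to answer.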
Key Insights
- “94.8% accuracy on DMR, 2025”: Zep/Graphiti (MarkTechPost, 2025)
- “Temporal KG over vector RAG for multi-agent planning”: Zep’s architecture enables cross-session consistency and multi-hop reasoning
- “ALAS used by multi-agent systems for execution logs”: Transactional logging makes runs replayable and lets a failed step be repaired locally instead of restarting the whole plan
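The replayability and localized repair mentioned in the last point can be sketched with an append-only execution log. This is an illustrative toy, not ALAS itself: `ExecutionLog`, `LogEntry`, the `set`/`add` actions, and the budget example are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    step: int
    action: str
    args: dict


class ExecutionLog:
    """Append-only log of agent actions (hypothetical structure)."""

    def __init__(self):
        self.entries = []

    def append(self, action, args):
        entry = LogEntry(len(self.entries), action, dict(args))
        self.entries.append(entry)
        return entry.step

    def replay(self, handlers):
        # Rebuild state from scratch by re-applying every logged action in order.
        state = {}
        for e in self.entries:
            handlers[e.action](state, **e.args)
        return state

    def repair(self, step, action, args):
        # Localized repair: return a new log with one step replaced.
        # The original log is left untouched for auditing.
        fixed = ExecutionLog()
        for e in self.entries:
            if e.step == step:
                fixed.append(action, args)
            else:
                fixed.append(e.action, e.args)
        return fixed


def set_value(state, key, value):
    state[key] = value


def add_value(state, key, amount):
    state[key] = state.get(key, 0) + amount


HANDLERS = {"set": set_value, "add": add_value}

log = ExecutionLog()
log.append("set", {"key": "budget", "value": 100})
log.append("add", {"key": "budget", "amount": -130})  # faulty step: overspends
log.append("add", {"key": "budget", "amount": 10})

fixed = log.repair(1, "add", {"key": "budget", "amount": -30})
print(log.replay(HANDLERS), fixed.replay(HANDLERS))
```

Only the faulty step is replaced; the steps before and after it are reused as logged, and replaying the repaired log yields the corrected state.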
Practical Applications
- Use Case: Zep is used in multi-agent systems for cross-session consistency (e.g., tracking user requests across time)
- Pitfall: Vector RAG’s semantic drift in temporal queries (e.g., misaligned region/environment IDs in retrieved chunks)
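The pitfall above can be reproduced in a few lines: pure similarity ranking returns the closest chunk even when its region or environment ID is wrong. The sketch below shows one common guard, exact-match metadata filtering before ranking. It is an assumption for illustration and is not described in the source; the `retrieve` function, the chunk layout, and the two-dimensional vectors are hypothetical.

```python
import math


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_vec, chunks, required, k=1):
    # Keep only chunks whose metadata matches every required key exactly,
    # then rank the survivors by similarity to the query.
    eligible = [
        c for c in chunks
        if all(c["meta"].get(key) == val for key, val in required.items())
    ]
    ranked = sorted(eligible, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return ranked[:k]


# Toy data: chunk "a" is the nearest neighbour but belongs to the wrong region.
CHUNKS = [
    {"id": "a", "vec": [1.0, 0.0], "meta": {"region": "us-east", "env": "staging"}},
    {"id": "b", "vec": [0.9, 0.1], "meta": {"region": "eu-west", "env": "prod"}},
    {"id": "c", "vec": [0.0, 1.0], "meta": {"region": "eu-west", "env": "prod"}},
]

query = [1.0, 0.0]
unfiltered = retrieve(query, CHUNKS, required={})
filtered = retrieve(query, CHUNKS, required={"region": "eu-west", "env": "prod"})
print(unfiltered[0]["id"], filtered[0]["id"])
```

Without the filter the top hit is the staging chunk from another region; with it, ranking only considers chunks whose IDs match the request.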