llmwiki: Solving the LLM Context-Switch Tax with Persistent Project Memory

I Got Tired of Re-explaining My Codebase to Claude Code Every Session. So I Built llmwiki.

Engineer Max Małecki built llmwiki to automate the creation of persistent, version-controlled project documentation for AI coding assistants. Its materialize command rebuilds the wiki from accumulated facts instead of re-scanning the project, cutting project-context token usage from a reported 100K to about 15K per session.

Why This Matters

LLM coding assistants like Claude Code often lack long-term memory, so developers either spend tokens or spend time re-explaining the architecture in every new session. llmwiki decouples project knowledge from the ephemeral chat context: it keeps a structured Markdown layer that syncs via Git and works with tools like Obsidian for visual mapping. This reduces the cognitive tax of switching between projects and lowers the operating cost of high-end models like Opus 4.7.

Key Insights

Token cost reduction of ~85% for project context (15K vs. 100K tokens, as reported in 2026) when using the materialize command.
Persistent Markdown wiki pattern popularized by Andrej Karpathy to maintain project knowledge across session boundaries.
Integration of Graymatter memory layer for semantic search and fact decay with a 30-day half-life to prune stale information.
Support for local-first execution using Ollama backend to ensure data privacy and security for air-gapped environments.
Automated documentation generation including Mermaid diagrams, C4 system landscapes, and OpenAPI-extracted API docs.
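
The 30-day half-life mentioned above can be illustrated with a small sketch. This is not Graymatter's actual code or API: the function names, the (text, recorded_at) fact shape, and the 0.1 pruning threshold are all assumptions made for illustration. The only detail taken from the article is the 30-day half-life, which means a fact's weight halves every 30 days.

```python
from datetime import datetime, timedelta

HALF_LIFE_DAYS = 30.0  # from the article; everything else here is hypothetical


def decay_weight(age_days: float, half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Exponential decay: the weight halves every `half_life_days`."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return 0.5 ** (age_days / half_life_days)


def prune_stale(facts, now, threshold=0.1):
    """Keep only facts whose decayed weight is still at or above `threshold`.

    `facts` is a list of (text, recorded_at) tuples; this shape and the
    threshold value are assumptions, not part of llmwiki or Graymatter.
    """
    kept = []
    for text, recorded_at in facts:
        age_days = (now - recorded_at).total_seconds() / 86400.0
        if decay_weight(age_days) >= threshold:
            kept.append(text)
    return kept


if __name__ == "__main__":
    now = datetime(2026, 1, 31)
    facts = [
        ("billing service owns the invoices table", now - timedelta(days=10)),
        ("auth uses the legacy session store", now - timedelta(days=120)),
    ]
    print(prune_stale(facts, now))
```

With these assumed values, a fact recorded 120 days ago has a weight of 0.5^4 = 0.0625 and falls below the threshold, while a 10-day-old fact is kept.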

Working Examples

Scans the project to generate a markdown wiki covering architecture, service maps, and API docs.

llmwiki ingest ~/workspace/my-api

Rebuilds the wiki from accumulated facts, reducing token usage compared to a full ingest.

llmwiki materialize my-project

Injects the project map into a CLAUDE.md file using marker blocks.

llmwiki context my-project --inject CLAUDE.md

Marker blocks used for automated context injection in documentation files.

<!-- llmwiki:start -->
... domain, architecture, services, flows ...
<!-- llmwiki:end -->
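
How llmwiki performs the injection internally is not shown in the article. The sketch below illustrates the general marker-block technique under stated assumptions: the `inject` function is hypothetical, and appending the block when the markers are missing is a design choice made here, not confirmed tool behavior. Only the two marker strings come from the article.

```python
START = "<!-- llmwiki:start -->"
END = "<!-- llmwiki:end -->"


def inject(document: str, body: str) -> str:
    """Replace the text between the markers, or append a new marker block.

    Hypothetical helper: text outside the markers is left untouched, so
    hand-written notes in the file survive repeated injections.
    """
    block = f"{START}\n{body.strip()}\n{END}"
    start = document.find(START)
    end = document.find(END, start + len(START)) if start != -1 else -1
    if start == -1 or end == -1:
        # No complete marker pair found: append the block at the end.
        separator = "" if not document or document.endswith("\n") else "\n"
        return f"{document}{separator}{block}\n"
    return document[:start] + block + document[end + len(END):]


if __name__ == "__main__":
    original = (
        "# Notes\n\n"
        f"{START}\nold project map\n{END}\n\n"
        "Hand-written conventions stay here.\n"
    )
    print(inject(original, "domain, architecture, services, flows"))
```

Because only the span between the markers is rewritten, running the injection twice with the same body yields the same file, which keeps Git diffs limited to real changes in the project map.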

Practical Applications

Use case: Tech leads onboarding junior engineers use auto-generated C4 system landscape diagrams to visualize service relationships across multiple repos. Pitfall: Relying on manually updated Confluence pages which are often outdated by years.
Use case: Consultants managing multiple client stacks utilize the Graymatter layer for semantic search across project-specific facts. Pitfall: Manually re-explaining architecture to AI assistants every morning, leading to wasted billable hours and high token invoices.
Use case: Security-sensitive environments run llmwiki with an Ollama backend for air-gapped codebase analysis. Pitfall: Sending proprietary source code to cloud-based LLM providers without proper path-traversal or data-leakage safeguards.
