Replaying Production AI Agent Streams with AgentStreamRecorder

I can now replay any AI agent stream from production. Here’s how.

Abhishek Chatterjee added AgentStreamRecorder to the agent-stream library to address a recurring problem in AI agent UIs: the event sequence behind a bug evaporates as soon as the stream ends. The recorder captures stateful Server-Sent Events (SSE) streams into .jsonl files for post-mortem debugging and local reproduction.

Why This Matters

REST API requests and responses are easily logged and replayed with curl. AI agent streams, by contrast, are stateful and ephemeral, and they often fail because of specific event sequences or network conditions that are impossible to replicate in local development. When a frontend UI freezes or tools fail to clear, developers often lack the exact sequence of tool calls and the token rates needed to diagnose the root cause. The result is recurring production bugs that standard observability stacks miss.

Key Insights

Stream-native observability requires session-based tracing rather than individual request logs, as demonstrated by Praxiom’s development of 36 production agent tools.
The relative timestamp (the t field in each .jsonl line) makes recordings portable for debugging, because it measures seconds since stream start rather than absolute wall-clock time.
The agent-stream CLI tool enables 0.1x to 2x speed replays of production SSE events to identify race conditions in frontend state management.
Non-buffered file flushing ensures that even if a process crashes mid-stream, all events up to the failure point are preserved for analysis.
Binary formats are avoided in favor of .jsonl to allow developers to use standard tools like grep for instant error event filtering.
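
To make the relative-timestamp and speed-scaling ideas concrete, here is a minimal sketch of how replay delays could be derived from a recording. It is not the agent-stream CLI's implementation. Only the t field comes from the article; the event field, its values, and the helper names load_events and replay_delays are assumptions made for illustration.

```python
import io
import json


def load_events(lines):
    """Parse .jsonl lines into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def replay_delays(events, speed=1.0):
    """Return the wait in seconds before each event when replayed at `speed`.

    `t` is seconds since stream start, so the gap between two consecutive
    events is t[i] - t[i-1]. Dividing by `speed` stretches the gap for slow
    replays (0.1x) and shrinks it for fast ones (2x).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    delays = []
    previous_t = 0.0
    for event in events:
        delays.append((event["t"] - previous_t) / speed)
        previous_t = event["t"]
    return delays


# Hypothetical recording. Only `t` is described in the article; the `event`
# field name and its values are assumptions.
sample = io.StringIO(
    '{"t": 0.0, "event": "text_delta"}\n'
    '{"t": 0.5, "event": "tool_use"}\n'
    '{"t": 2.0, "event": "error"}\n'
)
events = load_events(sample)

# Replaying at half speed doubles every gap between events.
half_speed = replay_delays(events, speed=0.5)

# Because each line is plain JSON, filtering for errors is a one-liner,
# much like running grep over the file.
errors = [e for e in events if e["event"] == "error"]
```

Since the timestamps are relative, the same recording replays identically on any machine and at any time, which is what makes the files portable.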

Working Examples

Adding the AgentStreamRecorder async wrapper to an existing FastAPI endpoint.

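from fastapi import FastAPI
from pydantic import BaseModel

from agent_stream.recorder import AgentStreamRecorder

# run_agent and agent_stream_response are assumed to be defined or imported
# elsewhere in the application; the source does not show where they come from.
app = FastAPI()
recorder = AgentStreamRecorder("streams/production.jsonl")

class ChatRequest(BaseModel):
    message: str  # field type assumed from the req.message usage below

@app.post("/chat")
async def chat(req: ChatRequest):
    async def generate():
        async for sse_str in recorder.record(run_agent(req.message)):
            yield sse_str  # passes through unchanged
    return agent_stream_response(generate())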
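
For readers wondering what a pass-through recorder of this shape might do internally, the following is a rough sketch and not the library's actual code. SketchRecorder, fake_agent, and the data field are hypothetical names. The sketch only combines the behaviors the article describes: a relative t timestamp, one JSON object per line, and a flush after every event.

```python
import asyncio
import json
import os
import tempfile
import time


class SketchRecorder:
    """Hypothetical stand-in, NOT the real AgentStreamRecorder."""

    def __init__(self, path):
        self.path = path

    async def record(self, stream):
        start = time.monotonic()  # monotonic clock: immune to wall-clock jumps
        with open(self.path, "a", encoding="utf-8") as f:
            async for sse_str in stream:
                entry = {"t": round(time.monotonic() - start, 3), "data": sse_str}
                f.write(json.dumps(entry) + "\n")
                # Flush Python's buffer right away so events written before
                # a process crash are already handed to the operating system.
                f.flush()
                yield sse_str  # hand the event on unchanged


async def fake_agent():
    """Stand-in for an agent that emits two SSE strings."""
    for chunk in ("data: hello\n\n", "data: world\n\n"):
        await asyncio.sleep(0.01)
        yield chunk


async def main(path):
    recorder = SketchRecorder(path)
    return [s async for s in recorder.record(fake_agent())]


path = os.path.join(tempfile.mkdtemp(), "stream.jsonl")
passed_through = asyncio.run(main(path))

with open(path, encoding="utf-8") as f:
    recorded = [json.loads(line) for line in f]
```

Writing before yielding means the file always holds at least as many events as the client has seen, which matters when a failure happens mid-stream.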

Practical Applications

Use Case: Debugging React hook state where a tool_result arrives before a tool_use due to parallel execution; Pitfall: Assuming sequential event delivery in frontend state machines.
Use Case: Identifying UI truncation by replaying production .jsonl files through a local dev server; Pitfall: Attempting to reproduce streaming bugs in local environments without real network token rates.
Use Case: Building regression test suites from real production recordings to verify fixes for hanging UI states; Pitfall: Relying on generic error logs that lack the full event sequence context.
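
The out-of-order case above lends itself to an automated check over a recording. The sketch below assumes each recorded event carries an event type and a tool_id that links a tool_result to its tool_use. Neither field name is confirmed by the article, and find_early_tool_results is a hypothetical helper.

```python
def find_early_tool_results(events):
    """Return tool ids whose tool_result arrives before the matching tool_use."""
    seen_tool_use = set()
    early = []
    for event in events:
        if event["event"] == "tool_use":
            seen_tool_use.add(event["tool_id"])
        elif event["event"] == "tool_result" and event["tool_id"] not in seen_tool_use:
            early.append(event["tool_id"])
    return early


# Hypothetical recording with two parallel tools; the field names are assumptions.
recording = [
    {"t": 0.10, "event": "tool_use", "tool_id": "a"},
    {"t": 0.12, "event": "tool_result", "tool_id": "b"},  # before its tool_use
    {"t": 0.15, "event": "tool_use", "tool_id": "b"},
    {"t": 0.40, "event": "tool_result", "tool_id": "a"},
]
early = find_early_tool_results(recording)
```

A check like this can run inside a regression suite against saved production recordings, flagging the streams that a frontend state machine must tolerate.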
