Designing Production-Grade Multi-Agent Systems with LangGraph and ACP Message Bus

How to Design a Production-Grade Multi-Agent Communication System Using LangGraph Structured Message Bus, ACP Logging, and Persistent Shared State Architecture

This architecture implements a structured message bus using LangGraph and Pydantic to manage communication between Planner, Executor, and Validator agents. It utilizes a strict ACP-style message schema to ensure every interaction is logged and traceable through a centralized shared state.

Why This Matters

Traditional multi-agent systems often suffer from brittle, direct agent-to-agent calls that lack observability and fail to persist state across interruptions. By moving to a message bus architecture, developers achieve modularity and durability, using SQLite-based persistence to ensure agents can recover from failures without losing the execution context.

Key Insights

The ACP-style message schema uses Pydantic to enforce strict metadata on every interaction, including a unique message ID and a UTC timestamp.
Shared state architecture, exemplified by the BusState class, replaces direct method calls with a mailbox system to decouple agent logic from orchestration.
LangGraph’s SqliteSaver provides durable checkpoint persistence: state is saved to a SQLite file and recovered by reusing the same thread identifier (thread_id) in the run configuration.
Structured ACP logging enables real-time observability: the sender, receiver, and message type of every message are recorded in JSONL format (one JSON object per line) for auditability.
Deterministic routing logic in StateGraph allows for dynamic flow control, enabling the system to loop back to planners or validators based on execution results.
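
As an illustration of the JSONL logging described above, here is a minimal, dependency-free sketch. The function names log_acp_message and read_acp_log and the choice of fields written are assumptions made for this example; the source only states that sender, receiver, and message type are recorded in JSONL format.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path


def log_acp_message(log_path: Path, sender: str, receiver: str,
                    msg_type: str, content: str) -> dict:
    """Append one ACP-style record to a JSONL file and return the record.

    Hypothetical helper: the field names mirror the ACPMessage schema.
    """
    record = {
        "msg_id": str(uuid.uuid4()),
        "ts": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
        "sender": sender,
        "receiver": receiver,
        "msg_type": msg_type,
        "content": content,
    }
    # Create the log directory on first use, then append a single line.
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def read_acp_log(log_path: Path) -> list:
    """Read the log back as a list of dicts, one per non-empty line."""
    with log_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because each line is a complete JSON object, the log can be tailed while the agents run and filtered afterwards by sender, receiver, or message type.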

Working Examples

Definition of the ACP-style message schema and centralized BusState for multi-agent coordination.

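import uuid
from datetime import datetime, timezone
from typing import Any, Dict, List, Literal

from pydantic import BaseModel, Field

# Assumption: the excerpt does not define Role or MsgType. "user" is required by
# BusState.active_role below; the MsgType values are illustrative.
Role = Literal["user", "planner", "executor", "validator"]
MsgType = Literal["task", "plan", "result", "validation", "error"]

class ACPMessage(BaseModel):
    # Unique ID and UTC timestamp (ISO 8601 with a trailing "Z") are generated per message.
    msg_id: str = Field(default_factory=lambda: str(uuid.uuid4()))
    ts: str = Field(default_factory=lambda: datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"))
    sender: Role
    receiver: Role
    msg_type: MsgType
    content: str
    meta: Dict[str, Any] = Field(default_factory=dict)
    trace: Dict[str, Any] = Field(default_factory=dict)

class BusState(BaseModel):
    goal: str = ""
    done: bool = False
    # All agent communication goes through this shared mailbox.
    mailbox: List[ACPMessage] = Field(default_factory=list)
    active_role: Role = "user"
    step: int = 0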
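
The mailbox pattern can be sketched without any dependencies as follows. This is an illustration, not the article's implementation: Message and Bus are simplified dataclass stand-ins for ACPMessage and BusState, and the send and inbox helpers, including how active_role and step are updated, are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    """Dependency-free stand-in for ACPMessage (routing fields only)."""
    sender: str
    receiver: str
    msg_type: str
    content: str


@dataclass
class Bus:
    """Dependency-free stand-in for BusState."""
    mailbox: List[Message] = field(default_factory=list)
    active_role: str = "user"
    step: int = 0


def send(bus: Bus, msg: Message) -> Bus:
    """Post a message to the shared mailbox and hand control to its receiver."""
    bus.mailbox.append(msg)
    bus.active_role = msg.receiver  # assumption: the receiver acts next
    bus.step += 1
    return bus


def inbox(bus: Bus, role: str) -> List[Message]:
    """Return the messages addressed to `role`, oldest first."""
    return [m for m in bus.mailbox if m.receiver == role]


bus = Bus()
send(bus, Message("user", "planner", "task", "Summarize the report"))
send(bus, Message("planner", "executor", "plan", "1. read 2. summarize"))
```

Agents never call each other here: the planner only appends to the mailbox, and the executor only reads what is addressed to it, which is what keeps agent logic separate from orchestration.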

Constructing the LangGraph state graph with SQLite persistence for durable execution.

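import os
import sqlite3
from langgraph.graph import StateGraph
from langgraph.checkpoint.sqlite import SqliteSaver  # from the langgraph-checkpoint-sqlite package

# planner_agent, executor_agent, and validator_agent are node functions defined elsewhere.
# Edges and routing between the nodes are not shown in this excerpt.
graph = StateGraph(dict)
graph.add_node("planner", planner_agent)
graph.add_node("executor", executor_agent)
graph.add_node("validator", validator_agent)
graph.set_entry_point("planner")

os.makedirs("checkpoints", exist_ok=True)  # sqlite3 does not create missing directories
checkpointer = SqliteSaver(sqlite3.connect("checkpoints/langgraph_bus.sqlite", check_same_thread=False))
app = graph.compile(checkpointer=checkpointer)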
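
The graph above needs a routing rule to decide which node runs next. The following is one possible sketch of a deterministic router, written as a plain function over the state dictionary. The function name route_after_validator, the MAX_STEPS cap, the "end" label, and the specific rules are all assumptions made for illustration; the source does not show its routing logic.

```python
from typing import Any, Dict

MAX_STEPS = 8  # hypothetical safety cap so a failing loop cannot run forever


def route_after_validator(state: Dict[str, Any]) -> str:
    """Pick the next node name from plain state values, with no model call involved."""
    if state.get("done"):
        return "end"
    if state.get("step", 0) >= MAX_STEPS:
        return "end"
    mailbox = state.get("mailbox", [])
    last = mailbox[-1] if mailbox else {}
    if last.get("msg_type") == "error":
        return "planner"  # validation failed: loop back and replan
    return "executor"  # otherwise continue executing
```

In LangGraph, a function of this shape can be registered with graph.add_conditional_edges("validator", route_after_validator, {"planner": "planner", "executor": "executor", "end": END}), where END is imported from langgraph.graph. Since the function depends only on the state, the same state always produces the same route, which makes execution paths reproducible when replaying from a checkpoint.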

Practical Applications

Use Case: Orchestrating complex software engineering tasks where a Planner defines steps and a Validator checks JSON output against a schema.
Pitfall: Direct agent coupling without a shared state, which leads to untraceable execution paths and difficulty in debugging state transitions.
Use Case: Persistent long-running workflows using SQLite checkpoints to maintain agent progress across system restarts or session timeouts.
Pitfall: Inconsistent message schemas in multi-agent environments, resulting in downstream validation failures and unhandled runtime errors.
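
To make the Validator use case concrete, here is a small standard-library sketch of checking an Executor's JSON output against an expected shape. RESULT_SCHEMA, its field names, and validate_result are hypothetical; a real system might use Pydantic models or a JSON Schema library instead.

```python
import json
from typing import Any, Dict, List

# Hypothetical expected shape of the Executor's output: field name -> required type.
RESULT_SCHEMA: Dict[str, type] = {
    "summary": str,
    "steps_completed": int,
    "artifacts": list,
}


def validate_result(raw: str, schema: Dict[str, type] = RESULT_SCHEMA) -> List[str]:
    """Return a list of problems; an empty list means the payload passed."""
    try:
        payload: Any = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["top-level value must be a JSON object"]

    problems: List[str] = []
    for key, expected in schema.items():
        if key not in payload:
            problems.append(f"missing field: {key}")
            continue
        value = payload[key]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        type_ok = isinstance(value, expected) and not (
            expected is int and isinstance(value, bool)
        )
        if not type_ok:
            problems.append(f"wrong type for {key}: expected {expected.__name__}")
    return problems
```

Returning a list of problems instead of raising lets the Validator put the findings into a message for the Planner, so a schema mismatch becomes a traceable event on the bus instead of an unhandled runtime error.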

References:

https://www.marktechpost.com/2026/03/01/how-to-design-a-production-grade-multi-agent-communication-system-using-langgraph-structured-message-bus-acp-logging-and-persistent-shared-state-architecture/
