Designing Production-Grade Multi-Agent Systems with LangGraph and ACP Message Bus

These articles are AI-generated summaries. Please check the original sources for full details.

How to Design a Production-Grade Multi-Agent Communication System Using LangGraph Structured Message Bus, ACP Logging, and Persistent Shared State Architecture

This architecture implements a structured message bus using LangGraph and Pydantic to manage communication between Planner, Executor, and Validator agents. It utilizes a strict ACP-style message schema to ensure every interaction is logged and traceable through a centralized shared state.

Why This Matters

Traditional multi-agent systems often suffer from brittle, direct agent-to-agent calls that lack observability and fail to persist state across interruptions. By moving to a message bus architecture, developers achieve modularity and durability, using SQLite-based persistence to ensure agents can recover from failures without losing the execution context.

Key Insights

  • The ACP-style message schema uses Pydantic to enforce strict metadata: every message carries a unique message ID and a UTC timestamp.
  • Shared state architecture, exemplified by the BusState class, replaces direct method calls with a mailbox system to decouple agent logic from orchestration.
  • LangGraph’s SqliteSaver provides production-grade persistence, allowing the system to save and recover state using unique thread identifiers.
  • Structured logging via ACP logs enables real-time observability, recording sender, receiver, and message types in a JSONL format for auditability.
  • Deterministic routing logic in StateGraph allows for dynamic flow control, enabling the system to loop back to planners or validators based on execution results.
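The JSONL logging idea can be sketched with the standard library alone. This is a minimal illustration, not the article's implementation: the helper names `make_message`, `append_acp_log`, and `read_acp_log`, the log path, and the `msg_type` values used in the usage note are all assumptions.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path


def make_message(sender: str, receiver: str, msg_type: str, content: str) -> dict:
    """Build a message dict carrying the same metadata fields as the ACP-style schema."""
    return {
        "msg_id": str(uuid.uuid4()),
        "ts": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
        "sender": sender,
        "receiver": receiver,
        "msg_type": msg_type,
        "content": content,
    }


def append_acp_log(path: Path, message: dict) -> None:
    """Append one message as a single JSON line (JSONL)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(message, ensure_ascii=False) + "\n")


def read_acp_log(path: Path) -> list:
    """Read the log back, one message per non-empty line."""
    with path.open("r", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Calling `append_acp_log(Path("logs/acp.jsonl"), make_message("planner", "executor", "plan", "step 1"))` after each agent turn would leave an append-only file that can be filtered by sender, receiver, or message type with ordinary line-oriented tools.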

Working Examples

Definition of the ACP-style message schema and centralized BusState for multi-agent coordination.

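import uuid
from datetime import datetime, timezone
from typing import Any, Dict, List, Literal

from pydantic import BaseModel, Field

# Role names follow the agents named in this article; the MsgType values are
# illustrative assumptions, since the excerpt does not define them.
Role = Literal["user", "planner", "executor", "validator"]
MsgType = Literal["task", "plan", "result", "validation", "error"]

class ACPMessage(BaseModel):
    # Every message gets a unique ID and a UTC timestamp ending in "Z".
    msg_id: str = Field(default_factory=lambda: str(uuid.uuid4()))
    ts: str = Field(default_factory=lambda: datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"))
    sender: Role
    receiver: Role
    msg_type: MsgType
    content: str
    meta: Dict[str, Any] = Field(default_factory=dict)
    trace: Dict[str, Any] = Field(default_factory=dict)

class BusState(BaseModel):
    # Shared state: agents read from and append to the mailbox instead of calling each other.
    goal: str = ""
    done: bool = False
    mailbox: List[ACPMessage] = Field(default_factory=list)
    active_role: Role = "user"
    step: int = 0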
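
How agents might exchange messages through the mailbox rather than calling each other can be sketched as follows. To keep the example dependency-free it uses dataclasses as stand-ins for the Pydantic models; the names `Message`, `Bus`, `send`, `inbox`, and `demo_executor`, the message types, and the choice to increment `step` once per message are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    # Stand-in for ACPMessage, reduced to the fields used for routing.
    sender: str
    receiver: str
    msg_type: str
    content: str


@dataclass
class Bus:
    # Stand-in for BusState: the only object agents share.
    mailbox: List[Message] = field(default_factory=list)
    step: int = 0


def send(bus: Bus, msg: Message) -> None:
    """Post a message to the shared mailbox and advance the step counter."""
    bus.mailbox.append(msg)
    bus.step += 1


def inbox(bus: Bus, role: str) -> List[Message]:
    """Return the messages addressed to `role`, oldest first."""
    return [m for m in bus.mailbox if m.receiver == role]


def demo_executor(bus: Bus) -> None:
    """Hypothetical executor: answer the latest plan it received, if any."""
    plans = [m for m in inbox(bus, "executor") if m.msg_type == "plan"]
    if plans:
        send(bus, Message("executor", "validator", "result", f"did: {plans[-1].content}"))
```

Because `demo_executor` only reads and writes the bus, it never needs a reference to the planner or validator, which is the decoupling the shared-state design is after.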

Constructing the LangGraph state graph with SQLite persistence for durable execution.

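import os
import sqlite3

from langgraph.checkpoint.sqlite import SqliteSaver
from langgraph.graph import StateGraph

graph = StateGraph(dict)
graph.add_node("planner", planner_agent)
graph.add_node("executor", executor_agent)
graph.add_node("validator", validator_agent)
graph.set_entry_point("planner")  # routing edges are not shown in this excerpt

os.makedirs("checkpoints", exist_ok=True)  # sqlite3.connect does not create missing directories
checkpointer = SqliteSaver(sqlite3.connect("checkpoints/langgraph_bus.sqlite", check_same_thread=False))
app = graph.compile(checkpointer=checkpointer)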
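
The excerpt does not show the routing logic, so the following is only a sketch of what a deterministic router could look like. It is a pure function of the shared state, written against plain dicts to match `StateGraph(dict)`; the function name `route_after_validator`, the `"reject"` message type, the `"end"` label, and the `MAX_STEPS` cap are assumptions.

```python
from typing import Any, Dict

MAX_STEPS = 12  # hypothetical safety cap, not taken from the source


def route_after_validator(state: Dict[str, Any]) -> str:
    """Pick the next node name from the shared state alone (no LLM call, no randomness)."""
    # Stop when the goal is marked done or the step budget is exhausted.
    if state.get("done") or state.get("step", 0) >= MAX_STEPS:
        return "end"
    mailbox = state.get("mailbox", [])
    last = mailbox[-1] if mailbox else {}
    # A rejected result loops back to the planner for a revised plan.
    if last.get("msg_type") == "reject":
        return "planner"
    # Otherwise keep executing the current plan.
    return "executor"
```

A function like this could be registered with LangGraph's `add_conditional_edges`, passing a mapping from the returned labels to node names (and `"end"` to `END`). Keeping the decision in one pure function makes the loop-back behavior easy to unit test without running any agent.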

Practical Applications

  • Use Case: Orchestrating complex software engineering tasks where a Planner defines steps and a Validator checks JSON output against a schema.
  • Pitfall: Direct agent coupling without a shared state, which leads to untraceable execution paths and difficulty in debugging state transitions.
  • Use Case: Persistent long-running workflows using SQLite checkpoints to maintain agent progress across system restarts or session timeouts.
  • Pitfall: Inconsistent message schemas in multi-agent environments, resulting in downstream validation failures and unhandled runtime errors.
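
The first use case, a Validator checking JSON output against a schema, might look roughly like this. The sketch uses only the standard library; `RESULT_SCHEMA`, its field names, and `validate_output` are hypothetical, and a real system could use Pydantic or a JSON Schema library instead.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical schema: required field name -> expected Python type.
RESULT_SCHEMA: Dict[str, type] = {"task": str, "status": str, "artifacts": list}


def validate_output(raw: str, schema: Dict[str, type] = RESULT_SCHEMA) -> Tuple[bool, List[str]]:
    """Return (ok, errors) for an executor's raw JSON output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return False, ["top-level value must be an object"]
    errors: List[str] = []
    # Check each required field for presence and type, in schema order.
    for key, expected in schema.items():
        if key not in data:
            errors.append(f"missing field: {key}")
        elif not isinstance(data[key], expected):
            errors.append(f"wrong type for {key}: expected {expected.__name__}")
    return (not errors), errors
```

Returning the error list rather than raising lets the validator put the failures into a message for the planner, so schema problems surface as traceable bus traffic instead of unhandled runtime errors.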
