

Scaling Multi-Agent Systems: Lessons from Intuit on Orchestration and Predictability


These articles are AI-generated summaries. Please check the original sources for full details.

How to get multiple agents to play nice at scale

Chase Roossin and Steven Kulesza from Intuit address the engineering challenge of orchestrating multiple AI agents within complex systems. They highlight that automated evaluations are critical for making agent behaviors predictable at scale. This approach allows developers to manage the inherent volatility of LLM-based interactions in production.

Why This Matters

Ideal models suggest seamless collaboration between AI agents, but in production, agent interactions are unpredictable and have to be actively managed. Scaling these systems requires moving away from manual testing toward automated evaluation frameworks that keep the system reliable. Engineering teams must also weigh the trade-offs between deploying agent swarms and relying on a single, highly skilled agent. That decision is heavily influenced by customer behavior and by the need for reusable AI components that let diverse development teams work consistently and quickly.
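The summary does not say how Intuit's eval pipelines are built. As a rough illustration of moving from manual testing to automated evaluation, here is a minimal Python sketch, assuming an agent can be called as a plain function from prompt to text. `EvalCase`, `run_evals`, `release_gate`, `toy_agent`, and the 0.95 threshold are all hypothetical. Each case runs several times because an LLM-backed agent may answer the same prompt differently on each call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """Hypothetical test case: one input plus a checker that accepts or rejects an output."""
    prompt: str
    check: Callable[[str], bool]


def run_evals(agent: Callable[[str], str], cases: list, trials: int = 5) -> float:
    """Run every case `trials` times and return the overall pass rate (0.0 to 1.0).

    Repeating each case surfaces run-to-run variation in agent outputs.
    """
    passed = 0
    total = 0
    for case in cases:
        for _ in range(trials):
            total += 1
            if case.check(agent(case.prompt)):
                passed += 1
    return passed / total if total else 0.0


def release_gate(pass_rate: float, threshold: float = 0.95) -> bool:
    """Hypothetical deployment gate: allow a release only if the pass rate meets the threshold."""
    return pass_rate >= threshold


def toy_agent(prompt: str) -> str:
    # Hypothetical stand-in for an LLM-backed agent; deterministic for illustration.
    return "refund approved" if "refund" in prompt else "escalate"


cases = [
    EvalCase("Customer asks for a refund", lambda out: "refund" in out),
    EvalCase("Customer reports a tax form error", lambda out: out == "escalate"),
]
rate = run_evals(toy_agent, cases, trials=3)
print(rate, release_gate(rate))  # 1.0 True
```

Wiring a check like `release_gate` into CI is one way to make "predictable at scale" something a pipeline enforces rather than something a reviewer spot-checks.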

Key Insights

  • As of 2026, Intuit uses automated evaluations to keep agent behavior predictable as system complexity increases.
  • Agent swarms represent a decentralized architecture alternative to a single highly skilled agent for complex task execution.
  • Technical architecture at Intuit is shaped by customer behavior data to ensure AI agents meet specific user requirements.
  • According to Intuit engineering leadership, reusable AI components help democratize AI development across teams.
  • The implementation of automated eval pipelines is essential for achieving predictability in agent-based systems.
  • The speakers frame scaling multi-agent systems as one of the hardest problems in engineering today.
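The swarm-versus-single-agent choice and the push for reusability can be pictured with a small dispatcher. This is a hypothetical sketch, not Intuit's architecture: a shared registry holds reusable specialist agents (the swarm), and any subtask without a matching specialist falls back to one generalist agent. `AgentRegistry`, `orchestrate`, `generalist`, and the skill names are invented for illustration, and the agents are plain functions standing in for model calls.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical agent type: any callable mapping a task description to a result string.
Agent = Callable[[str], str]


class AgentRegistry:
    """Hypothetical shared catalog where different teams register reusable agents by skill."""

    def __init__(self) -> None:
        self._agents: Dict[str, Agent] = {}

    def register(self, skill: str, agent: Agent) -> None:
        self._agents[skill] = agent

    def get(self, skill: str) -> Optional[Agent]:
        return self._agents.get(skill)


def orchestrate(subtasks: List[Tuple[str, str]], registry: AgentRegistry, generalist: Agent) -> List[str]:
    """Route each (skill, task) pair to a registered specialist, or to the generalist if none exists."""
    results = []
    for skill, task in subtasks:
        agent = registry.get(skill) or generalist
        results.append(agent(task))
    return results


def generalist(task: str) -> str:
    # Hypothetical single "highly skilled" fallback agent.
    return f"generalist: {task}"


registry = AgentRegistry()
registry.register("tax", lambda t: f"tax-agent: {t}")
registry.register("billing", lambda t: f"billing-agent: {t}")

print(orchestrate([("tax", "check deduction"), ("legal", "review terms")], registry, generalist))
# ['tax-agent: check deduction', 'generalist: review terms']
```

Keeping the registry separate from the routing logic is one way to let different teams contribute agents without touching the orchestrator; whether a swarm outperforms a single agent still depends on the workload.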

Practical Applications

  • Use Case: Intuit integrates automated evals to stabilize agent interactions in production environments.
  • Pitfall: Scaling agent systems without automated evaluation metrics leads to unpredictable and non-deterministic software behavior.
  • Use Case: Deploying agent swarms to distribute specialized tasks across multiple smaller models for better performance.
  • Pitfall: Designing agent architectures in isolation from customer behavior data results in misaligned system outputs.
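One simple signal for the non-determinism pitfall is how often an agent agrees with itself. The sketch below, assuming exact string comparison is an acceptable notion of "same answer", runs one prompt repeatedly and reports the share of runs that match the most common output. `consistency_rate` and `flaky_agent` are hypothetical; a production pipeline might instead compare outputs semantically.

```python
import itertools
from collections import Counter
from typing import Callable


def consistency_rate(agent: Callable[[str], str], prompt: str, runs: int = 10) -> float:
    """Fraction of runs whose output equals the most common output for this prompt.

    1.0 means the agent answered identically on every run.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    outputs = [agent(prompt) for _ in range(runs)]
    _, top_count = Counter(outputs).most_common(1)[0]
    return top_count / runs


# Hypothetical agent whose answer drifts across calls by cycling through fixed replies.
_answers = itertools.cycle(["approve", "approve", "approve", "deny"])


def flaky_agent(prompt: str) -> str:
    return next(_answers)


print(consistency_rate(flaky_agent, "Is this expense deductible?", runs=8))  # 0.75
```

Tracking this rate per prompt over time gives a concrete number to alert on when agent behavior starts to drift.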
