Scaling Multi-Agent Systems: Lessons from Intuit on Orchestration and Predictability
These articles are AI-generated summaries. Please check the original sources for full details.
How to get multiple agents to play nice at scale
Chase Roossin and Steven Kulesza from Intuit address the engineering challenge of orchestrating multiple AI agents within complex systems. They highlight that automated evaluations are critical for making agent behaviors predictable at scale. This approach allows developers to manage the inherent volatility of LLM-based interactions in production.
Why This Matters
In theory, AI agents collaborate seamlessly; in production, their interactions are unpredictable and have to be managed. Scaling these systems means moving from manual testing to automated evaluation frameworks to keep them reliable. Engineering teams must also weigh the trade-offs between deploying agent swarms and relying on a single, highly skilled agent. That decision is shaped by customer behavior and by the need for reusable AI components that give diverse development teams consistency and speed.
Key Insights
- Intuit uses automated evaluations in 2026 to keep agent behavior predictable as system complexity increases.
- Agent swarms are a decentralized architectural alternative to a single highly skilled agent for complex task execution.
- Technical architecture at Intuit is shaped by customer behavior data to ensure AI agents meet specific user requirements.
- According to Intuit engineering leadership, reusable components democratize AI development across teams.
- The implementation of automated eval pipelines is essential for achieving predictability in agent-based systems.
- Scaling multi-agent systems is currently considered one of the hardest problems in engineering.
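To make the idea of an automated eval pipeline concrete, here is a minimal sketch. It is not Intuit's implementation, which the summary does not describe. Every name in it (`EvalCase`, `run_evals`, `gate`, `stub_agent`) is hypothetical, and it assumes an agent can be treated as a plain function from a prompt to a text reply. Each case is run several times so the report captures both correctness (pass rate) and run-to-run agreement (consistency).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: an agent is any callable that maps a prompt to a text reply.
Agent = Callable[[str], str]


@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True when the reply is acceptable


def run_evals(agent: Agent, cases: List[EvalCase], runs: int = 5) -> Dict[str, Dict[str, float]]:
    """Run every case `runs` times and report pass rate and consistency."""
    report: Dict[str, Dict[str, float]] = {}
    for case in cases:
        outputs = [agent(case.prompt) for _ in range(runs)]
        passed = sum(1 for out in outputs if case.check(out))
        # Consistency: share of runs that produced the most common reply.
        top_count = Counter(outputs).most_common(1)[0][1]
        report[case.name] = {
            "pass_rate": passed / runs,
            "consistency": top_count / runs,
        }
    return report


def gate(report: Dict[str, Dict[str, float]], min_pass_rate: float = 0.9) -> bool:
    """Return True only if every case meets the pass-rate threshold."""
    return all(r["pass_rate"] >= min_pass_rate for r in report.values())


# Hypothetical stand-in for a real LLM-backed agent.
def stub_agent(prompt: str) -> str:
    return "category: travel" if "flight" in prompt.lower() else "category: other"


CASES = [
    EvalCase("flight", "Categorize: flight to Denver", lambda o: o == "category: travel"),
    EvalCase("coffee", "Categorize: coffee shop", lambda o: o == "category: other"),
]
```

Calling `gate(run_evals(stub_agent, CASES))` in a CI job would block a release when any case drops below the threshold. The 0.9 default is an arbitrary illustration, not a recommended value.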
Practical Applications
- Use Case: Intuit integrates automated evals to stabilize agent interactions in production environments.
- Pitfall: Scaling agent systems without automated evaluation metrics leads to unpredictable and non-deterministic software behavior.
- Use Case: Deploying agent swarms to distribute specialized tasks across multiple smaller models for better performance.
- Pitfall: Designing agent architectures in isolation from customer behavior data results in misaligned system outputs.
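The swarm-versus-single-agent trade-off can be sketched as follows. This is an illustrative assumption, not an architecture the source describes: `make_swarm`, the keyword-based `classify`, and the stub specialists are all hypothetical. The point of the design is that the swarm and a single generalist expose the same call signature, so the same automated evals can compare them.

```python
from typing import Callable, Dict

# Assumption: an agent is any callable that maps a task string to a reply.
Agent = Callable[[str], str]


def make_swarm(specialists: Dict[str, Agent],
               classify: Callable[[str], str],
               fallback: Agent) -> Agent:
    """Wrap several specialist agents behind a single agent-shaped function."""
    def swarm(task: str) -> str:
        label = classify(task)
        # Route to the matching specialist, or to the generalist if none fits.
        return specialists.get(label, fallback)(task)
    return swarm


# Hypothetical stand-ins for real model-backed agents.
def billing_agent(task: str) -> str:
    return f"billing: {task}"


def tax_agent(task: str) -> str:
    return f"tax: {task}"


def generalist_agent(task: str) -> str:
    return f"general: {task}"


def classify(task: str) -> str:
    """Toy keyword router; a real system might use a model or rules here."""
    lowered = task.lower()
    if "invoice" in lowered:
        return "billing"
    if "deduction" in lowered:
        return "tax"
    return "unknown"


swarm = make_swarm({"billing": billing_agent, "tax": tax_agent}, classify, generalist_agent)
```

Because `swarm` and `generalist_agent` are interchangeable callables, either one can be passed to the same evaluation harness to check which architecture behaves more predictably on a given set of tasks.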