BerriAI Launches LiteLLM Agent Platform for Kubernetes-Based Production AI Infrastructure
These articles are AI-generated summaries. Please check the original sources for full details.
Meet LiteLLM Agent Platform: A Kubernetes-Based, Self-Hosted Infrastructure Layer for Isolated Agent Sandboxes and Persistent Session Management in Production
BerriAI has open-sourced the LiteLLM Agent Platform to provide a self-hosted infrastructure layer for running multiple AI agents in production. The system leverages the kubernetes-sigs/agent-sandbox CRD to create isolated runtime environments per session. It specifically addresses the challenge of maintaining session state and tool results across pod restarts and deployments.
Why This Matters
Running agents in production is fundamentally different from local scripts because agents are inherently stateful, carrying session history and reasoning across multiple turns. In standard containerized environments, if a pod crashes or is replaced during a deployment, the entire session state is purged unless an external infrastructure layer explicitly manages persistence. The LiteLLM Agent Platform addresses this by decoupling the agent execution environment from the management layer. By using Kubernetes-based sandboxes and a persistent Postgres backing store, it ensures that stateful work is preserved while providing strict isolation between different teams’ tools, secrets, and access scopes.
Key Insights
- The platform uses the kubernetes-sigs/agent-sandbox CRD (2026) to manage the lifecycle of individual agent environments as native Kubernetes resources.
- Session continuity is maintained across container restarts using a Postgres persistent store and a dedicated worker process for async tasks.
- The infrastructure supports 100+ LLM providers via the LiteLLM AI Gateway, including AWS Bedrock, Azure, and VertexAI.
- Local development is facilitated through kind (Kubernetes in Docker), allowing engineers to test sandbox isolation without cloud credentials.
- Secrets are securely injected into sandboxes using a CONTAINER_ENV_ prefixing system that strips the prefix before passing variables to the runtime.
Working Examples
Local quickstart to provision a kind cluster and start the web and worker services.
bin/kind-up.sh && docker compose up
Programmatic creation of an agent session via the platform’s REST API.
curl -X POST http://localhost:3000/api/sessions -H "Content-Type: application/json" -d '{"agent_id": "your-agent-id"}'
Practical Applications
- Use Case: Teams deploying coding agents like Claude Code or OpenAI Codex can use the opencode harness to run agents in isolated VMs with credential proxying.
- Pitfall: Using a shared runtime for different agent teams leads to cross-contamination of secrets; per-context sandboxes eliminate this risk.
- Use Case: Production deployments on AWS EKS allow horizontal scaling of agent sandboxes while maintaining a central management dashboard on Render.
- Pitfall: Manual session handling in application code causes state loss during routine deployments; the platform’s session persistence solves this.
References:
Continue reading
Next article
Node.js Lifecycle Guide: Managing EOL Risks from Version 14 to 24
Related Content
CopilotKit Introduces Enterprise Intelligence Platform for Persistent Agentic Memory
CopilotKit launches the Enterprise Intelligence Platform to provide agentic applications with persistent memory and state across sessions and devices.
TinyFish AI Launches Unified Web Infrastructure for AI Agents
TinyFish AI launches a unified web infrastructure platform for AI agents, reducing token consumption by 87% and improving task completion rates by 2x.
GitAgent: A Universal Open-Source Format for Framework-Agnostic AI Agents
GitAgent introduces an open-source CLI tool to decouple AI agent logic from frameworks like LangChain and AutoGen using a Git-native architecture for better portability.