BerriAI Launches LiteLLM Agent Platform for Kubernetes-Based Production AI Infrastructure

Meet LiteLLM Agent Platform: A Kubernetes-Based, Self-Hosted Infrastructure Layer for Isolated Agent Sandboxes and Persistent Session Management in Production

BerriAI has open-sourced the LiteLLM Agent Platform to provide a self-hosted infrastructure layer for running multiple AI agents in production. The system leverages the kubernetes-sigs/agent-sandbox CRD to create isolated runtime environments per session. It specifically addresses the challenge of maintaining session state and tool results across pod restarts and deployments.

Why This Matters

Running agents in production is fundamentally different from local scripts because agents are inherently stateful, carrying session history and reasoning across multiple turns. In standard containerized environments, if a pod crashes or is replaced during a deployment, the entire session state is purged unless an external infrastructure layer explicitly manages persistence. The LiteLLM Agent Platform addresses this by decoupling the agent execution environment from the management layer. By using Kubernetes-based sandboxes and a persistent Postgres backing store, it ensures that stateful work is preserved while providing strict isolation between different teams’ tools, secrets, and access scopes.

Key Insights

The platform uses the kubernetes-sigs/agent-sandbox CRD (2026) to manage the lifecycle of individual agent environments as native Kubernetes resources.
Session continuity is maintained across container restarts using a Postgres persistent store and a dedicated worker process for async tasks.
The infrastructure supports 100+ LLM providers via the LiteLLM AI Gateway, including AWS Bedrock, Azure, and VertexAI.
Local development is facilitated through kind (Kubernetes in Docker), allowing engineers to test sandbox isolation without cloud credentials.
Secrets are securely injected into sandboxes using a CONTAINER_ENV_ prefixing system that strips the prefix before passing variables to the runtime.

Working Examples

Local quickstart to provision a kind cluster and start the web and worker services.

bin/kind-up.sh && docker compose up

Programmatic creation of an agent session via the platform’s REST API.

curl -X POST http://localhost:3000/api/sessions -H "Content-Type: application/json" -d '{"agent_id": "your-agent-id"}'

Practical Applications

Use Case: Teams deploying coding agents like Claude Code or OpenAI Codex can use the opencode harness to run agents in isolated VMs with credential proxying.
Pitfall: Using a shared runtime for different agent teams leads to cross-contamination of secrets; per-context sandboxes eliminate this risk.
Use Case: Production deployments on AWS EKS allow horizontal scaling of agent sandboxes while maintaining a central management dashboard on Render.
Pitfall: Manual session handling in application code causes state loss during routine deployments; the platform’s session persistence solves this.

References:

On This Page

Meet LiteLLM Agent Platform: A Kubernetes-Based, Self-Hosted Infrastructure Layer for Isolated Agent Sandboxes and Persistent Session Management in Production

Why This Matters

Key Insights

Working Examples

Practical Applications

Continue reading

Related Content

TinyFish AI Launches Unified Web Infrastructure for AI Agents

CopilotKit Introduces Enterprise Intelligence Platform for Persistent Agentic Memory

GitAgent: A Universal Open-Source Format for Framework-Agnostic AI Agents