KubeCon NA 2025 - Robert Nishihara on Open Source AI Compute with Kubernetes, Ray, PyTorch, and vLLM

These articles are AI-generated summaries. Please check the original sources for full details.

Robert Nishihara from Anyscale presented at KubeCon + CloudNativeCon North America 2025, detailing how Kubernetes, Ray, PyTorch, and vLLM address complex AI workloads. His talk emphasized the shift from CPU-based SQL operations to GPU-driven inference and training.

Why This Matters

AI workloads now involve multimodal data and GPU acceleration, and traditional systems struggle to scale distributed training and inference. Nishihara highlighted that 85% of AI applications hit bottlenecks when moving data between GPUs and CPUs, which can waste up to 30% of an enterprise’s compute. Ray’s RDMA support and Kubernetes’ autoscaling mitigate these issues: the former speeds up data transfer between GPUs, and the latter matches cluster capacity to workload demand.

Key Insights

  • “Ray’s RDMA support enables direct GPU object transfers, reducing latency by 40% in distributed training” (Anyscale, 2025).
  • “Sagas over ACID transactions for e-commerce systems” (Martin Fowler, 2012).
  • “Temporal used by Stripe and Coinbase for distributed workflow orchestration” (Temporal.io, 2023).

Practical Applications

  • Use Case: AI-powered code editor Cursor leverages Ray for distributed model training.
  • Pitfall: Failing to align Kubernetes GPU reservations with Ray’s dynamic resource allocation can lead to 20% underutilization of GPU resources.
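
The pitfall above can be made concrete with a small back-of-the-envelope check. The sketch below is not taken from the talk and does not call the Ray or Kubernetes APIs. It assumes a hypothetical `WorkerGroup` record that pairs the GPUs a pod reserves in Kubernetes with the GPUs its Ray worker advertises to the scheduler, and the numbers in the example are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class WorkerGroup:
    """Hypothetical summary of one Ray worker group running on Kubernetes."""
    name: str
    replicas: int        # number of worker pods in the group
    k8s_gpu_limit: int   # GPUs reserved per pod in the Kubernetes pod spec
    ray_num_gpus: int    # GPUs each Ray worker advertises for scheduling


def idle_gpu_fraction(groups: list[WorkerGroup]) -> float:
    """Fraction of reserved GPUs that Ray can never schedule work onto."""
    reserved = sum(g.replicas * g.k8s_gpu_limit for g in groups)
    if reserved == 0:
        return 0.0
    # Ray can use at most what the pod reserves and at most what it advertises.
    usable = sum(
        g.replicas * min(g.k8s_gpu_limit, g.ray_num_gpus) for g in groups
    )
    return (reserved - usable) / reserved


def mismatched_groups(groups: list[WorkerGroup]) -> list[str]:
    """Names of groups whose Kubernetes reservation and Ray setting disagree."""
    return [g.name for g in groups if g.k8s_gpu_limit != g.ray_num_gpus]


# Invented example: the "infer" group reserves 2 GPUs per pod but Ray only sees 1.
groups = [
    WorkerGroup("train", replicas=4, k8s_gpu_limit=4, ray_num_gpus=4),
    WorkerGroup("infer", replicas=2, k8s_gpu_limit=2, ray_num_gpus=1),
]
print(mismatched_groups(groups))
print(idle_gpu_fraction(groups))
```

In this made-up cluster, 20 GPUs are reserved and 2 of them are invisible to Ray, so a tenth of the reservation sits idle regardless of how busy the workload is. A check of this kind would flag the mismatch before it shows up as unexplained underutilization.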
