How to Reduce Kubernetes Costs by 70% with 1.36 Scale-to-Zero

Kubernetes 1.36 Scale-to-Zero: Cut Your K8s Bill by 70% With One Config Change

Kubernetes 1.36 enables Scale-to-Zero by default for the HorizontalPodAutoscaler (HPA), so a workload can drop to zero pods during idle periods. In one reported case, this cut the bill for a development environment with five namespaces from $450 to $120 per month.

Why This Matters

Standard Kubernetes configurations keep pods running regardless of traffic, which wastes money in development and staging environments that sit idle overnight and on weekends. Many services have long idle periods during which compute is paid for but unused. Kubernetes 1.36 addresses this by allowing minReplicas to be set to zero, so infrastructure costs track actual demand.

Key Insights

Development environments can see a 73% cost reduction by scaling to zero during nights and weekends (AttractivePenguin, 2026)
The HorizontalPodAutoscaler (HPA) in Kubernetes 1.36 requires minReplicas: 0 to activate the scale-to-zero feature
A readiness probe is required so Kubernetes can tell when a pod that has just been scaled up from zero is able to handle traffic
Stabilization windows, such as stabilizationWindowSeconds: 300, prevent pod flapping: when scaling down, the HPA applies the highest replica count recommended during the window, so load must stay low for the whole period before pods are removed
The metrics-server add-on is a prerequisite for HPA to read CPU and memory utilization and trigger scaling actions
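A rough way to see where figures like 73% come from is to compare active hours with the 168 hours in a week. The sketch below is a back-of-envelope model, not the source's method: it assumes compute cost scales linearly with pod-hours, that capacity freed by scaling to zero actually stops being billed (for example, because idle nodes are removed), and a hypothetical schedule of 9 active hours on each of 5 weekdays. The helper `idle_savings` is illustrative.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def idle_savings(active_hours_per_day: float, active_days_per_week: int) -> float:
    """Fraction of an always-on compute bill avoided if pods run only while active.

    Assumes cost scales linearly with pod-hours; fixed costs such as the
    control plane or nodes kept for other workloads are ignored.
    """
    active = active_hours_per_day * active_days_per_week
    if not 0 <= active <= HOURS_PER_WEEK:
        raise ValueError("active hours must fit in one week")
    return 1 - active / HOURS_PER_WEEK


# Hypothetical schedule: 9 working hours on each of 5 weekdays.
saving = idle_savings(9, 5)
print(f"{saving:.1%}")                      # 73.2%
print(f"${450 * (1 - saving):.2f}/month")   # $120.54/month
```

With that schedule the model lands close to the reported $450 to $120 reduction; a 10-hour workday gives about 70% instead.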

Working Examples

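HPA configuration enabling scale-to-zero with minReplicas set to 0. The API server accepts minReplicas: 0 only when at least one Object or External metric is listed, because CPU utilization cannot be measured while no pods are running. The External metric below uses a hypothetical name and needs a metrics adapter that serves it.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  minReplicas: 0   # allow the Deployment to drop to zero pods when idle
  maxReplicas: 10
  metrics:
  # CPU drives scale-out while pods are running.
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  # Hypothetical External metric (name assumed) that still reports load
  # when no pods exist, so the HPA can scale back up from zero.
  - type: External
    external:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: "10"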
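The HPA's documented core rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The simplified Python model below shows why a CPU target alone cannot bring a Deployment back from zero: the current replica count is a factor, so zero pods always yields zero. It ignores min/max clamping, pod readiness, and multi-metric selection. `desired_from_external` assumes an External metric with an AverageValue target, where the HPA divides the total metric value by the per-pod target instead. Both function names are hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, tolerance: float = 0.1) -> int:
    """Simplified HPA rule for a Utilization target such as CPU.

    Returns ceil(current_replicas * current_value / target_value), or the
    current count when the ratio is within `tolerance` (HPA default: 10%).
    """
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def desired_from_external(total_value: float, target_per_pod: float) -> int:
    """Simplified AverageValue External metric: does not depend on current pods."""
    return math.ceil(total_value / target_per_pod)


# 4 pods averaging 90% CPU against a 50% target: scale out to 8.
print(desired_replicas(4, 90, 50))      # 8
# At zero pods the product is always 0, whatever utilization is assumed.
print(desired_replicas(0, 90, 50))      # 0
# Hypothetical external load of 30 requests/s against 10 per pod: wake up 3 pods.
print(desired_from_external(30, 10))    # 3
```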

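Readiness probe for the service's container (it belongs in the Deployment's pod template). Kubernetes routes traffic to a newly scaled-up pod only after this probe passes.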

readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3
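The probe settings above also bound how long a scaled-up pod waits before receiving traffic, which adds to cold-start latency. This simplified Python model assumes the default successThreshold of 1, probes fired exactly every periodSeconds after initialDelaySeconds, and no jitter or timeoutSeconds; the 12-second start time and the function names are hypothetical.

```python
import math


def seconds_until_ready(app_start_seconds: float, initial_delay: int = 5,
                        period: int = 5) -> int:
    """Time (s after container start) of the first probe that can succeed.

    Assumes successThreshold = 1 and probes exactly every `period` seconds
    after `initial_delay`, with no jitter or probe timeouts.
    """
    if app_start_seconds <= initial_delay:
        return initial_delay
    extra_probes = math.ceil((app_start_seconds - initial_delay) / period)
    return initial_delay + extra_probes * period


def max_seconds_to_unready(period: int = 5, failure_threshold: int = 3) -> int:
    """Upper bound on how long a failing pod stays Ready: failure_threshold probes."""
    return period * failure_threshold


# Hypothetical app that needs 12 s to start answering /health.
print(seconds_until_ready(12))   # 15
print(max_seconds_to_unready())  # 15
```

In this model the probe schedule can add up to one periodSeconds to each cold start, which is one reason to keep periodSeconds short for services that scale from zero.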

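Scale-down behavior (the behavior field of the HPA spec) that prevents rapid scaling fluctuations: the HPA waits for 300 seconds of consistently lower recommendations before removing pods, then may remove up to 100% of them per 15-second period. These values match the HPA's default scale-down behavior, so setting them mainly makes the cooldown explicit.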

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
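Scale-down stabilization works by remembering recent replica recommendations and, when scaling down, applying the highest one still inside the window. The Python sketch below models only that part, assuming the HPA's default 15-second evaluation interval and a scale-up stabilization window of 0 (the default). It ignores the Percent policy, which with value: 100 allows every remaining pod to be removed in a single 15-second period anyway. `stabilized` is a hypothetical helper name.

```python
from collections import deque


def stabilized(recommendations: list[int], window_s: int = 300,
               interval_s: int = 15) -> list[int]:
    """Replica count applied at each evaluation under scale-down stabilization.

    Keeps recommendations newer than `window_s` seconds and applies the
    highest one, so a brief dip in load does not remove pods.
    """
    history: deque[tuple[int, int]] = deque()
    applied = []
    for i, rec in enumerate(recommendations):
        now = i * interval_s
        # Drop samples that have aged out of the stabilization window.
        while history and history[0][0] <= now - window_s:
            history.popleft()
        history.append((now, rec))
        applied.append(max(r for _, r in history))
    return applied


# Load stops after t = 45 s; the HPA recommends 0 from t = 60 s onward.
recs = [3] * 4 + [0] * 25
print(stabilized(recs).index(0) * 15)   # 345
# A single low reading between busy ones is ignored.
print(stabilized([3, 3, 0, 3, 3]))      # [3, 3, 3, 3, 3]
```

A recommendation of zero therefore takes effect 300 seconds after the last non-zero recommendation, not 300 seconds after load first drops.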

Practical Applications

Use case: Development environments with 5 namespaces, reducing monthly spend from $450 to $120. Pitfall: Without readiness probes, Kubernetes cannot reliably tell when newly started pods are ready, so traffic may reach them too early during scale-up.
Use case: Event-driven API workloads with spiky traffic, achieving 65% savings. Pitfall: Slow cold starts on the first request after an idle period at zero replicas can hurt latency-sensitive services.
Use case: Staging environments sitting idle between deployments, saving 70% on compute. Pitfall: Scheduled CronJobs that call a scaled-to-zero service can fail, because their requests do not feed the HPA metrics that would trigger a scale-up.

References:

https://dev.to/benriemer/kubernetes-136-scale-to-zero-cut-your-k8s-bill-by-70-with-one-config-change-45b6
