How to Reduce Kubernetes Costs by 70% with 1.36 Scale-to-Zero

These articles are AI-generated summaries. Please check the original sources for full details.

Kubernetes 1.36 Scale-to-Zero: Cut Your K8s Bill by 70% With One Config Change

Kubernetes 1.36 enables scale-to-zero by default for the HorizontalPodAutoscaler (HPA). With it, the HPA can scale a workload down to zero pods during idle periods, which the source estimates could cut a development environment's bill from $450 to $120 per month.

Why This Matters

By default an HPA keeps at least one replica running regardless of traffic, because minReplicas defaults to 1. That is wasteful in development and staging environments that sit idle during off-hours: many services have long idle periods in which compute is paid for but not used. Kubernetes 1.36 addresses this by allowing minReplicas to be set to zero, so that infrastructure cost tracks actual demand.

Key Insights

  • Development environments can see a 73% cost reduction by scaling to zero during nights and weekends (AttractivePenguin, 2026)
  • The HorizontalPodAutoscaler (HPA) in Kubernetes 1.36 requires minReplicas: 0 to activate the scale-to-zero feature
  • A readiness probe is required so that Kubernetes can tell when a pod is able to handle traffic after scaling back up from zero
  • A stabilization window, such as stabilizationWindowSeconds: 300, reduces pod flapping: the HPA scales down only to the highest replica count recommended during the preceding 300 seconds
  • metrics-server must be installed in the cluster, because the HPA reads CPU and memory utilization through the resource metrics API that it serves

Working Examples

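HPA configuration enabling scale-to-zero with minReplicas set to 0. Verify this against your cluster version before relying on it: earlier Kubernetes releases accepted minReplicas: 0 only with the HPAScaleToZero feature gate enabled and at least one Object or External metric configured.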

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  minReplicas: 0
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
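
CPU utilization is averaged over running pods, so with zero replicas there is nothing to measure and a CPU-only target gives the HPA no signal for scaling back up. The variant below is a sketch of one way to supply that signal, assuming an external metrics adapter is installed. The metric name queue_messages_ready, its queue label, and the target value are hypothetical placeholders, not names from the source.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  minReplicas: 0
  maxReplicas: 10
  metrics:
  # External metric: reported by the adapter even when no pods are running.
  # Metric name, label, and target value are hypothetical.
  - type: External
    external:
      metric:
        name: queue_messages_ready
        selector:
          matchLabels:
            queue: my-service
      target:
        type: AverageValue
        averageValue: "30"
  # CPU target from the example above; it applies once pods exist.
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

When several metrics are listed, the HPA computes a replica count for each and uses the largest, so the queue metric can wake the service and the CPU target takes over under load.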

Readiness probe configuration, set on the container in the Deployment's pod template, so that a pod receives traffic only after it reports ready.

readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3
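
The probe fragment above belongs under a container in the pod template of the Deployment that the HPA targets. A minimal sketch of that Deployment follows. The image reference, the app label, and the CPU request value are assumptions made for illustration, while the probe values are the ones shown above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service            # matches scaleTargetRef.name in the HPA
spec:
  # replicas is omitted so that re-applying this manifest
  # does not override the replica count chosen by the HPA
  selector:
    matchLabels:
      app: my-service         # assumed label
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
      - name: my-service
        image: registry.example.com/my-service:1.0.0   # hypothetical image
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 100m         # assumed value; Utilization targets are a percentage of the request
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          failureThreshold: 3
```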

Scale-down behavior for the HPA: a 300-second stabilization window, after which up to 100% of the current pods may be removed in any 15-second period.

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
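
The behavior block is a field of the HPA spec, alongside minReplicas and metrics. A sketch of the combined object, reusing the names and values from the earlier examples:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  minReplicas: 0
  maxReplicas: 10
  behavior:
    scaleDown:
      # Scale down only to the highest recommendation of the last 300 seconds
      stabilizationWindowSeconds: 300
      policies:
      # Allow removing up to 100% of current pods per 15-second period
      - type: Percent
        value: 100
        periodSeconds: 15
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

Because the window keeps the highest recent recommendation, a single idle sample does not remove pods; the workload has to look idle for the whole 300 seconds first.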

Practical Applications

  • Use case: Development environments spread across 5 namespaces, reducing monthly spend from $450 to $120. Pitfall: Without a readiness probe, Kubernetes may route traffic to pods that cannot yet serve it during scale-up.
  • Use case: Event-driven API workloads with spiky traffic, achieving 65% savings. Pitfall: The first request after scaling to zero waits for a cold start, which can hurt latency-sensitive services.
  • Use case: Staging environments that sit idle between deployments, saving 70% on compute. Pitfall: Work started by a scheduled CronJob does not trigger scale-up on its own, because it does not change any metric the HPA watches.
