Optimizing Kubernetes Scale: Why Moving from GKE Autopilot to EKS with Karpenter Slashes Costs
These articles are AI-generated summaries. Please check the original sources for full details.
Why We Moved from GKE to EKS
Engineer Ajinkya migrated production workloads from GKE Autopilot to Amazon EKS to overcome scaling bottlenecks. The move also addressed a cost inefficiency: GKE Autopilot bills for the resources pods request, not for what they actually use.
Why This Matters
Fully managed abstractions like GKE Autopilot simplify initial deployment but create performance ceilings for mature workloads requiring specific instance types or ARM-based processors. Moving to EKS and Karpenter allows engineers to shift from linear cost growth to optimized, bin-packed infrastructure that aligns with actual demand rather than over-provisioned requests.
Key Insights
- GKE Autopilot pricing is based on requested resources rather than actual usage, leading to significant cost inefficiencies at scale.
- Karpenter provisions nodes in under 60 seconds by watching for unschedulable pods in real time rather than scaling pre-defined node groups.
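- AWS IAM Roles for Service Accounts (IRSA) binds an IAM role to a Kubernetes service account, so each pod gets only the permissions its workload needs, eliminating the security risks of shared credentials.
- Setting 'consolidateAfter: 2m' lets Karpenter consolidate empty or underutilized nodes two minutes after pods were last added to or removed from them, cutting ghost capacity and idle compute waste.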
- Graviton (ARM) processors provided meaningful price-performance improvements for compatible workloads after the migration.
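The IRSA point above can be sketched as a service account annotated with an IAM role. This is a minimal illustration, not the configuration used in the migration: the names `orders-api` and `orders-api-s3-read`, the account ID, and the container image are hypothetical placeholders. It also assumes the cluster's OIDC provider is registered in IAM and that the role's trust policy allows this service account to assume it.

```yaml
# Service account bound to one IAM role via the IRSA annotation.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: orders-api                 # hypothetical workload name
  namespace: default
  annotations:
    # Hypothetical role ARN; replace with a role scoped to this workload.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/orders-api-s3-read
---
# Pods that use the service account receive credentials for that role only.
apiVersion: v1
kind: Pod
metadata:
  name: orders-api
  namespace: default
spec:
  serviceAccountName: orders-api
  containers:
    - name: app
      image: example.com/orders-api:latest   # placeholder image
```

Because the role is attached to the service account rather than the node, two pods on the same node can hold different permissions.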
Working Examples
Karpenter NodePool configuration for Spot-first provisioning on ARM64 Graviton instances.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-arm64
spec:
  template:
    metadata:
      # Labels applied to every node this pool launches.
      labels:
        node-pool: spot-arm64
        capacity-type: spot
        arch: arm64
        workload-class: standard
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        # Gt 5 restricts the pool to generation 6 and newer.
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["5"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      # Nodes are replaced after 7 days.
      expireAfter: 168h
  # Upper bound on the total resources this pool may provision.
  limits:
    cpu: 500
    memory: 2000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 2m
  # Higher-weight pools are considered before lower-weight ones.
  weight: 100
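The NodePool above points at an EC2NodeClass named 'default' through nodeClassRef, which has to exist before Karpenter can launch nodes. The source does not show it, so the following is only a sketch under assumptions: the cluster name `my-cluster`, the IAM role name, and the `karpenter.sh/discovery` tag values are hypothetical and must match whatever is actually set up in the AWS account.

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default                      # must match nodeClassRef.name in the NodePool
spec:
  # Hypothetical IAM role assumed by the nodes Karpenter launches.
  role: KarpenterNodeRole-my-cluster
  # AMI alias; the matching arm64 image is selected for Graviton nodes.
  amiSelectorTerms:
    - alias: al2023@latest
  # Subnets and security groups are discovered by tag (assumed tag value).
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
```

Keeping instance constraints in the NodePool and AWS-specific settings in the EC2NodeClass lets several pools (for example a Spot pool and an on-demand fallback) share one node class.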
Practical Applications
- Use Case: Implementing Spot-first provisioning for non-critical workloads to reduce compute spend. Pitfall: Neglecting interruption handling; Karpenter needs an SQS-based interruption queue to learn of Spot reclamations and drain nodes gracefully.
- Use Case: Migrating to Graviton (ARM) instances for performance-sensitive services. Pitfall: Underestimating networking differences between GKE VPC-native and AWS VPC, requiring a redesign of subnet layouts.
- Use Case: Automating compliance audits using CloudTrail and Security Hub. Pitfall: Assuming broad IAM permissions from GCP will map directly to AWS without refining IRSA role design.
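For the Spot interruption pitfall above, Karpenter's Helm chart takes the name of the SQS queue it should watch. The values below are a minimal sketch: the cluster name and queue name are hypothetical, and the queue itself, along with the EventBridge rules that forward Spot interruption notices to it, is assumed to be created separately.

```yaml
# Helm values for the Karpenter chart (fragment).
settings:
  clusterName: my-cluster                  # hypothetical cluster name
  # Hypothetical SQS queue receiving interruption events.
  interruptionQueue: my-cluster-karpenter
```

With the queue configured, Karpenter can cordon and drain a node when an interruption notice arrives instead of losing its pods abruptly when the instance is reclaimed.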