Optimizing Kubernetes Scale: Why Moving from GKE Autopilot to EKS with Karpenter Slashes Costs

Why We Moved from GKE to EKS

Ajinkya, an engineer, migrated production workloads from GKE Autopilot to Amazon EKS to overcome scaling bottlenecks. The move also addressed a cost inefficiency: GKE Autopilot bills on resource requests rather than actual utilization.

Why This Matters

Fully managed abstractions like GKE Autopilot simplify initial deployment but can impose ceilings on mature workloads that need specific instance types or ARM-based processors. Moving to EKS with Karpenter lets engineers replace linear cost growth with bin-packed infrastructure that tracks actual demand rather than over-provisioned requests.

Key Insights

GKE Autopilot bills on requested resources rather than actual usage, so over-provisioned requests turn into significant waste at scale.
Karpenter watches for unschedulable pods in real time and launches right-sized nodes directly, typically in under 60 seconds, instead of scaling pre-defined node groups.
AWS IAM Roles for Service Accounts (IRSA) binds an IAM role to a Kubernetes service account, so each workload's pods get only the permissions they need and no credentials are shared across workloads.
Setting consolidateAfter: 2m together with consolidationPolicy: WhenEmptyOrUnderutilized lets Karpenter remove or replace empty and underutilized nodes about two minutes after pod activity on them settles, which eliminates ghost capacity and idle compute waste.
Graviton (ARM) processors provided meaningful price-performance improvements for compatible workloads after the migration.
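
The IRSA insight above can be sketched as a ServiceAccount annotated with an IAM role, plus a pod that runs under it. This is a minimal illustration, not the author's configuration: the namespace, resource names, account ID, role name, and image are placeholders, and it assumes the cluster's OIDC provider is already registered in IAM with a trust policy scoped to this service account.

```yaml
# Hypothetical names throughout; only the annotation key is the IRSA convention.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: orders-api
  namespace: payments
  annotations:
    # IAM role that pods using this service account will assume
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/orders-api-s3-read
---
apiVersion: v1
kind: Pod
metadata:
  name: orders-api
  namespace: payments
spec:
  serviceAccountName: orders-api   # ties the pod to the role above
  containers:
    - name: app
      image: registry.example.com/orders-api:1.0   # placeholder image
```

Pods that use a different service account do not receive this role, which is what keeps the permissions scoped per workload.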

Working Examples

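A Karpenter NodePool that provisions Spot capacity on arm64 (Graviton) instances. Its high weight makes Karpenter try this pool before any lower-weight pools, which is what makes the setup Spot-first.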

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-arm64
spec:
  template:
    metadata:
      labels:
        node-pool: spot-arm64
        capacity-type: spot
        arch: arm64
        workload-class: standard
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["5"]  # generation 6 and newer, e.g. c6g, m7g
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
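      expireAfter: 168h  # replace nodes after 7 days
  # Upper bound on the total resources this pool may provision
  limits:
    cpu: 500
    memory: 2000Gi
  disruption:
    # Consolidate nodes that are empty or can be packed more tightly
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 2m  # wait 2 minutes after pod changes on a node
  weight: 100  # higher weight is tried first when several NodePools match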
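
The NodePool above points at an EC2NodeClass named default through nodeClassRef, which the source does not show. A minimal sketch of what such a class might look like follows. The IAM role name and the karpenter.sh/discovery tag value (my-cluster) are assumptions for illustration, not values from the original migration.

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default                       # must match nodeClassRef.name in the NodePool
spec:
  role: KarpenterNodeRole-my-cluster  # placeholder IAM role attached to the nodes
  amiSelectorTerms:
    - alias: al2023@latest            # Amazon Linux 2023; consider pinning a version in production
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster   # assumed tag on the cluster's subnets
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster   # assumed tag on the node security groups
```

Selecting subnets and security groups by tag keeps the class independent of specific resource IDs, at the cost of requiring the tags to be maintained on the AWS side.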

Practical Applications

Use Case: Implementing Spot-first provisioning for non-critical workloads to reduce compute spend. Pitfall: Neglecting interruption handling. Karpenter relies on an SQS queue of interruption events to drain nodes gracefully before Spot capacity is reclaimed.
Use Case: Migrating to Graviton (ARM) instances for performance-sensitive services. Pitfall: Underestimating the networking differences between GKE VPC-native clusters and AWS VPCs, which can force a redesign of subnet layouts.
Use Case: Automating compliance audits using CloudTrail and Security Hub. Pitfall: Assuming that broad IAM permissions from GCP will map directly to AWS without refining the IRSA role design.
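
As a sketch of the first two use cases, a non-critical workload can be steered onto Spot arm64 nodes with a nodeSelector and given a PodDisruptionBudget so that drains and consolidation do not evict every replica at once. The workload name, image, replica counts, and resource requests are hypothetical, and the image is assumed to be built for arm64 or published as multi-arch.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: report-worker            # hypothetical workload
spec:
  replicas: 4
  selector:
    matchLabels:
      app: report-worker
  template:
    metadata:
      labels:
        app: report-worker
    spec:
      nodeSelector:
        kubernetes.io/arch: arm64            # Graviton nodes only
        karpenter.sh/capacity-type: spot     # Spot capacity only
      terminationGracePeriodSeconds: 60      # time to finish in-flight work on drain
      containers:
        - name: worker
          image: registry.example.com/report-worker:1.0   # placeholder; must support arm64
          resources:
            requests:            # Karpenter sizes nodes from these requests
              cpu: 500m
              memory: 512Mi
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: report-worker
spec:
  minAvailable: 3                # keep at least 3 of 4 replicas during evictions
  selector:
    matchLabels:
      app: report-worker
```

The budget only governs evictions. A Spot reclaim still ends after its notice period regardless, so the workload itself needs to tolerate losing a pod.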

References:

https://dev.to/ajinkya_a3/why-we-moved-from-gke-to-eks-1m96
