ship it and sleep

Infrastructure Repo Structure and Environment Overlays

Chapter 56 of 66


The Failure

The infra repo started with flat directories: one folder per service, each containing full Kubernetes manifests for every environment. The checkout service had checkout-staging.yaml and checkout-production.yaml, two 200-line files that were 95% identical. When someone added a new environment variable to staging but forgot production, the two deployments silently drifted apart. Six months later, the repo held 50 YAML files with no clear relationship between staging and production configurations.

The base/overlay pattern removes that duplication: one base definition holds everything the environments share, and a thin overlay per environment carries only what differs.

The Mechanism

Directory Convention

ecommerce-infra/
├── apps/                    # Application workloads
│   ├── catalog-service/
│   │   ├── base/
│   │   │   ├── kustomization.yaml
│   │   │   ├── deployment.yaml
│   │   │   ├── service.yaml
│   │   │   └── hpa.yaml
│   │   └── overlays/
│   │       ├── staging/
│   │       │   ├── kustomization.yaml
│   │       │   └── patches/
│   │       │       ├── replicas.yaml
│   │       │       └── env.yaml
│   │       └── production/
│   │           ├── kustomization.yaml
│   │           └── patches/
│   │               ├── replicas.yaml
│   │               ├── env.yaml
│   │               └── resources.yaml
│   ├── checkout-service/
│   │   └── (same structure)
│   └── ...
├── platform/                # Shared infrastructure
│   ├── argocd/
│   ├── monitoring/
│   ├── ingress/
│   ├── cert-manager/
│   └── external-secrets/
├── clusters/                # Cluster-level config
│   ├── staging/
│   │   └── kustomization.yaml  # References all staging overlays
│   └── production/
│       └── kustomization.yaml
└── CODEOWNERS
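
The clusters/ directory is the one place that records which services run in an environment. Below is a minimal sketch of clusters/staging/kustomization.yaml. It assumes the file does nothing more than aggregate each app's staging overlay. Only the two services named in the tree are listed; the real file would list every app.

```yaml
# clusters/staging/kustomization.yaml (sketch)
# Aggregates every staging overlay into one renderable unit.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../apps/catalog-service/overlays/staging
  - ../../apps/checkout-service/overlays/staging
```

With this in place, `kustomize build clusters/staging` renders the whole environment. That is also what a single ArgoCD Application pointed at this path would apply.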

The Implementation

Base Definition

# apps/checkout-service/base/deployment.yaml
# HARDENED: Base deployment - environment-agnostic
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-service
  labels:
    app.kubernetes.io/name: checkout-service
    app.kubernetes.io/part-of: ecommerce
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: checkout-service
  template:
    metadata:
      labels:
        app.kubernetes.io/name: checkout-service
    spec:
      containers:
        - name: checkout-service
          image: ghcr.io/acme/checkout-service:latest
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
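
# apps/checkout-service/base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
  - hpa.yaml
# `labels` replaces the deprecated `commonLabels`. includeSelectors: false
# keeps this label out of spec.selector, which is immutable on a Deployment.
labels:
  - pairs:
      app.kubernetes.io/managed-by: kustomize
    includeSelectors: false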
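
The base kustomization also lists service.yaml and hpa.yaml, which are not shown. The sketch below covers both and makes two assumptions: the Service listens on port 80, and the autoscaler is CPU-based. The replica bounds and the 70% target are illustrative values, not recommendations.

```yaml
# apps/checkout-service/base/service.yaml (sketch)
apiVersion: v1
kind: Service
metadata:
  name: checkout-service
spec:
  selector:
    app.kubernetes.io/name: checkout-service
  ports:
    - port: 80            # assumed Service port
      targetPort: 8080    # containerPort from deployment.yaml
---
# apps/checkout-service/base/hpa.yaml (sketch)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-service
  minReplicas: 1          # illustrative
  maxReplicas: 5          # illustrative
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the CPU request set in each overlay
```

Because the HPA lives in base, it takes over the replica count once it is running. The overlays' replicas values then mostly set the starting size. A GitOps controller that re-applies spec.replicas on every sync may also fight the autoscaler. Patching minReplicas per environment is one way to keep the overlay and the HPA in agreement.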

Staging Overlay

# apps/checkout-service/overlays/staging/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: staging
resources:
  - ../../base
patches:
  - path: patches/replicas.yaml
  - path: patches/env.yaml

# apps/checkout-service/overlays/staging/patches/replicas.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-service
spec:
  replicas: 1
  template:
    spec:
      containers:
        - name: checkout-service
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
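
The staging kustomization also applies patches/env.yaml. Here is a sketch, assuming the service reads its settings from environment variables. Both variable names and the URL are hypothetical.

```yaml
# apps/checkout-service/overlays/staging/patches/env.yaml (sketch)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-service
spec:
  template:
    spec:
      containers:
        - name: checkout-service            # merged with the base container by name
          env:
            - name: LOG_LEVEL               # hypothetical variable
              value: debug
            - name: PAYMENT_GATEWAY_URL     # hypothetical variable
              value: https://payments.staging.example.com   # placeholder URL
```

Strategic merge matches entries in the containers and env lists by their name key. This patch therefore adds variables to the base container rather than replacing it. A variable that every environment needs belongs in base, not in each env.yaml, which is what prevents the drift described at the start.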

Production Overlay

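# apps/checkout-service/overlays/production/patches/replicas.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-service
spec:
  replicas: 3

# apps/checkout-service/overlays/production/patches/resources.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-service
spec:
  template:
    spec:
      containers:
        - name: checkout-service
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 1Gi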
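
The production kustomization.yaml is not shown. The sketch below is consistent with the directory tree. It assumes the namespace is called production and uses Kustomize's images field to pin a tag instead of the base's :latest. The tag value is hypothetical.

```yaml
# apps/checkout-service/overlays/production/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: production                     # assumed namespace name
resources:
  - ../../base
patches:
  - path: patches/replicas.yaml
  - path: patches/env.yaml
  - path: patches/resources.yaml
images:
  - name: ghcr.io/acme/checkout-service   # must match the image name in base
    newTag: "1.4.2"                       # hypothetical release tag
```

Pinning the tag in the overlay turns a production release into a one-line diff. That diff sits in a path CODEOWNERS already gates.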

CODEOWNERS

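# CODEOWNERS
# HARDENED: Require review for production changes
# Team owners must use the @org/team-name form; "acme" is assumed from the image registry path.
/apps/*/overlays/production/  @acme/platform-team
/platform/                    @acme/platform-team
/clusters/production/         @acme/platform-team @acme/sre-team
/apps/*/overlays/staging/     @acme/dev-team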

Validate Overlays in CI

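# ecommerce-infra/.github/workflows/validate.yml
# HARDENED: Validate all overlays render correctly
name: Validate Manifests
on:
  pull_request:

jobs:
  validate:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        env: [staging, production]
    steps:
      - uses: actions/checkout@v4

      - name: Validate kustomize build
        run: |
          set -euo pipefail
          for app in apps/*/overlays/${{ matrix.env }}; do
            echo "Validating: $app"
            kustomize build "$app" > /dev/null
          done

      # kubeval is no longer maintained and is not preinstalled on the runner;
      # kubeconform is its successor.
      - name: Install kubeconform
        run: |
          curl -sSL https://github.com/yannh/kubeconform/releases/latest/download/kubeconform-linux-amd64.tar.gz \
            | sudo tar -xz -C /usr/local/bin kubeconform

      - name: Schema validation
        run: |
          set -euo pipefail   # pipefail: a failed build is not masked by the pipe
          for app in apps/*/overlays/${{ matrix.env }}; do
            kustomize build "$app" | kubeconform -strict -summary
          done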
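
The matrix above only renders per-app overlays. A possible extra step for the same job renders the whole environment. It assumes each clusters/<env>/kustomization.yaml aggregates those overlays, as the tree suggests. With it, a missing or misspelled overlay path fails the PR too. It also surfaces collisions, because Kustomize refuses to emit two resources with the same kind, name, and namespace.

```yaml
      # Hypothetical extra step for the validate job above.
      - name: Validate cluster aggregation
        run: kustomize build "clusters/${{ matrix.env }}" > /dev/null
```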

The Gate

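CODEOWNERS is the gate for production changes. Any PR that touches a production overlay, platform/, or clusters/production/ needs approval from the listed owners, while staging overlays stay self-service for the dev team. GitHub only enforces this when the default branch's protection rules include "Require review from Code Owners"; otherwise owners are merely requested as reviewers. An edit to apps/*/base/ also reaches production without matching any production path, so base files need an owner too if the gate is meant to be complete.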

The Recovery

Overlays diverge too far from base: If production and staging have very different configurations, the overlays become as large as the base. Refactor: move shared config to base, keep only true differences in overlays (replicas, resources, environment variables).

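New service requires many files to bootstrap: Keep a service template directory and copy it with cp -r apps/_template apps/new-service. Then replace the placeholder service name everywhere it appears: Deployment and container names, labels, selectors, and image. Because apps/_template also matches the CI glob apps/*/overlays/<env>, the template has to render cleanly on its own, or the validation loop has to skip it.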

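Kustomize build fails on merge: The validation CI catches this before merge, as long as the Validate Manifests check is marked as required in branch protection. If a broken build still reaches the main branch, ArgoCD cannot generate manifests for the affected Application. It reports a ComparisonError condition containing the kustomize error. The resources from the last successful sync keep running until the build is fixed.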
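
For reference, here is a sketch of an ArgoCD Application that consumes one cluster directory. The Application name, repository URL, project, and target revision are assumptions.

```yaml
# Sketch: one Application per environment, pointed at clusters/<env>.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ecommerce-staging          # hypothetical name
  namespace: argocd
spec:
  project: default                 # assumed project
  source:
    repoURL: https://github.com/acme/ecommerce-infra.git   # assumed URL
    targetRevision: main           # assumed branch
    path: clusters/staging
  destination:
    server: https://kubernetes.default.svc
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

ArgoCD detects the kustomization.yaml at that path and renders it with Kustomize. The Validate Manifests job is therefore a close preview of what this Application will try to apply, as long as CI and ArgoCD use compatible Kustomize versions.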