Environments and Promotion: From Feature Branch to Production Without Guessing
A deployment pipeline without defined environments is a script that pushes code somewhere. A deployment pipeline with defined environments is a promotion chain: code moves from dev to staging to production through gates that verify it is safe to promote.
In this chapter, an environment is a Kubernetes namespace with its own configuration, its own set of secrets, and a pinned version of the application. Dev runs the latest commit on main. Staging runs the version that passed all CI gates and dev validation. Production runs the version that passed staging validation.
The Failure
The payments team deployed to production on a Friday. The deployment succeeded. The service started. Health checks passed. Ten minutes later, the on-call engineer got paged: payments were failing with a connection timeout to the payment processor.
The root cause: the staging environment used a different payment processor endpoint than production. The staging endpoint was a sandbox that accepted any request. The production endpoint required mutual TLS that the team had not configured. The deployment was “tested in staging” but staging did not match production.
Environment parity is not a nice-to-have. It is a prerequisite for trusting your staging validation.
The Mechanism
Environment Hierarchy
| Environment | Purpose | Namespace | Image Source | Config Source | Promotion Trigger |
|---|---|---|---|---|---|
| Dev | Latest code, integration testing | dev | CI build from main | values-dev.yaml | Automatic on CI pass |
| Staging | Pre-production validation | staging | Promoted from dev | values-staging.yaml | Automatic on dev validation pass |
| Production | Live traffic | production | Promoted from staging | values-production.yaml | Manual approval + gate pass |
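Each environment is a Kubernetes namespace managed by ArgoCD. Every environment's ArgoCD Application deploys the same application definition and differs only in its environment-specific configuration: a values file per environment in a Helm layout, or an overlay per environment in the Kustomize layout used in the implementation below.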
Promotion Flow
Promotion is not copying files. Promotion is updating a Git reference in the infra repo:
- CI builds the image and tags it with the commit SHA
- CI updates values-dev.yaml with the new image tag → ArgoCD syncs dev
- Dev validation passes (smoke tests, health checks)
- Pipeline updates values-staging.yaml with the same image tag → ArgoCD syncs staging
- Staging validation passes (contract tests, performance baseline)
- Team lead approves production promotion
- Pipeline updates values-production.yaml → ArgoCD syncs production
The image never changes. The same image digest moves through environments. Only the configuration changes.
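The ordering in the flow above can be made explicit in a few lines of shell. This is a minimal sketch rather than part of the pipeline described in this chapter: the helper name `promotion_allowed` is hypothetical, and it assumes the only valid hops are dev to staging and staging to production.

```shell
#!/usr/bin/env bash
# Sketch: allow only single-step promotions along dev -> staging -> production.
# promotion_allowed is a hypothetical helper name, not an existing tool.

promotion_allowed() {
  local from="$1" to="$2"
  case "${from}:${to}" in
    dev:staging|staging:production) return 0 ;;  # exactly one hop forward
    *) return 1 ;;                               # skips, repeats, or goes backwards
  esac
}

# Example: refuse to promote dev straight to production.
if ! promotion_allowed dev production; then
  echo "refusing to promote dev -> production: staging must come first"
fi
```

A check like this keeps an image from reaching production without ever having run in staging, which is the situation the gates are meant to rule out.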
The Implementation
Infra Repo Structure
ecommerce-infra/
├── apps/
│   ├── checkout-service/
│   │   ├── base/
│   │   │   ├── deployment.yaml
│   │   │   ├── service.yaml
│   │   │   └── kustomization.yaml
│   │   └── overlays/
│   │       ├── dev/
│   │       │   ├── kustomization.yaml
│   │       │   └── patch-replicas.yaml
│   │       ├── staging/
│   │       │   ├── kustomization.yaml
│   │       │   └── patch-replicas.yaml
│   │       └── production/
│   │           ├── kustomization.yaml
│   │           ├── patch-replicas.yaml
│   │           └── patch-resources.yaml
Promotion Workflow
# HARDENED: Automated promotion with validation gates
name: promote
on:
  workflow_dispatch:
    inputs:
      service:
        description: "Service to promote"
        required: true
        type: choice
        options:
          [
            checkout-service,
            catalog-service,
            inventory-service,
            payments-service,
            frontend-shell,
          ]
      from:
        description: "Source environment"
        required: true
        type: choice
        options: [dev, staging]
      to:
        description: "Target environment"
        required: true
        type: choice
        options: [staging, production]
jobs:
  validate-promotion:
    runs-on: ubuntu-latest
    outputs:
      image-tag: ${{ steps.get-tag.outputs.tag }}
    steps:
      # The inputs allow dev to production and staging to staging; reject both.
      - name: Allow only dev to staging and staging to production
        run: |
          case "${{ inputs.from }}:${{ inputs.to }}" in
            dev:staging|staging:production) ;;
            *) echo "Unsupported promotion path" >&2; exit 1 ;;
          esac
      - uses: actions/checkout@v4
        with:
          repository: acme/ecommerce-infra
      - name: Get current image tag in source environment
        id: get-tag
        run: |
          TAG=$(yq '.images[0].newTag' \
            apps/${{ inputs.service }}/overlays/${{ inputs.from }}/kustomization.yaml)
          # yq prints "null" when the key is missing; fail instead of promoting it.
          if [ -z "$TAG" ] || [ "$TAG" = "null" ]; then
            echo "No image tag found in source overlay" >&2
            exit 1
          fi
          echo "tag=$TAG" >> "$GITHUB_OUTPUT"
          echo "Promoting ${{ inputs.service }} image $TAG from ${{ inputs.from }} to ${{ inputs.to }}"
      # Assumes the runner already has a kubeconfig with a context named after
      # the source environment.
      - name: Verify source environment is healthy
        run: |
          kubectl --context=${{ inputs.from }} -n ${{ inputs.from }} \
            rollout status deployment/${{ inputs.service }} --timeout=60s
  approve:
    runs-on: ubuntu-latest
    needs: [validate-promotion]
    if: inputs.to == 'production'
    # Environment protection rules pause the job here until a reviewer approves.
    environment: production
    steps:
      - run: echo "Production promotion approved for ${{ inputs.service }}"
  promote:
    runs-on: ubuntu-latest
    needs: [validate-promotion, approve]
    # always() lets this job run when approve is skipped for non-production targets.
    if: always() && needs.validate-promotion.result == 'success' && (inputs.to != 'production' || needs.approve.result == 'success')
    steps:
      - uses: actions/checkout@v4
        with:
          repository: acme/ecommerce-infra
          token: ${{ secrets.INFRA_REPO_TOKEN }}
      - name: Update target environment
        run: |
          cd apps/${{ inputs.service }}/overlays/${{ inputs.to }}
          kustomize edit set image \
            ${{ inputs.service }}=ghcr.io/acme/${{ inputs.service }}:${{ needs.validate-promotion.outputs.image-tag }}
      - name: Commit and push
        run: |
          git config user.name "promotion-bot"
          # Placeholder address; use your bot account's email.
          git config user.email "promotion-bot@example.com"
          git add -A
          # Nothing to commit when the target already runs this tag.
          if git diff --cached --quiet; then
            echo "Target environment already at this image tag"
            exit 0
          fi
          git commit -m "promote(${{ inputs.to }}): ${{ inputs.service }} → ${{ needs.validate-promotion.outputs.image-tag }}"
          git push
ArgoCD Application per Environment
# HARDENED: ArgoCD Application with environment-specific config
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: checkout-service-staging
  namespace: argocd
  labels:
    app.kubernetes.io/part-of: ecommerce
    environment: staging
spec:
  project: ecommerce
  source:
    repoURL: https://github.com/acme/ecommerce-infra.git
    targetRevision: main
    path: apps/checkout-service/overlays/staging
  destination:
    server: https://kubernetes.default.svc
    namespace: staging
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: 3
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 1m
The Gate
Promotion from dev to staging is automatic if dev health checks pass. Promotion from staging to production requires:
- All staging health checks pass for at least 10 minutes
- Staging performance baseline is within 10% of previous deployment (CH17)
- No open P1/P2 incidents
- Manual approval from a team lead via GitHub environment protection rules
The environment: production setting in the approval job activates GitHub’s environment protection rules, which can require specific reviewers, wait timers, and branch restrictions.
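The automated conditions in the list above can be expressed as one pure function. The following is a sketch under stated assumptions, not the chapter's implementation: the name `production_gate` and its argument order are hypothetical, the inputs would have to come from your monitoring, performance, and incident tooling, and "within 10%" is read here as "no more than 10% slower than the baseline".

```shell
#!/usr/bin/env bash
# Sketch of the staging -> production gate as a pure function.
# production_gate and its argument order are hypothetical.
#
# Usage: production_gate HEALTHY_MINUTES BASELINE_P95_MS CURRENT_P95_MS OPEN_P1_P2
# Prints the first failed condition and returns 1, or prints "pass" and returns 0.

production_gate() {
  local healthy_minutes="$1" baseline_ms="$2" current_ms="$3" open_incidents="$4"

  if [ "$healthy_minutes" -lt 10 ]; then
    echo "fail: staging healthy for only ${healthy_minutes} min"
    return 1
  fi
  # Assumption: "within 10%" means at most 10% slower than the baseline.
  # Integer form of: current <= baseline * 1.10
  if [ $((current_ms * 100)) -gt $((baseline_ms * 110)) ]; then
    echo "fail: p95 ${current_ms} ms exceeds baseline ${baseline_ms} ms by more than 10%"
    return 1
  fi
  if [ "$open_incidents" -gt 0 ]; then
    echo "fail: ${open_incidents} open P1/P2 incident(s)"
    return 1
  fi
  echo "pass"
}

production_gate 15 200 215 0          # 215 ms is 7.5% over 200 ms: prints "pass"
production_gate 15 200 230 0 || true  # 230 ms is 15% over 200 ms: prints a "fail: p95" line
```

The manual approval is deliberately left out of the function. GitHub's environment protection rules enforce it, so the script only has to answer whether a human should be asked at all.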
The Recovery
Wrong image promoted to production: Revert the infra repo commit. ArgoCD will sync the previous image tag. No code changes needed.
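A rollback along these lines might look like the sketch below. The helper name `rollback_promotion` is hypothetical, and the sketch assumes it runs inside a clone of the infra repo whose tracked branch is the one ArgoCD watches.

```shell
#!/usr/bin/env bash
# Sketch: roll back a bad promotion by reverting its commit in the infra repo.
# rollback_promotion is a hypothetical helper; run it inside a clone of the repo.

rollback_promotion() {
  local bad_commit="$1"
  # Adds a new commit that undoes the promotion; history stays intact.
  git revert --no-edit "$bad_commit"
  # Push only when a remote named "origin" is configured.
  if git remote get-url origin >/dev/null 2>&1; then
    git push origin HEAD
  fi
}

# Example (placeholder argument):
#   rollback_promotion <sha-of-promotion-commit>
```

Reverting, rather than resetting the branch, keeps the bad promotion and its rollback visible in the Git history as an audit trail.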
Configuration drift detected in production: Someone changed a Kubernetes resource by hand. With selfHeal: true, ArgoCD reverts the change automatically. If selfHeal is disabled, ArgoCD marks the Application as OutOfSync and the dashboard shows the drift. Find out who made the manual change and why, then enable selfHeal.
Staging differs from production in infrastructure: Use the same Kustomize base for all environments. Differences should only be in patches (replicas, resource limits, external endpoints). Review overlays regularly to ensure they only contain intended differences.
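One way to review overlays is to render two of them and read the diff. The sketch below assumes `kustomize` is installed; the helper name `overlay_diff` and the `RENDER` override variable are hypothetical and exist only to keep the example small and easy to try with a different renderer.

```shell
#!/usr/bin/env bash
# Sketch: print every difference between two rendered overlays for review.
# overlay_diff and the RENDER override are hypothetical; by default each
# overlay is rendered with `kustomize build`.

overlay_diff() {
  local left="$1" right="$2"
  local render="${RENDER:-kustomize build}"
  local left_out right_out status=0
  left_out=$(mktemp)
  right_out=$(mktemp)
  # $render is intentionally unquoted so "kustomize build" splits into words.
  $render "$left" > "$left_out"
  $render "$right" > "$right_out"
  diff -u "$left_out" "$right_out" || status=$?
  rm -f "$left_out" "$right_out"
  # diff exits 1 when the inputs differ (expected); 2 or more means trouble.
  [ "$status" -le 1 ]
}

# Example, run from apps/checkout-service:
#   overlay_diff overlays/staging overlays/production
```

Every line in the output should map to an intended difference such as replicas, resource limits, or an external endpoint. Anything else is a parity gap of the kind that caused the payments incident.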