GitOps Rollback Mechanics and ArgoCD History
The Failure
The on-call engineer used ArgoCD’s UI to roll back the payments service. It worked instantly. The service recovered. But the engineer did not revert the Git commit. Three minutes later, ArgoCD’s sync interval fired. It detected that the cluster state did not match the Git state. It redeployed the broken version. The service went down again.
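A rollback from the ArgoCD UI is a temporary fix. It redeploys the manifests recorded in an earlier entry of the Application's sync history, but Git still points at the bad commit, so the Application reports OutOfSync. Whenever automated sync is active, ArgoCD reconciles the cluster back to Git on its next cycle (every three minutes by default) and overwrites the rollback.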
The Mechanism
ArgoCD Sync vs Rollback
| Action | What Happens | Persistent? |
|---|---|---|
| Sync | Deploy Git state to cluster | Yes (Git is source of truth) |
| Rollback (UI or CLI) | Redeploy a revision from sync history | No (next sync from Git overwrites it) |
| Git revert + Sync | Revert Git, then sync | Yes |
Making UI Rollback Persistent
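- Disable auto-sync on the Application (ArgoCD rejects a rollback while automated sync is enabled; the UI prompts to disable it first)
- Roll back via the UI or CLI (immediate relief)
- Git revert the bad commit
- Re-enable auto-sync
- ArgoCD syncs to the reverted (good) state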
The Implementation
ArgoCD CLI Rollback
# List sync history
argocd app history payments-production
# ID DATE REVISION
# 5 2025-01-15 16:47 abc1234 (bad)
# 4 2025-01-14 10:22 def5678 (good)
# 3 2025-01-13 09:15 ghi9012
# Disable auto-sync first: ArgoCD rejects a rollback while automated sync is enabled
argocd app set payments-production --sync-policy none
# Roll back to the last good sync (ID 4)
argocd app rollback payments-production 4
# Now do the git revert
cd ecommerce-infra
git revert abc1234 --no-edit
git push
# Re-enable auto-sync
argocd app set payments-production \
--sync-policy automated \
--auto-prune \
--self-heal
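Reading the right ID off the history table by eye is easy to get wrong mid-incident. The helper below is a minimal sketch (the function name previous_history_id is hypothetical) that reads argocd app history output on stdin and prints the second most recent ID, which is the entry argocd app rollback targets when the ID is omitted. It assumes the history ID is the first whitespace-separated column, as in the sample above; the exact column layout can differ between ArgoCD versions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the ID of the second most recent entry in
# `argocd app history` output read from stdin. Assumes the history ID is the
# first whitespace-separated column; header and SOURCE lines are skipped.
previous_history_id() {
  local first _ max=-1 second=-1
  while read -r first _ || [[ -n $first ]]; do
    [[ $first =~ ^[0-9]+$ ]] || continue
    # Track the two largest IDs so row order (ascending or descending) does not matter.
    if (( first > max )); then
      second=$max
      max=$first
    elif (( first > second )); then
      second=$first
    fi
  done
  if (( second < 0 )); then
    echo "need at least two history entries" >&2
    return 1
  fi
  echo "$second"
}

# Usage (requires a logged-in argocd CLI):
#   argocd app history payments-production | previous_history_id
```

Printing the ID, rather than rolling back blindly, gives the engineer a chance to check it against the REVISION column before running argocd app rollback.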
Automated Rollback Script
#!/bin/bash
# scripts/argocd-safe-rollback.sh
# HARDENED: Rollback with auto-sync protection
set -euo pipefail
APP=${1:?usage: $0 <app-name>}
echo "Step 1: Disable auto-sync (ArgoCD rejects rollbacks while it is enabled)"
argocd app set "$APP" --sync-policy none
echo "Step 2: Roll back $APP to the previous version"
argocd app rollback "$APP" # no ID = previous history entry
echo "Step 3: Verify rollback"
argocd app wait "$APP" --health --timeout 120
echo "Step 4: Find and revert the deployment commit"
SERVICE=${APP%-production}
if [[ -d /tmp/ecommerce-infra ]]; then
  cd /tmp/ecommerce-infra
  git pull
else
  git clone git@github.com:acme/ecommerce-infra.git /tmp/ecommerce-infra
  cd /tmp/ecommerce-infra
fi
COMMIT=$(git log --grep="deploy: $SERVICE" -1 --format="%H")
if [[ -z "$COMMIT" ]]; then
  echo "No 'deploy: $SERVICE' commit found; revert manually (auto-sync stays disabled)" >&2
  exit 1
fi
echo "Reverting: $(git log --oneline -1 "$COMMIT")"
git revert --no-edit "$COMMIT"
git push
echo "Step 5: Re-enable auto-sync"
argocd app set "$APP" --sync-policy automated --auto-prune --self-heal
echo "Rollback complete. $APP will sync to reverted state."
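The script locates the commit with git log --grep, which matches anywhere in the commit message, so a pattern for payments would also match a subject such as "deploy: payments-worker v9". If deploy commits follow a "deploy: <service> ..." subject convention (an assumption carried over from the script, not something Git enforces), a stricter lookup can compare the service name exactly. A minimal sketch, with the hypothetical function name latest_deploy_commit, reading git log --format='%H %s' output on stdin:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the newest commit whose subject is
# "deploy: <service> ...", matching the service name exactly.
# Input lines are "<sha> <subject>", newest first (git log's default order).
latest_deploy_commit() {
  local service=$1 found="" sha prefix name _
  # Read all input instead of returning early, so the upstream git log
  # never hits SIGPIPE under `set -o pipefail`.
  while read -r sha prefix name _ || [[ -n $sha ]]; do
    if [[ -z $found && $prefix == "deploy:" && $name == "$service" ]]; then
      found=$sha
    fi
  done
  if [[ -z $found ]]; then
    echo "no deploy commit found for $service" >&2
    return 1
  fi
  echo "$found"
}

# Usage inside the rollback script (bounded to recent history):
#   COMMIT=$(git log -n 200 --format='%H %s' | latest_deploy_commit "$SERVICE")
```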
Rollback Verification
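# ecommerce-infra/.github/workflows/verify-rollback.yml
# HARDENED: Verify service health after rollback
name: Verify Rollback
on:
  push:
    branches: [main]
    paths: ["apps/*/overlays/production/**"]
jobs:
  verify:
    runs-on: ubuntu-latest
    # Assumption: ARGOCD_SERVER and ARGOCD_AUTH_TOKEN are repository secrets;
    # the argocd CLI reads both from these environment variables.
    env:
      ARGOCD_SERVER: ${{ secrets.ARGOCD_SERVER }}
      ARGOCD_AUTH_TOKEN: ${{ secrets.ARGOCD_AUTH_TOKEN }}
    steps:
      - name: Install argocd CLI
        run: |
          curl -sSL -o argocd-linux-amd64 https://github.com/argoproj/argo-cd/releases/latest/download/argocd-linux-amd64
          sudo install -m 555 argocd-linux-amd64 /usr/local/bin/argocd
      - name: Wait for ArgoCD sync
        run: sleep 180 # Default reconciliation interval is 3 minutes
      - name: Check application health
        run: |
          APPS=$(argocd app list -o name | grep production)
          for app in $APPS; do
            STATUS=$(argocd app get "$app" -o json | jq -r '.status.health.status')
            if [[ "$STATUS" != "Healthy" ]]; then
              echo "::error::$app is $STATUS after deployment"
              exit 1
            fi
          done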
ArgoCD Notifications for Rollback
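# ArgoCD notifications ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
data:
  # Assumes the Slack token is stored under the key slack-token in argocd-notifications-secret
  service.slack: |
    token: $slack-token
  trigger.on-health-degraded: |
    - when: app.status.health.status == 'Degraded'
      send: [slack-alert]
  template.slack-alert: |
    message: |
      :rotating_light: *{{.app.metadata.name}}* is {{.app.status.health.status}}
      Revision: {{.app.status.sync.revision}}
      To roll back: `argocd app set {{.app.metadata.name}} --sync-policy none && argocd app rollback {{.app.metadata.name}}`
# Each Application must subscribe to the trigger, e.g. with the annotation
# notifications.argoproj.io/subscribe.on-health-degraded.slack: <channel>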
The Gate
ArgoCD health status is the gate. After a rollback, the Application must return to Healthy within the timeout. If it does not, the rollback itself failed and manual intervention is needed.
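For a single Application, argocd app wait --health --timeout already enforces this gate. Where the same pass/fail rule is wanted around a different status source, the loop below is a minimal sketch of it: poll until the status reads Healthy, fail once the timeout expires. The function name health_gate, its argument order, and passing the status check as a command are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a status command until it prints "Healthy" or the
# timeout (in seconds) expires. Returns 0 when the gate passes, 1 otherwise.
# Arguments: <timeout> <interval> <command...>
health_gate() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$(( SECONDS + timeout ))
  local status
  while true; do
    # Treat a failing status command as "unknown" instead of aborting.
    status=$("$@" 2>/dev/null || true)
    if [[ $status == "Healthy" ]]; then
      echo "gate passed"
      return 0
    fi
    if (( SECONDS >= deadline )); then
      echo "gate failed: last status '${status:-unknown}'" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Usage (argocd CLI and jq assumed installed):
#   health_gate 120 5 sh -c 'argocd app get payments-production -o json | jq -r .status.health.status'
```

A non-zero return is the signal to leave auto-sync disabled and escalate, matching the manual-intervention rule above.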
The Recovery
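Rollback fails because the previous image was garbage-collected: Container registries may delete old images, and an ArgoCD rollback restores only the manifests, not the images they reference. Set a retention policy that keeps at least the last 10 tags, which matches ArgoCD's default history limit of 10 entries (spec.revisionHistoryLimit). Use immutable tags as well, so a rollback target cannot be silently overwritten.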
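Multiple commits need reverting: If two deploy commits landed after the last known good state, revert both: git revert --no-edit HEAD~2..HEAD. This reverts the two most recent commits on the branch, newest first, so check git log beforehand to confirm that nothing unrelated was committed on top of them.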
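ArgoCD shows OutOfSync after revert: The revert changed the Git state, but ArgoCD is still comparing against its cached copy of the old state. Force a refresh: argocd app get "$APP" --refresh (or --hard-refresh to also invalidate the cached manifests).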