ship it and sleep

GitOps Rollback Mechanics and ArgoCD History

4 min read Chapter 62 of 66

The Failure

The on-call engineer used ArgoCD’s UI to roll back the payments service. It worked instantly and the service recovered. But the engineer did not revert the Git commit. Three minutes later, ArgoCD’s next sync cycle detected that the cluster state no longer matched Git and redeployed the broken version. The service went down again.

A rollback through the ArgoCD UI is a temporary fix. It redeploys a previous sync state to the cluster but leaves Git untouched, so Git still describes the broken version. With auto-sync enabled, the next sync cycle reconciles the cluster back to Git and overwrites the rollback.
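
Whether an Application is exposed to this failure depends on whether automated sync is configured. The sketch below checks that from the Application's JSON. It assumes `jq` is installed, and the helper name `autosync_enabled` is hypothetical. Treating an `enabled: false` field as "off" is an assumption about newer ArgoCD versions, not something the scenario above confirms.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads an Application manifest as JSON on stdin and
# succeeds (exit 0) when automated sync is configured.
autosync_enabled() {
  # jq -e maps a true result to exit 0 and a false/null result to exit 1.
  # Assumption: an `enabled: false` field inside `automated`, where the
  # installed ArgoCD version supports it, means auto-sync is off.
  jq -e '.spec.syncPolicy.automated as $a | $a != null and $a.enabled != false' >/dev/null
}

# Usage (assumes a logged-in argocd CLI):
#   argocd app get payments-production -o json | autosync_enabled \
#     && echo "auto-sync is on: a rollback alone will not stick"
```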

The Mechanism

ArgoCD Sync vs Rollback

| Action | What Happens | Persistent? |
| --- | --- | --- |
| Sync | Deploy Git state to cluster | Yes (Git is source of truth) |
| Rollback (UI) | Deploy previous sync state | No (next sync overwrites) |
| Git revert + Sync | Revert Git, then sync | Yes |

Making UI Rollback Persistent

  1. Disable auto-sync on the Application (ArgoCD refuses a rollback while automated sync is enabled; the UI prompts you to turn it off)
  2. Roll back via UI (immediate relief)
  3. Git revert the bad commit
  4. Re-enable auto-sync
  5. ArgoCD syncs to the reverted (good) state

The Implementation

ArgoCD CLI Rollback

# List sync history
argocd app history payments-production
# ID  DATE                 REVISION
# 5   2025-01-15 16:47    abc1234 (bad)
# 4   2025-01-14 10:22    def5678 (good)
# 3   2025-01-13 09:15    ghi9012

# Disable auto-sync first: ArgoCD rejects a rollback while automated sync is enabled
argocd app set payments-production --sync-policy none

# Roll back to the last good sync (ID 4)
argocd app rollback payments-production 4

# Now do the git revert
cd ecommerce-infra
git revert abc1234 --no-edit
git push

# Re-enable auto-sync
argocd app set payments-production \
  --sync-policy automated \
  --auto-prune \
  --self-heal
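
The history ID in the commands above was read off the listing by hand. As a sketch of how a script could pick it instead, the hypothetical helper `previous_history_id` below takes history IDs on stdin, one per line, and prints the second-highest. It assumes IDs increase with each sync, which matches the sample history shown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given history IDs on stdin (one per line, any order),
# print the ID of the sync just before the most recent one.
previous_history_id() {
  local ids
  ids=$(sort -n)   # sort numerically so input order does not matter
  if [ "$(printf '%s\n' "$ids" | grep -c .)" -lt 2 ]; then
    echo "need at least two history entries to roll back" >&2
    return 1
  fi
  printf '%s\n' "$ids" | tail -n 2 | head -n 1
}

# Usage (assumes a logged-in argocd CLI; `-o id` prints IDs only):
#   ID=$(argocd app history payments-production -o id | previous_history_id)
#   argocd app rollback payments-production "$ID"
```

Passing an explicit ID keeps the rollback target visible in logs, which is useful when reconstructing an incident afterwards.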

Automated Rollback Script

#!/bin/bash
# scripts/argocd-safe-rollback.sh
# HARDENED: Rollback with auto-sync protection
set -euo pipefail

APP=${1:?usage: argocd-safe-rollback.sh <app-name>}
REPO_DIR=/tmp/ecommerce-infra

echo "Step 1: Disable auto-sync"
# ArgoCD rejects a rollback while automated sync is enabled, so this comes first
argocd app set "$APP" --sync-policy none

echo "Step 2: Roll back $APP to previous version"
# With no ID, recent argocd CLIs target the previous history entry;
# otherwise pass an explicit ID from `argocd app history`
argocd app rollback "$APP"

echo "Step 3: Verify rollback"
argocd app wait "$APP" --health --timeout 120

echo "Step 4: Find and revert the deployment commit"
SERVICE=${APP%-production}
if [ ! -d "$REPO_DIR/.git" ]; then
  git clone [email protected]:acme/ecommerce-infra.git "$REPO_DIR"
fi
cd "$REPO_DIR"
git pull --ff-only

# Most recent commit whose message matches "deploy: <service>"
COMMIT=$(git log -1 --grep="deploy: $SERVICE" --format="%H")
if [ -z "$COMMIT" ]; then
  echo "No deploy commit found for $SERVICE; leaving auto-sync disabled" >&2
  exit 1
fi
echo "Reverting: $(git log --oneline -1 "$COMMIT")"
git revert "$COMMIT" --no-edit
git push

echo "Step 5: Re-enable auto-sync"
argocd app set "$APP" --sync-policy automated --auto-prune --self-heal

echo "Rollback complete. $APP will sync to reverted state."

Rollback Verification

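# ecommerce-infra/.github/workflows/verify-rollback.yml
# HARDENED: Verify service health after rollback
name: Verify Rollback
on:
  push:
    branches: [main]
    paths: ["apps/*/overlays/production/**"]

jobs:
  verify:
    runs-on: ubuntu-latest
    env:
      # Assumed repository secrets; the argocd CLI reads both variables
      ARGOCD_SERVER: ${{ secrets.ARGOCD_SERVER }}
      ARGOCD_AUTH_TOKEN: ${{ secrets.ARGOCD_AUTH_TOKEN }}
    steps:
      - name: Install ArgoCD CLI
        run: |
          curl -sSL -o argocd https://github.com/argoproj/argo-cd/releases/latest/download/argocd-linux-amd64
          sudo install -m 555 argocd /usr/local/bin/argocd

      - name: Wait for ArgoCD sync
        run: sleep 180 # Wait for sync cycle

      - name: Check application health
        run: |
          APPS=$(argocd app list -o name | grep production)
          for app in $APPS; do
            STATUS=$(argocd app get "$app" -o json | jq -r '.status.health.status')
            if [[ "$STATUS" != "Healthy" ]]; then
              echo "::error::$app is $STATUS after deployment"
              exit 1
            fi
          done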

ArgoCD Notifications for Rollback

# ArgoCD notifications ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
data:
  trigger.on-health-degraded: |
    - when: app.status.health.status == 'Degraded'
      send: [slack-alert]
  template.slack-alert: |
    message: |
      :rotating_light: *{{.app.metadata.name}}* is {{.app.status.health.status}}
      Revision: {{.app.status.sync.revision}}
      To roll back: find the last good ID with `argocd app history {{.app.metadata.name}}`, then run `argocd app rollback {{.app.metadata.name}} <ID>`
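
A trigger and template only fire for Applications that subscribe to them. The sketch below adds the subscription annotation with `kubectl`. The helper name, the `payments-oncall` channel, and the `argocd` namespace default are hypothetical, and it assumes a Slack notification service named `slack` is already configured.

```shell
#!/usr/bin/env bash
# Hypothetical helper: subscribe an Application to the on-health-degraded
# trigger, delivering to a Slack channel.
subscribe_to_degraded_alerts() {
  local app=$1 channel=$2 namespace=${3:-argocd}
  # Annotation format: notifications.argoproj.io/subscribe.<trigger>.<service>=<recipient>
  kubectl -n "$namespace" annotate applications.argoproj.io "$app" --overwrite \
    "notifications.argoproj.io/subscribe.on-health-degraded.slack=$channel"
}

# Usage:
#   subscribe_to_degraded_alerts payments-production payments-oncall
```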

The Gate

ArgoCD health status is the gate. After a rollback, the Application must return to Healthy within the timeout. If it does not, the rollback itself failed and manual intervention is needed.
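
As a sketch, the gate can be expressed as a function whose exit code tells the caller whether to continue or escalate. The name `rollback_gate` and the 120-second default are illustrative; the function only wraps `argocd app wait`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: block until the Application is Healthy or the
# timeout (seconds) expires, and report the outcome via the exit code.
rollback_gate() {
  local app=$1 timeout=${2:-120}
  if argocd app wait "$app" --health --timeout "$timeout"; then
    echo "GATE PASSED: $app is Healthy"
  else
    echo "GATE FAILED: $app not Healthy within ${timeout}s; manual intervention needed" >&2
    return 1
  fi
}

# Usage (assumes a logged-in argocd CLI):
#   rollback_gate payments-production 120 || exit 1
```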

The Recovery

Rollback fails because previous image was garbage-collected: Container registries may delete old images under their cleanup policies. Set a retention policy that keeps at least 10 previous tags, and use immutable tags so that an old tag still resolves to the image it originally named.
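
A pre-flight check can catch a missing image before the rollback is attempted. This is a sketch under assumptions: both helper names are hypothetical, the manifests are plain YAML with `image:` keys, and `docker manifest inspect` can reach the registry with valid credentials.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the unique container images referenced in
# rendered manifests (YAML on stdin).
images_in_manifests() {
  awk '
    $1 == "image:"               { print $2 }
    $1 == "-" && $2 == "image:"  { print $3 }
  ' | tr -d "\"'" | sort -u
}

# Hypothetical helper: succeed only if every image listed on stdin can
# still be resolved in its registry.
all_images_exist() {
  local image missing=0
  while IFS= read -r image; do
    [ -n "$image" ] || continue
    if ! docker manifest inspect "$image" >/dev/null 2>&1 </dev/null; then
      echo "missing image: $image" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (assumes a logged-in argocd CLI; def5678 is the good revision
# from the sample history):
#   argocd app manifests payments-production --revision def5678 \
#     | images_in_manifests | all_images_exist
```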

Multiple commits need reverting: If two deploys happened since the last known good state, revert both commits: git revert HEAD~2..HEAD --no-edit.
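
Counting commits by hand is error-prone during an incident. The sketch below reverts everything after a known good commit instead. The helper name `revert_since` is hypothetical, and it assumes the range contains no merge commits, which `git revert` would refuse without `-m`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: revert every commit made after a known good commit,
# newest first, creating one revert commit per original commit.
revert_since() {
  local good=$1
  if ! git merge-base --is-ancestor "$good" HEAD; then
    echo "$good is not an ancestor of HEAD" >&2
    return 1
  fi
  if [ "$(git rev-list --count "$good..HEAD")" -eq 0 ]; then
    echo "HEAD is already at $good; nothing to revert"
    return 0
  fi
  git revert --no-edit "$good..HEAD"
}

# Usage, run inside the infra repository (def5678 is the good revision
# from the sample history):
#   revert_since def5678 && git push
```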

ArgoCD shows OutOfSync after revert: The revert changed the Git state, but ArgoCD may still be comparing against its cached copy of the old state. Force a refresh: argocd app get "$APP" --refresh.