What Modern CI/CD Actually Means

Your pipeline runs on push. It builds a Docker image, runs some tests, pushes the image to a registry, and updates a deployment manifest. On a good day, new code reaches production in 12 minutes. On a bad day, a broken image gets promoted to production because the integration test was flaky and someone added continue-on-error: true six months ago.

That pipeline is not CI/CD. It is an automated build script with a deployment step bolted on. The difference matters at 3am when the payments service is returning 500s and the last deployment touched three services across two repositories.

The Four Properties of a Production Pipeline

A production-grade pipeline has four properties. If any one is missing, the pipeline is a liability during an incident.

Reproducibility. Given the same commit SHA, the pipeline produces the same artifact. Every dependency is pinned. Every base image is pinned by digest. The build does not depend on the state of a shared cache, a mutable tag, or a developer’s local environment. When the incident retrospective asks “what changed between the last good deploy and this one,” the answer is a diff, not a shrug.
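One way to enforce the base-image rule is a small check that fails the build when a Dockerfile's FROM line is not pinned by digest. The sketch below is a hypothetical helper (unpinned_base_images is not a standard tool). It assumes each FROM instruction fits on one line, and it treats scratch and names of earlier build stages as exempt.

```python
import re

# A digest-pinned reference ends in @sha256: followed by 64 hex characters,
# e.g. python:3.12-slim@sha256:<digest>.
DIGEST_PIN = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_base_images(dockerfile_text: str) -> list[str]:
    """Return base images in FROM lines that are not pinned by digest.

    Hypothetical helper. Assumes each FROM instruction fits on one line.
    "scratch" and references to earlier stages (FROM build AS test) are exempt.
    """
    stages: set[str] = set()
    unpinned: list[str] = []
    for raw_line in dockerfile_text.splitlines():
        parts = raw_line.split()
        if not parts or parts[0].upper() != "FROM":
            continue
        # Drop flags such as --platform=linux/amd64.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        exempt = image == "scratch" or image.lower() in stages
        if not exempt and not DIGEST_PIN.search(image):
            unpinned.append(image)
        # Remember the stage name so later "FROM <stage>" lines are exempt.
        if len(args) >= 3 and args[1].upper() == "AS":
            stages.add(args[2].lower())
    return unpinned
```

In CI, a step would run this over the Dockerfile and exit nonzero whenever the returned list is non-empty.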

Observability. Every pipeline run emits structured data: duration per stage, test counts and pass rates, image size, vulnerability scan results, deployment target, promotion decision. When pipeline duration doubles over two weeks, you know which stage caused it before anyone files a ticket.
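As a sketch of what structured data can look like, the record below captures the fields listed above for one run, and comparing per-stage medians across two windows of runs answers "which stage got slower." PipelineRun, stage_medians, and biggest_regression are hypothetical names, and where the records are stored (build artifact, metrics backend, warehouse) is left open.

```python
import json
from dataclasses import asdict, dataclass
from statistics import median


@dataclass
class PipelineRun:
    """One pipeline run as a structured record (hypothetical schema)."""
    commit_sha: str
    stage_seconds: dict[str, float]  # duration per stage
    tests_passed: int
    tests_failed: int
    image_bytes: int
    critical_cves: int
    target: str  # deployment target, e.g. "staging"
    promoted: bool  # promotion decision

    @property
    def pass_rate(self) -> float:
        total = self.tests_passed + self.tests_failed
        return self.tests_passed / total if total else 0.0

    def to_json(self) -> str:
        # asdict() only includes fields, so add the derived pass_rate explicitly.
        return json.dumps({**asdict(self), "pass_rate": self.pass_rate}, sort_keys=True)


def stage_medians(runs: list[PipelineRun]) -> dict[str, float]:
    """Median duration per stage; a stage missing from a run counts as 0 seconds."""
    stages = {stage for run in runs for stage in run.stage_seconds}
    return {s: median(run.stage_seconds.get(s, 0.0) for run in runs) for s in stages}


def biggest_regression(baseline: list[PipelineRun], recent: list[PipelineRun]) -> tuple[str, float]:
    """Return the stage whose median duration grew the most, and by how many seconds."""
    before, after = stage_medians(baseline), stage_medians(recent)
    growth = {s: after[s] - before.get(s, 0.0) for s in after}
    stage = max(growth, key=growth.get)
    return stage, growth[stage]
```

Medians rather than means keep one unusually slow runner from being blamed on a stage.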

Gateability. The pipeline has hard gates that block promotion. Not warnings. Not Slack notifications that get buried. Hard failures that prevent a bad artifact from reaching the next environment. A security scan that finds a critical CVE blocks the build. A Locust test that shows p99 latency above the threshold blocks promotion to production. Gates are code, not process.
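The difference between a warning and a gate is the exit code. A minimal sketch, assuming the scanner's findings were exported as a JSON list of objects with id and severity fields (a hypothetical format; real scanners each have their own schema): the script returns a nonzero status when any finding is critical, and that status is what fails the CI job.

```python
import json
import sys

# Severities that block the build. Anything else is reported elsewhere but does not fail it.
BLOCKING_SEVERITIES = {"CRITICAL"}


def blocking_findings(findings: list[dict]) -> list[str]:
    """Return the IDs of findings whose severity blocks promotion."""
    return [
        f["id"] for f in findings
        if str(f.get("severity", "")).upper() in BLOCKING_SEVERITIES
    ]


def gate(findings: list[dict]) -> int:
    """Return a process exit code: 0 lets the pipeline continue, 1 stops it."""
    blocked = blocking_findings(findings)
    for finding_id in blocked:
        print(f"BLOCKED: {finding_id}", file=sys.stderr)
    return 1 if blocked else 0


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical file name): python cve_gate.py findings.json
    with open(sys.argv[1], encoding="utf-8") as fh:
        sys.exit(gate(json.load(fh)))
```

Because the step fails on a nonzero exit, no downstream job that needs it can run. There is no notification to scroll past.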

Recoverability. When a bad deploy reaches production, the pipeline provides a path back that does not require a developer to be awake and thinking clearly. GitOps makes this a git revert. Argo Rollouts makes it an automated analysis that triggers rollback before the on-call engineer finishes reading the page. A pipeline without a recovery path is a pipeline that assumes every deploy succeeds.
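The automated half of that recovery path is a loop that measures the new version at intervals and decides whether to continue. The sketch below is loosely modeled on the failure-limit idea in Argo Rollouts analysis, not on its actual implementation: it consumes error-rate measurements and returns a rollback verdict as soon as the number of failed measurements exceeds failure_limit. The function name and threshold defaults are illustrative assumptions.

```python
from typing import Iterable


def canary_verdict(
    error_rates: Iterable[float],
    max_error_rate: float = 0.01,
    failure_limit: int = 1,
) -> tuple[str, int]:
    """Decide "promote", "rollback", or "inconclusive" from periodic error-rate samples.

    Returns the verdict and how many measurements were consumed. Rollback is
    decided as soon as failures exceed failure_limit, without waiting for the
    remaining samples or for a human.
    """
    failures = 0
    taken = 0
    for taken, rate in enumerate(error_rates, start=1):
        if rate > max_error_rate:
            failures += 1
            if failures > failure_limit:
                return "rollback", taken
    if taken == 0:
        # No data is not evidence of health.
        return "inconclusive", 0
    return "promote", taken
```

Feeding it a live stream of measurements (for example, a generator that polls a metrics backend once a minute) is what lets the rollback decision land while the on-call engineer is still reading the page.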

Most teams have reproducibility partially covered (they pin some dependencies), observability through CI provider dashboards (not custom metrics), gateability through unit tests only, and recoverability through “revert the PR and re-run the pipeline.” That last one takes 15 minutes if someone is at their laptop. At 3am, it takes 40.

The E-Commerce Platform

Every chapter in this book uses the same domain: a multi-service e-commerce platform with five services.

Product Catalog (catalog-service). Serves product listings, search, and detail pages. Read-heavy, cache-friendly, deployed independently. Its pipeline is the simplest in the platform and makes a good baseline for introducing pipeline concepts.

Inventory (inventory-service). Tracks stock levels, handles reservations during checkout, and processes restocking events. Write-heavy, sensitive to race conditions, and the first service where database migrations become a deployment concern.

Checkout (checkout-service). Orchestrates the purchase flow: validates cart, reserves inventory, initiates payment, confirms order. Calls three other services. Its pipeline is the most complex because a bad deploy here means lost revenue in minutes.

Payments (payments-service). Integrates with payment processors, handles PCI-scoped data, and requires the strictest secrets management. Its pipeline has the most gates and the most restrictive promotion policy.

Frontend Shell (frontend-shell). A server-side rendered application that composes responses from the other services. Its deployment affects every user immediately and is the most visible failure mode.

Each service has its own GitHub repository. A sixth repository, ecommerce-infra, holds all Kubernetes manifests, Helm values, Kustomize overlays, and ArgoCD application definitions. This separation is not optional in this book. When a chapter references “the app repo,” it means one of the five service repos. When it references “the infra repo,” it means ecommerce-infra. The distinction matters for every pipeline pattern, every GitOps workflow, and every rollback procedure.

The Repository Topology

github.com/acme/
├── catalog-service/          # App repo
│   ├── src/
│   ├── Dockerfile
│   ├── load-tests/
│   │   └── locustfile.py
│   └── .github/workflows/
│       └── ci.yml
├── inventory-service/        # App repo
├── checkout-service/         # App repo
├── payments-service/         # App repo
├── frontend-shell/           # App repo
└── ecommerce-infra/          # Infra repo (GitOps)
    ├── base/
    │   ├── catalog/
    │   ├── inventory/
    │   ├── checkout/
    │   ├── payments/
    │   └── frontend/
    ├── overlays/
    │   ├── staging/
    │   └── production/
    └── argocd/
        ├── app-of-apps.yaml
        ├── catalog.yaml
        ├── inventory.yaml
        ├── checkout.yaml
        ├── payments.yaml
        └── frontend.yaml

Every service repo follows the same structure: source code, a Dockerfile, a load-tests/ directory with a Locust scenario, and a .github/workflows/ directory with the CI pipeline. The infra repo uses Kustomize with base and overlay directories (each overlay holds one subdirectory per service, such as overlays/staging/checkout/), plus an argocd/ directory with application definitions.

The Four Opinions

This book has a point of view and defends it consistently.

GitHub Actions is the default CI platform. Not Jenkins. Not CircleCI. Not Tekton. GitHub Actions has the broadest ecosystem of community-maintained actions, native integration with the repository where the code lives, and matrix builds that handle multi-platform testing without custom orchestration. GitLab CI is referenced in specific chapters where it handles a problem differently (particularly around environment management and built-in container registries). Tekton is referenced as the on-cluster pipeline alternative for teams that need pipeline execution inside the Kubernetes cluster. Every CI example in this book uses GitHub Actions YAML.

ArgoCD is the default GitOps engine. Chapter 12 provides an honest comparison with Flux. ArgoCD wins the default position because of its web UI for operational visibility, its app-of-apps model for managing multiple services, and its larger community with more third-party integrations. Flux is a legitimate choice for teams that want a lighter footprint: both tools reconcile by pulling from Git, but Flux does it with a set of controllers configured through Kubernetes resources, with no central server or built-in UI. This book uses ArgoCD for every delivery example.

The GitOps repo is separate from the application repo. This is the most common point of disagreement in GitOps adoption. Some teams keep Kubernetes manifests in the same repo as the application code. That works until you have five services, three environments, and a hotfix that needs to reach production without triggering a full CI pipeline for application code that did not change. Separate repos enforce a clean contract: the CI pipeline produces an image tag, the CD pipeline consumes it. Chapter 19 covers this architecture in detail.
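In practice the contract can be as narrow as a single field. The sketch below shows the only change the CI pipeline needs to make in the infra repo: setting newTag for one image in an overlay's Kustomize images list, which is roughly the effect of the kustomize edit set image command in the hardened pipeline later in this chapter. It works on an already-parsed kustomization.yaml as a plain dict so it needs no YAML library; set_image_tag is a hypothetical name.

```python
import copy


def set_image_tag(kustomization: dict, image: str, new_tag: str) -> dict:
    """Return a copy of a parsed kustomization with image's newTag set.

    Hypothetical helper. Only the images list changes: CI hands over a tag,
    and everything else in the overlay stays under the infra repo's control.
    """
    updated = copy.deepcopy(kustomization)
    images = updated.setdefault("images", [])
    for entry in images:
        if entry.get("name") == image:
            entry["newTag"] = new_tag
            return updated
    images.append({"name": image, "newTag": new_tag})
    return updated
```

Returning a copy rather than mutating in place keeps the function easy to test and makes the before/after diff explicit.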

Performance gates are non-negotiable. A pipeline that does not test performance before promoting to production is a pipeline that ships regressions silently. Every chapter that introduces a deployment pattern includes or references a Locust gate. The gate has three criteria: p99 latency below a threshold, error rate below a ceiling, and throughput above a floor. When the gate fails, the pipeline blocks promotion and notifies the team. When it fails during a canary deployment, Argo Rollouts triggers an automated rollback. Chapter 17 covers this in depth.
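The three criteria translate directly into code. A minimal sketch, assuming Locust ran headless with --csv so that it wrote a stats CSV whose Aggregated row has Request Count, Failure Count, Requests/s, and 99% columns (column names as in recent Locust releases; verify against your version). Thresholds and helper names are illustrative assumptions, not recommendations.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    max_p99_ms: float  # p99 latency ceiling
    max_error_rate: float  # failures / requests ceiling, e.g. 0.01 for 1%
    min_rps: float  # throughput floor, requests per second


def aggregated_row(stats_csv: str) -> dict[str, str]:
    """Return the Aggregated row from the contents of a Locust *_stats.csv file."""
    for row in csv.DictReader(io.StringIO(stats_csv)):
        if row.get("Name") == "Aggregated":
            return row
    raise ValueError("no Aggregated row found in Locust stats CSV")


def gate_violations(row: dict[str, str], limits: Thresholds) -> list[str]:
    """List every failed criterion; an empty list means the gate passes."""
    requests = int(row["Request Count"])
    if requests == 0:
        return ["no requests recorded"]
    failures = int(row["Failure Count"])
    p99_ms = float(row["99%"])
    rps = float(row["Requests/s"])
    error_rate = failures / requests

    violations = []
    if p99_ms > limits.max_p99_ms:
        violations.append(f"p99 {p99_ms:.0f} ms exceeds {limits.max_p99_ms:.0f} ms")
    if error_rate > limits.max_error_rate:
        violations.append(f"error rate {error_rate:.2%} exceeds {limits.max_error_rate:.2%}")
    if rps < limits.min_rps:
        violations.append(f"throughput {rps:.1f} rps below {limits.min_rps:.1f} rps")
    return violations
```

A CI step would print each violation and exit with status 1 when the list is non-empty, so the failed job explains itself.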

A Pipeline That Demonstrates All Four Properties

Here are two versions of the CI pipeline for the checkout service. The first is the kind of pipeline most teams start with. The second is not complete (later chapters add security scanning, performance gates, and promotion logic), but it demonstrates reproducibility, observability, gateability, and recoverability in a single workflow.

# FRAGILE: A pipeline that builds and pushes without gates or traceability
name: ci
on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t acme/checkout-service .
      - run: docker push acme/checkout-service
      - run: |
          cd ../ecommerce-infra
          sed -i "s|image:.*|image: acme/checkout-service:latest|" \
            overlays/production/checkout/deployment.yaml
          git add . && git commit -m "deploy" && git push

Everything wrong with this pipeline:

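The image tag is latest. Two builds from different commits produce images with the same tag, so nothing records which commit is running. Reproducibility is gone. There are no tests at all: the image is pushed and deployed the moment a commit lands on main. Gateability is absent. The pipeline directly edits the production overlay and pushes to the infra repo without ever checking that repo out or authenticating to it (../ecommerce-infra does not exist on a fresh runner). There is no staging environment, no approval, no rollback path. Because every deploy writes the same latest reference, reverting an infra repo commit cannot bring back the previous image: latest now points at the bad build. Recovery means someone finding the last good image digest by hand and pinning it manually. The pipeline emits no structured metadata. Observability is whatever the GitHub Actions log viewer shows.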
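The mutable-tag problem is easy to see with a toy model of a registry as a mapping from tag to digest. This is an illustration only (Registry is hypothetical and the digests are placeholder strings), but it shows why a latest-only history leaves nothing to roll back to.

```python
from typing import Optional


class Registry:
    """Toy registry: each tag points at exactly one digest at a time."""

    def __init__(self) -> None:
        self.tags: dict[str, str] = {}

    def push(self, tag: str, digest: str) -> None:
        # Pushing to an existing tag silently moves it to the new digest.
        self.tags[tag] = digest

    def resolve(self, tag: str) -> Optional[str]:
        return self.tags.get(tag)


fragile = Registry()
fragile.push("latest", "sha256:good")
fragile.push("latest", "sha256:bad")
# The manifest says "latest" before and after the bad deploy; no tag reaches the good image.
print(fragile.resolve("latest"))  # sha256:bad

hardened = Registry()
hardened.push("a1b2c3d", "sha256:good")
hardened.push("e4f5a6b", "sha256:bad")
# Reverting the manifest to the previous commit-SHA tag resolves to the previous image.
print(hardened.resolve("a1b2c3d"))  # sha256:good
```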

# HARDENED: Same pipeline with all four properties
name: ci
on:
  push:
    branches: [main]

env:
  IMAGE: ghcr.io/acme/checkout-service
  REGISTRY: ghcr.io

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    outputs:
      image-digest: ${{ steps.build.outputs.digest }}
      image-tag: ${{ steps.meta.outputs.version }}
    steps:
      - uses: actions/checkout@v4

      - name: Generate image metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.IMAGE }}
          tags: |
            type=sha,prefix=,format=short

      - name: Log in to registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push
        id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

      - name: Emit build metadata
        run: |
          echo "## Build Summary" >> $GITHUB_STEP_SUMMARY
          echo "| Property | Value |" >> $GITHUB_STEP_SUMMARY
          echo "|----------|-------|" >> $GITHUB_STEP_SUMMARY
          echo "| Image | \`${{ env.IMAGE }}:${{ steps.meta.outputs.version }}\` |" >> $GITHUB_STEP_SUMMARY
          echo "| Digest | \`${{ steps.build.outputs.digest }}\` |" >> $GITHUB_STEP_SUMMARY
          echo "| Commit | \`${{ github.sha }}\` |" >> $GITHUB_STEP_SUMMARY

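  test:
    runs-on: ubuntu-latest
    needs: [build]
    permissions:
      contents: read
      packages: read
    steps:
      - uses: actions/checkout@v4

      # Pulling the image from GHCR needs the same registry login as the build job.
      - name: Log in to registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Run unit tests
        run: |
          docker run --rm \
            ${{ env.IMAGE }}@${{ needs.build.outputs.image-digest }} \
            ./run-tests.sh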

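      - name: Run integration tests
        # --exit-code-from makes the step fail when the test container fails.
        # Assumes docker-compose.test.yml defines a service named integration-tests.
        run: |
          docker compose -f docker-compose.test.yml up \
            --abort-on-container-exit --exit-code-from integration-tests
        env:
          CHECKOUT_IMAGE: ${{ env.IMAGE }}@${{ needs.build.outputs.image-digest }}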

  update-infra:
    runs-on: ubuntu-latest
    needs: [build, test]
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
        with:
          repository: acme/ecommerce-infra
          token: ${{ secrets.INFRA_REPO_TOKEN }}
          path: infra

      - name: Update staging image tag
        working-directory: infra
        run: |
          cd overlays/staging/checkout
          kustomize edit set image \
            ${{ env.IMAGE }}=${{ env.IMAGE }}:${{ needs.build.outputs.image-tag }}

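      - name: Commit and push
        working-directory: infra
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add .
          # Re-running the workflow for the same commit leaves nothing to commit; exit cleanly.
          if git diff --cached --quiet; then
            echo "staging overlay already references ${{ needs.build.outputs.image-tag }}"
            exit 0
          fi
          git commit -m "checkout-service: promote ${{ needs.build.outputs.image-tag }} to staging"
          # The other service pipelines push to this repo too; rebase onto their commits first.
          git pull --rebase
          git push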

The difference:

Reproducibility. The image tag is the short SHA of the commit, so every tag traces back to exactly one commit. The test job runs against the image digest, not a mutable tag, so the tests exercise exactly the image the build job pushed.

Observability. The build emits a summary with the image reference, digest, and commit SHA. Later chapters add duration metrics and artifact size tracking.

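Gateability. The update-infra job depends on test. The image is pushed to the registry before tests run, but pushing is not promotion: if tests fail, the infra repo is never updated and nothing deploys. Later chapters add security scanning and Locust performance gates as additional required jobs.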
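The gating comes from how needs behaves by default: a job whose dependency did not succeed is skipped. The sketch below is a simplified model of that default (it ignores if: always() and the other status-check functions), using the three job names from the workflow above; run_jobs is a hypothetical name.

```python
from graphlib import TopologicalSorter


def run_jobs(needs: dict[str, list[str]], fails: set[str]) -> dict[str, str]:
    """Simulate default `needs` semantics: run a job only if every dependency succeeded.

    needs maps each job to the jobs it depends on; fails names jobs whose
    steps fail when they run. Simplified model, not GitHub's scheduler.
    """
    results: dict[str, str] = {}
    for job in TopologicalSorter(needs).static_order():
        deps = needs.get(job, [])
        if any(results[dep] != "success" for dep in deps):
            results[job] = "skipped"
        elif job in fails:
            results[job] = "failure"
        else:
            results[job] = "success"
    return results


workflow = {"build": [], "test": ["build"], "update-infra": ["build", "test"]}
print(run_jobs(workflow, fails={"test"}))
# {'build': 'success', 'test': 'failure', 'update-infra': 'skipped'}
```

Adding a security scan or Locust job to the gate is then just another entry in update-infra's needs list.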

Recoverability. The pipeline updates the staging overlay, not production. Promotion to production happens through a separate process (covered in Chapters 7 and 8). Rollback is a git revert on the infra repo, which ArgoCD reconciles automatically. Because each promotion commit pins a specific commit-SHA tag, reverting it restores the exact previous image.
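Because every promotion is a commit with a predictable message, finding what to revert can be scripted. A sketch, assuming the commit-message format used in the Commit and push step above and log lines produced by git log --format='%H %s' (newest first). The helper names are hypothetical, and the script only builds the git revert command rather than running it.

```python
import re
from typing import Iterable, Optional

# Matches subjects like "checkout-service: promote a1b2c3d to staging".
PROMOTION = re.compile(r"^(?P<service>[\w-]+): promote (?P<tag>\S+) to (?P<env>[\w-]+)$")


def last_promotion(log_lines: Iterable[str], service: str, env: str) -> Optional[tuple[str, str]]:
    """Return (commit SHA, image tag) of the newest promotion of service to env."""
    for line in log_lines:
        sha, _, subject = line.partition(" ")
        match = PROMOTION.match(subject)
        if match and match["service"] == service and match["env"] == env:
            return sha, match["tag"]
    return None


def revert_command(sha: str) -> list[str]:
    # --no-edit keeps git's default "Revert ..." message, so it runs unattended.
    return ["git", "revert", "--no-edit", sha]


# Sample log lines, newest first (SHAs shortened for readability).
log = [
    "9f8e7d6 payments-service: promote 1a2b3c4 to staging",
    "5c4b3a2 checkout-service: promote e4f5a6b to staging",
    "1d2e3f4 checkout-service: promote a1b2c3d to staging",
]
found = last_promotion(log, "checkout-service", "staging")
print(found)  # ('5c4b3a2', 'e4f5a6b')
if found:
    print(revert_command(found[0]))  # ['git', 'revert', '--no-edit', '5c4b3a2']
```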

This pipeline is the starting point. By Chapter 22, it includes dependency scanning, container vulnerability checks, SBOM generation, Locust performance gates, canary analysis, and automated rollback. Each addition follows the same five-part structure: the failure it prevents, the mechanism behind it, the implementation, the gate condition, and the recovery path.

What This Book Is Not

Not a Kubernetes administration guide. The reader knows how to run kubectl apply. Not a GitHub tutorial. The reader has used GitHub Actions. Not a DevOps transformation manifesto. No one here needs convincing that automation is good. Not a cloud provider certification guide. EKS, GKE, and AKS differences are noted where they matter and ignored where they do not.

The reader already has a pipeline. This book makes it production-grade: reproducible, observable, safe to run unattended, and recoverable when it fails.