Pipeline Architecture: Stages, Jobs, Artifacts, and the Dependency Graph
A pipeline is a directed acyclic graph. Jobs are nodes. Dependencies are edges. The critical path, the longest chain of sequential dependencies, determines how long the pipeline takes. Everything else is parallelism you are leaving on the table.
Most teams start with a pipeline that is a single job with 15 steps, running sequentially. Build, then test, then lint, then scan, then push, then deploy. Total duration: the sum of every step. If the scan takes 4 minutes and the integration tests take 6 minutes, the pipeline takes at least 10 minutes even though those two steps have no dependency on each other.
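The arithmetic behind that claim can be written out as a small worked example. It uses only the two durations quoted above; the symbols are labels introduced here for illustration, and runner startup overhead is not modeled.

```latex
\begin{align*}
T_{\text{sequential}} &= t_{\text{scan}} + t_{\text{integ}} = 4 + 6 = 10 \text{ min} \\
T_{\text{parallel}}   &= \max(t_{\text{scan}},\, t_{\text{integ}}) = \max(4,\, 6) = 6 \text{ min}
\end{align*}
```

Sequential steps add their durations. Independent steps run side by side, so together they cost only as much as the slowest one.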
The diagram shows two pipeline architectures for the same set of tasks. On the left, a sequential pipeline runs seven steps one after another, totaling 23 minutes. On the right, the same steps are organized as a DAG with parallel branches: unit tests, integration tests, and security scanning run concurrently after the build step, reducing the total duration to 14 minutes. The critical path runs through the build and integration test stages. Every other branch completes while the critical path is still running.
The Failure
The checkout service pipeline runs in a single job. Build the Docker image (3 min), run unit tests (2 min), run integration tests (6 min), run security scan (4 min), push the image (1 min), update the infra repo (1 min). Total: 17 minutes.
A developer pushes a fix for a typo in a log message. They wait 17 minutes for the pipeline to complete. The security scan and integration tests have no dependency on each other, but they run sequentially because the pipeline is a flat list of steps, not a graph.
The team adds a Locust performance test (5 min) and a contract test suite (3 min). The pipeline is now 25 minutes. Developers start batching commits to avoid the wait. Batched commits make rollbacks harder because each deployment contains multiple changes. The pipeline’s structure is causing deployment risk.
The Mechanism
GitHub Actions workflows consist of jobs. Each job runs on a separate runner. Jobs can depend on other jobs via the needs: keyword. Jobs without dependencies run in parallel by default.
The key insight: a job is an isolation boundary. Each job gets a fresh runner, a clean filesystem, and no shared state with other jobs unless explicitly passed through artifacts or outputs. This isolation is a feature, not a limitation. It means that test jobs cannot accidentally depend on build artifacts that happen to be in the same working directory. Dependencies must be declared.
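A minimal sketch of that isolation boundary: one dependency edge, one value passed explicitly across it, and one job with no edge at all. The workflow name, job names, step id, and the version output are hypothetical and chosen only for illustration.

```yaml
# Sketch only: names and the "version" value are made up for this example.
name: needs-demo
on: [push]
jobs:
  produce:
    runs-on: ubuntu-latest
    outputs:
      # Expose a step output as a job output so other jobs can read it.
      version: ${{ steps.meta.outputs.version }}
    steps:
      - id: meta
        run: echo "version=1.2.3" >> "$GITHUB_OUTPUT"
  consume:
    runs-on: ubuntu-latest
    # The edge: consume waits for produce and can read its outputs.
    needs: [produce]
    steps:
      - run: echo "Got ${{ needs.produce.outputs.version }}"
  independent:
    # No needs: key, so this job starts in parallel with produce.
    runs-on: ubuntu-latest
    steps:
      - run: echo "No dependencies"
```

Nothing written to the filesystem in produce is visible in consume. Only the declared output crosses the boundary.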
           ┌──────────┐
           │  build   │
           └────┬─────┘
                │
     ┌──────────┼──────────┐
     │          │          │
     v          v          v
┌─────────┐┌─────────┐┌─────────┐
│  unit   ││ integr- ││  scan   │
│  test   ││  ation  ││         │
└────┬────┘└────┬────┘└────┬────┘
     │          │          │
     └──────────┼──────────┘
                │
                v
           ┌──────────┐
           │   push   │
           └────┬─────┘
                │
                v
          ┌────────────┐
          │update-infra│
          └────────────┘
The critical path is: build → integration test → push → update-infra = 3 + 6 + 1 + 1 = 11 minutes. Unit tests (2 min) and scanning (4 min) complete while integration tests are still running. Total pipeline duration dropped from 17 to 11 minutes without removing any work.
The Implementation
# FRAGILE: Single job, sequential execution
name: ci
on:
  push:
    branches: [main]
  pull_request:
jobs:
  pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t ghcr.io/acme/checkout-service:${{ github.sha }} .
      - name: Unit tests
        run: docker run --rm ghcr.io/acme/checkout-service:${{ github.sha }} ./run-unit-tests.sh
      - name: Integration tests
        run: |
          docker compose -f docker-compose.test.yml up -d
          docker run --rm --network=host ghcr.io/acme/checkout-service:${{ github.sha }} ./run-integration-tests.sh
          docker compose -f docker-compose.test.yml down
      - name: Security scan
        run: trivy image ghcr.io/acme/checkout-service:${{ github.sha }}
      - name: Push image
        run: docker push ghcr.io/acme/checkout-service:${{ github.sha }}
      - name: Update infra repo
        run: |
          git clone https://x-access-token:${{ secrets.INFRA_TOKEN }}@github.com/acme/ecommerce-infra.git
          cd ecommerce-infra
          # ... update and push
# HARDENED: DAG with parallel jobs and explicit artifact passing
name: ci
on:
  push:
    branches: [main]
  pull_request:
env:
  IMAGE: ghcr.io/acme/checkout-service
  REGISTRY: ghcr.io
jobs:
  # Build once and push; downstream jobs pull the image by immutable digest.
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    outputs:
      image-tag: ${{ github.sha }}
      image-digest: ${{ steps.build.outputs.digest }}
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push
        id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ env.IMAGE }}:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
  # unit-test, integration-test, and scan all need only build, so they run in parallel.
  unit-test:
    runs-on: ubuntu-latest
    needs: [build]
    steps:
      - uses: actions/checkout@v4
      - name: Run unit tests
        run: |
          docker run --rm \
            ${{ env.IMAGE }}@${{ needs.build.outputs.image-digest }} \
            ./run-unit-tests.sh
  integration-test:
    runs-on: ubuntu-latest
    needs: [build]
    steps:
      - uses: actions/checkout@v4
      - name: Start dependencies
        run: docker compose -f docker-compose.test.yml up -d --wait
      - name: Run integration tests
        run: |
          docker run --rm --network=host \
            ${{ env.IMAGE }}@${{ needs.build.outputs.image-digest }} \
            ./run-integration-tests.sh
      - name: Stop dependencies
        # Tear down the compose stack even when the tests fail.
        if: always()
        run: docker compose -f docker-compose.test.yml down
  scan:
    runs-on: ubuntu-latest
    needs: [build]
    steps:
      - name: Trivy vulnerability scan
        # Tracks a moving branch; consider pinning to a release tag or commit SHA.
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ env.IMAGE }}@${{ needs.build.outputs.image-digest }}
          # A non-zero exit code fails the job on CRITICAL or HIGH findings.
          exit-code: 1
          severity: CRITICAL,HIGH
          format: table
  push-summary:
    runs-on: ubuntu-latest
    needs: [build, unit-test, integration-test, scan]
    if: github.ref == 'refs/heads/main'
    steps:
      - name: Emit pipeline summary
        run: |
          echo "## Pipeline Complete" >> "$GITHUB_STEP_SUMMARY"
          echo "All gates passed for \`${{ env.IMAGE }}:${{ github.sha }}\`" >> "$GITHUB_STEP_SUMMARY"
  # The promotion job: its needs: list is the set of gates.
  update-infra:
    runs-on: ubuntu-latest
    needs: [build, unit-test, integration-test, scan]
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
        with:
          repository: acme/ecommerce-infra
          token: ${{ secrets.INFRA_REPO_TOKEN }}
          path: infra
      - name: Update staging image
        working-directory: infra
        run: |
          cd overlays/staging/checkout
          kustomize edit set image \
            ${{ env.IMAGE }}=${{ env.IMAGE }}:${{ needs.build.outputs.image-tag }}
      - name: Commit and push
        working-directory: infra
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add .
          git commit -m "checkout: promote ${{ needs.build.outputs.image-tag }} to staging"
          git push
The Gate
The update-infra job has needs: [build, unit-test, integration-test, scan]. All four jobs must succeed before the infra repo is updated. If any job fails, the graph stops. There is no path from a failed scan to a deployed image.
This is the fundamental value of modeling the pipeline as a DAG: the dependency edges are the gates. Adding a new gate (contract tests, Locust performance, SBOM validation) means adding a new job and adding it to the needs: list of the promotion job.
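As a sketch of what adding a gate could look like, the fragment below adds a contract-test job and wires it into the promotion job. The job name contract-test and the script ./run-contract-tests.sh are hypothetical; the rest mirrors the hardened workflow above.

```yaml
# Fragment of the jobs: section only; not a complete workflow.
jobs:
  contract-test:
    runs-on: ubuntu-latest
    needs: [build]
    steps:
      - uses: actions/checkout@v4
      # Hypothetical script name; substitute the project's own test entry point.
      - name: Run contract tests
        run: |
          docker run --rm \
            ${{ env.IMAGE }}@${{ needs.build.outputs.image-digest }} \
            ./run-contract-tests.sh
  update-infra:
    # The new gate is one more entry in the promotion job's needs: list.
    needs: [build, unit-test, integration-test, scan, contract-test]
    # ... rest of the job unchanged
```

Because the new job depends only on build, it runs alongside the existing test jobs. It lengthens the pipeline only if it takes longer than the slowest of them.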
The Recovery
When a gate fails on a feature branch, the recovery is fixing the code and pushing again. When a gate fails on main, the recovery is the same, but with more urgency: the main branch has a broken pipeline, and no new code can be promoted until it is fixed.
To prevent this, use branch protection rules that require the CI workflow to pass before merging to main. The pipeline runs on the pull request, all gates pass, the PR is merged, and the pipeline runs again on main. The second run should be identical (reproducibility), but it catches integration issues that only appear when multiple PRs merge close together.
Measuring the Critical Path
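The critical path is the longest chain of sequential job durations. To find it, trace every path from the first job to the last and sum the durations. The paths below follow the workflow as implemented, where the image push happens inside the build job instead of running as a separate 1-minute stage, so the totals are one minute shorter than in the diagram: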
Path 1: build (3m) → unit-test (2m) → update-infra (1m) = 6m
Path 2: build (3m) → integration-test (6m) → update-infra (1m) = 10m
Path 3: build (3m) → scan (4m) → update-infra (1m) = 8m
Path 2 is the critical path at 10 minutes. Optimizing unit tests or scanning does not reduce the total pipeline duration. Only optimizing a job on the critical path (build, integration-test, or update-infra) shortens it.
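This analysis determines where to invest optimization effort. If the team spends a week reducing scan time from 4 minutes to 1 minute, the pipeline still takes 10 minutes. If they spend the same week reducing integration test time from 6 minutes to 3 minutes, the pipeline drops to 8 minutes, not 7: Path 2 shrinks to 7 minutes, and Path 3 (the scan, at 8 minutes) becomes the new critical path. Re-run the analysis after every optimization, because the critical path moves.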
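The path analysis can be stated compactly. In the sketch below, the symbols are introduced here for illustration: $d_j$ is the duration of job $j$ in minutes, taken from the three paths listed above. Every path shares build and update-infra, so the maximum reduces to the slowest of the three parallel jobs.

```latex
\begin{align*}
T &= \max_{p \,\in\, \text{paths}} \; \sum_{j \in p} d_j \\
  &= d_{\text{build}} + \max(d_{\text{unit}},\, d_{\text{integ}},\, d_{\text{scan}}) + d_{\text{infra}} \\
  &= 3 + \max(2,\, 6,\, 4) + 1 = 10 \text{ min}
\end{align*}
```

Re-evaluating the inner maximum with a proposed new duration shows at once whether an optimization shortens the pipeline or only moves the critical path to another branch.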