Artifact Passing and Cache Strategy
The Failure
The frontend shell pipeline builds a Next.js application in the build job and needs the output in the deploy job. The team passes the build output using actions/upload-artifact and actions/download-artifact. The build output is 180 MB. Uploading takes 45 seconds. Downloading takes 30 seconds. This happens on every pipeline run.
Meanwhile, the same pipeline caches node_modules with a key of node-modules-v1. The cache was created three months ago. Dependencies have changed seven times since then. The cache still hits because the key never changes. The pipeline restores stale dependencies and runs npm install on top of them. Sometimes this works. Sometimes it produces a node_modules directory that differs from a clean npm ci. The team does not know which builds used stale dependencies because the cache key does not encode the dependency state.
The Mechanism
GitHub Actions provides three mechanisms for passing data between jobs:
Job outputs ($GITHUB_OUTPUT). Small string values: image tags, version numbers, SHA digests, URLs. Passed through workflow syntax: ${{ needs.build.outputs.image-tag }}. Limited to 1 MB of outputs per job and 50 MB per workflow run. Use this for metadata.
Artifacts (actions/upload-artifact, actions/download-artifact). Files and directories. Uploaded to GitHub’s artifact storage, available to downstream jobs and after the workflow completes. Retained for 90 days by default. Use this for build outputs, test reports, and SBOM files.
Caches (actions/cache). Files and directories. Keyed by a string. Restored when the cache step runs, if the key matches. Saved in the step's post-job phase if no cache existed for the exact key. Scoped to the branch (with fallback to the default branch). Evicted after 7 days without access or when the 10 GB per-repository limit is reached. Use this for dependencies and build tool caches that are expensive to recreate.
The distinction: artifacts carry data forward in the current pipeline run. Caches carry data forward across pipeline runs. Outputs carry small values between jobs without storage overhead.
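As a rough sketch of the artifact path, the build output from the scenario above could be handed from the build job to the deploy job like this. The artifact name next-build, the .next output directory, the npm run build script, and the one-day retention are assumptions for illustration, not values taken from the pipeline described here.

```yaml
# Sketch: pass a build directory from one job to the next within a single run.
# Assumed names: artifact "next-build", output directory ".next".
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run build
      - name: Upload build output
        uses: actions/upload-artifact@v4
        with:
          name: next-build
          path: .next
          include-hidden-files: true   # .next is a dot-directory
          if-no-files-found: error     # fail loudly if the build produced nothing
          retention-days: 1            # only needed for the current run
  deploy:
    needs: [build]
    runs-on: ubuntu-latest
    steps:
      - name: Download build output
        uses: actions/download-artifact@v4
        with:
          name: next-build
          path: .next
```

A short retention period fits an artifact that exists only to bridge two jobs; reports kept for audit would use a longer one.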
The Implementation
Job Outputs for Metadata
# HARDENED: Image tag passed as job output, not hardcoded in downstream jobs
jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      image-tag: ${{ steps.meta.outputs.version }}
      image-digest: ${{ steps.build.outputs.digest }}
    steps:
      - uses: actions/checkout@v4
      - name: Image metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/acme/frontend-shell
          tags: type=sha,prefix=,format=short
      - name: Build and push
        id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
  deploy:
    needs: [build]
    runs-on: ubuntu-latest
    steps:
      - name: Use image reference
        run: |
          echo "Deploying ghcr.io/acme/frontend-shell:${{ needs.build.outputs.image-tag }}"
          echo "Digest: ${{ needs.build.outputs.image-digest }}"
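The example above forwards outputs produced by actions. A step can also write its own output by appending a name=value line to the file that $GITHUB_OUTPUT points to. The following is a minimal sketch of that path; the job name version, the step id pkg, and the output name app-version are hypothetical, and it assumes a package.json with a version field at the repository root.

```yaml
# Sketch: a step writes an output, the job re-exports it, a later job reads it.
# Assumed names: job "version", step id "pkg", output "app-version".
jobs:
  version:
    runs-on: ubuntu-latest
    outputs:
      app-version: ${{ steps.pkg.outputs.app-version }}
    steps:
      - uses: actions/checkout@v4
      - name: Read version from package.json
        id: pkg
        run: |
          echo "app-version=$(node -p "require('./package.json').version")" >> "$GITHUB_OUTPUT"
  announce:
    needs: [version]
    runs-on: ubuntu-latest
    steps:
      - run: echo "Building version ${{ needs.version.outputs.app-version }}"
```

The value is visible to another job only because the job-level outputs block re-exports the step output; without that mapping, needs.version.outputs.app-version is empty.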
Artifacts for Test Reports
# HARDENED: Test reports uploaded as artifacts for debugging and audit
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: npm run test -- --reporter=junit --output-file=test-results.xml
        continue-on-error: true
        id: tests
      - name: Upload test results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: test-results
          path: test-results.xml
          retention-days: 30
      - name: Fail if tests failed
        if: steps.tests.outcome == 'failure'
        run: exit 1
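The pattern: run tests with continue-on-error: true on the test step so the artifact upload always runs, then explicitly fail in a subsequent step if tests failed. The final check reads steps.tests.outcome, which records the step's result before continue-on-error is applied, so a test failure is still detected. This ensures test reports are always available for debugging, even on failure.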
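One possible variation, shown here only as a sketch, drops continue-on-error and lets the test step fail the job directly. The upload step then carries a condition that lets it run after a failure but not after a cancellation. The test command is copied from the example above; everything else is an illustrative assumption.

```yaml
# Sketch: same reports, no continue-on-error and no explicit fail step.
steps:
  - uses: actions/checkout@v4
  - name: Run tests
    run: npm run test -- --reporter=junit --output-file=test-results.xml
  - name: Upload test results
    if: ${{ !cancelled() }}   # runs after a test failure, skipped on cancellation
    uses: actions/upload-artifact@v4
    with:
      name: test-results
      path: test-results.xml
      retention-days: 30
```

The trade-off is that every later step that must run after a test failure needs its own condition, whereas the continue-on-error pattern keeps the job green until the explicit fail step.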
Dependency Caching with Hash-Based Keys
# FRAGILE: Static cache key, never invalidated
- uses: actions/cache@v4
  with:
    path: node_modules
    key: node-modules-v1

# HARDENED: Hash-based key tied to lock file content
- uses: actions/cache@v4
  with:
    path: |
      node_modules
      ~/.npm
    key: deps-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
    restore-keys: |
      deps-${{ runner.os }}-
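The hashFiles('package-lock.json') function computes a SHA-256 hash of the lock file. When dependencies change, the lock file changes, the hash changes, and the exact key misses. A clean npm ci runs, and a new cache entry is saved under the new key. On a miss, the restore-keys fallback restores the most recent cache for the same OS. Because npm ci deletes any existing node_modules before it installs, only the restored ~/.npm download cache helps in that case: it saves network fetches without letting stale packages leak into the install.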
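Caching node_modules itself only pays off when the install can be skipped on an exact hit. The sketch below shows one way to do that with the cache-hit output of actions/cache. The step id modules, the modules- key prefix, and the pinned Node.js 20 are assumptions; the Node.js version is included in the key because compiled native modules are not portable across major versions.

```yaml
# Sketch: restore node_modules only on an exact key match, skip npm ci on a hit.
# Assumed names: step id "modules", key prefix "modules-", Node.js 20.
- uses: actions/setup-node@v4
  with:
    node-version: 20
- name: Restore node_modules (exact match only)
  id: modules
  uses: actions/cache@v4
  with:
    path: node_modules
    key: modules-${{ runner.os }}-node20-${{ hashFiles('package-lock.json') }}
- name: Install dependencies
  if: steps.modules.outputs.cache-hit != 'true'
  run: npm ci
```

There is deliberately no restore-keys entry here: a partial match would restore a node_modules directory built from a different lock file, which is the failure this section started with.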
Docker Layer Caching
# HARDENED: GitHub Actions cache backend for Docker layers
- name: Build and push
  uses: docker/build-push-action@v6
  with:
    context: .
    push: true
    tags: ${{ env.IMAGE }}:${{ github.sha }}
    cache-from: type=gha
    cache-to: type=gha,mode=max
The type=gha cache backend stores Docker build layers in the GitHub Actions cache. mode=max caches all layers, not just the final stage. On subsequent builds, unchanged layers are restored from cache instead of rebuilt. This reduces build time for the checkout service from 3 minutes to 45 seconds when only application code changes (the dependency installation layer is cached).
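The snippet above omits builder setup. As the Docker documentation describes it, the gha backend is not available with the default docker build driver, so a Buildx builder is normally created first. The sketch below adds that step and a scope parameter, which keeps caches separate when one repository builds several images. The scope name frontend-shell is a hypothetical choice, and registry login is left out.

```yaml
# Sketch: gha layer cache with an explicit Buildx builder and a per-image scope.
# Assumed: the "frontend-shell" scope name; registry login is omitted.
- name: Set up Buildx
  uses: docker/setup-buildx-action@v3
- name: Build and push
  uses: docker/build-push-action@v6
  with:
    context: .
    push: true
    tags: ${{ env.IMAGE }}:${{ github.sha }}
    cache-from: type=gha,scope=frontend-shell
    cache-to: type=gha,mode=max,scope=frontend-shell
```

Layer caches count against the same per-repository cache limit as dependency caches, so mode=max on a large multi-stage build can crowd out other entries.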
The Gate
The cache itself is not a gate. But a poisoned cache can bypass gates. If a cache contains compromised dependencies from a previous build, and the cache key does not change when dependencies change, the build uses compromised code without triggering a dependency scan.
The gate is the cache key design. A key that includes hashFiles('package-lock.json') produces an exact hit only while the lock file is unchanged, so any dependency change forces a fresh install under a new key. The dependency scan in the downstream job then runs against the actual resolved dependencies, not a stale cache.
The Recovery
When a cache is suspected of being poisoned (e.g., after a supply chain incident affecting a dependency that was cached), delete the cache entries for the affected repository:
# List caches for the repository
gh cache list --repo acme/checkout-service

# Delete a specific cache by key
gh cache delete "deps-Linux-abc123def456" --repo acme/checkout-service

# Nuclear option: delete all caches
# (--all avoids piping from gh cache list, which returns a limited number of entries by default)
gh cache delete --all --repo acme/checkout-service
The next pipeline run creates fresh caches from clean dependency resolution. Chapter 4 covers how to detect compromised dependencies before they enter the cache through SBOM generation and vulnerability scanning.
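The same purge can be wrapped in a manually triggered workflow so that it does not depend on someone having the CLI and credentials at hand during an incident. This is a sketch under assumptions: the workflow name is illustrative, and it assumes the workflow token with the actions: write permission is allowed to delete caches in the repository.

```yaml
# Sketch: manually triggered cache purge for the current repository.
# Assumed: workflow name is illustrative; the token permission below is sufficient.
name: Purge caches
on:
  workflow_dispatch:
permissions:
  actions: write   # needed to delete cache entries with the workflow token
jobs:
  purge:
    runs-on: ubuntu-latest
    steps:
      - name: Delete all caches in this repository
        env:
          GH_TOKEN: ${{ github.token }}
        run: gh cache delete --all --repo "${{ github.repository }}"
```

Restricting who can trigger workflow_dispatch matters here, since purging forces every subsequent run onto a cold cache.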
Cache Key Design Rules
| Data Type | Cache Key Pattern | Invalidation Trigger |
|---|---|---|
| Node.js dependencies | deps-${{ runner.os }}-${{ hashFiles('package-lock.json') }} | Any dependency change |
| Go modules | go-${{ runner.os }}-${{ hashFiles('go.sum') }} | Any module change |
| Docker layers | type=gha (managed by BuildKit) | Any layer input change |
| Gradle build cache | gradle-${{ runner.os }}-${{ hashFiles('**/*.gradle*', '**/gradle-wrapper.properties') }} | Any build script change |
Every cache key includes the runner OS (caches are not portable across operating systems) and a hash of the file that defines the cached content. Never use a static key. Never use a timestamp. Never use the branch name alone (multiple commits on the same branch should not share a cache if dependencies changed between them).
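The table lists key patterns without the paths they protect. As an illustration, the Go modules row might be written as the step below. The two paths are assumed to be the default module and build cache locations on a Linux runner; they differ on other operating systems.

```yaml
# Sketch: the Go modules row from the table as a cache step.
# Assumed paths: default Go module and build cache locations on Linux.
- uses: actions/cache@v4
  with:
    path: |
      ~/go/pkg/mod
      ~/.cache/go-build
    key: go-${{ runner.os }}-${{ hashFiles('go.sum') }}
    restore-keys: |
      go-${{ runner.os }}-
```

A prefix fallback is less risky here than for node_modules, because Go verifies downloaded modules against go.sum rather than trusting whatever is already on disk.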