Build Caching and Selective Testing Strategies
The Failure
The monorepo CI installed dependencies from scratch on every run. Node.js: 90 seconds. Go modules: 60 seconds. Gradle: 120 seconds. Maven: 90 seconds. Together, 6 minutes of every build went to downloading the same dependencies, even when the lockfiles had not changed.
Caching dependencies by lockfile hash reduces install time from minutes to seconds.
The Mechanism
Cache Hit Scenarios
| Scenario | Dependency cache | Build output cache | Action |
|---|---|---|---|
| No changes | Cached ✓ | Cached ✓ | Skip build |
| Source changed, deps same | Cached ✓ | Rebuild | Build with cached deps |
| Deps changed | Rebuild | Rebuild | Full rebuild |
| Empty cache (first run or entry evicted) | Miss ✗ | Miss ✗ | Full rebuild, populate cache |
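The table reduces to a small decision function. The sketch below is minimal and assumes both cache lookups have already produced exact-hit booleans. The `Action` names are illustrative and do not belong to any CI tool.

```python
from enum import Enum


class Action(Enum):
    """Illustrative labels for the table's Action column."""
    SKIP_BUILD = "skip build"
    BUILD_WITH_CACHED_DEPS = "build with cached deps"
    FULL_REBUILD = "full rebuild"


def plan_build(deps_hit: bool, build_hit: bool) -> Action:
    """Map the two cache lookups to an action, mirroring the table rows."""
    if not deps_hit:
        # Dependencies changed or the cache is empty: nothing downstream is trusted.
        return Action.FULL_REBUILD
    if build_hit:
        # Same dependencies and same source: reuse the cached build output.
        return Action.SKIP_BUILD
    # Same dependencies, changed source: rebuild on top of the cached dependencies.
    return Action.BUILD_WITH_CACHED_DEPS


if __name__ == "__main__":
    for deps_hit, build_hit in [(True, True), (True, False), (False, True), (False, False)]:
        print(deps_hit, build_hit, "->", plan_build(deps_hit, build_hit).value)
```

The dependency layer is checked first. A build-output hit is only trustworthy when the dependencies it was built against are unchanged.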
Cache Key Strategy
Primary key: {os}-{tool}-{service}-{hash(lockfile)}
Restore keys, most specific first:
  {os}-{tool}-{service}-
  {os}-{tool}-
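The restore keys enable partial cache hits. When the primary key misses, the cache action tries each restore key in order and restores the most recently created entry whose key starts with it. Suppose the lockfile changed because one package was added. Most of the restored modules are still valid, so the package manager downloads only the difference, and the job then saves a fresh entry under the new primary key.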
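Here is a simplified model of that lookup, to make the fallback concrete. It tries an exact match on the primary key first. Then, for each restore key in order, it takes the newest entry whose key starts with that prefix. The model ignores the branch scoping and cache versioning that the real `actions/cache` also applies. The plain SHA-256 over the lockfile is an assumption standing in for `hashFiles`.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


def primary_key(os_name: str, tool: str, service: str, lockfile: bytes) -> str:
    """{os}-{tool}-{service}-{hash(lockfile)}, using SHA-256 as the hash (assumption)."""
    return f"{os_name}-{tool}-{service}-{hashlib.sha256(lockfile).hexdigest()}"


def resolve(
    key: str, restore_keys: List[str], entries: Dict[str, int]
) -> Tuple[Optional[str], bool]:
    """Return (matched entry key, exact_hit).

    `entries` maps stored cache keys to creation timestamps (larger = newer).
    """
    if key in entries:
        return key, True
    for prefix in restore_keys:
        candidates = [k for k in entries if k.startswith(prefix)]
        if candidates:
            # Several entries share the prefix: take the newest one.
            return max(candidates, key=entries.__getitem__), False
    return None, False


if __name__ == "__main__":
    stored = {
        "Linux-npm-catalog-aaa": 1,
        "Linux-npm-catalog-bbb": 2,
        "Linux-npm-search-ccc": 3,
    }
    key = primary_key("Linux", "npm", "catalog", b"lockfile with one new package")
    print(resolve(key, ["Linux-npm-catalog-", "Linux-npm-"], stored))
```

In the usage example, the changed lockfile misses the primary key. The first restore key then matches `Linux-npm-catalog-bbb`, the newest catalog entry. The broader `Linux-npm-` prefix is consulted only if no catalog entry exists at all.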
The Implementation
Multi-Layer Cache Configuration
# HARDENED: Layered caching for all language stacks
jobs:
  build-catalog:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Layer 1: Package manager cache (~/.npm holds downloaded tarballs, not node_modules)
      - uses: actions/cache@v4
        id: npm-cache
        with:
          path: ~/.npm
          key: ${{ runner.os }}-npm-catalog-${{ hashFiles('services/catalog/package-lock.json') }}
          restore-keys: |
            ${{ runner.os }}-npm-catalog-
            ${{ runner.os }}-npm-
      # Layer 2: Build output cache; the lockfile is in the key so a dependency change also invalidates dist/
      - uses: actions/cache@v4
        id: build-cache
        with:
          path: services/catalog/dist
          key: ${{ runner.os }}-build-catalog-${{ hashFiles('services/catalog/src/**', 'services/catalog/package-lock.json') }}
      - name: Install dependencies
        working-directory: services/catalog
        run: npm ci --prefer-offline
      - name: Build (skipped on an exact build-cache hit)
        if: steps.build-cache.outputs.cache-hit != 'true'
        working-directory: services/catalog
        run: npm run build
  build-checkout:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: |
            ~/go/pkg/mod
            ~/.cache/go-build
          key: ${{ runner.os }}-go-checkout-${{ hashFiles('services/checkout/go.sum') }}
          restore-keys: |
            ${{ runner.os }}-go-checkout-
            ${{ runner.os }}-go-
      - name: Build
        working-directory: services/checkout
        run: go build ./...
  build-payments:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: |
            ~/.gradle/caches
            ~/.gradle/wrapper
          key: ${{ runner.os }}-gradle-payments-${{ hashFiles('services/payments/**/*.gradle*', 'services/payments/gradle/wrapper/gradle-wrapper.properties') }}
          restore-keys: |
            ${{ runner.os }}-gradle-payments-
            ${{ runner.os }}-gradle-
      - name: Build
        working-directory: services/payments
        run: ./gradlew build -x test
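The build-output key works because a content hash over `src/**` changes whenever any source byte changes. The sketch below approximates the documented behaviour of `hashFiles`, which hashes each matched file with SHA-256 and combines those hashes into one SHA-256. It is not guaranteed to produce the same hex string as the runner. The `build_key` helper and the inclusion of `package-lock.json` in the hashed set are illustrative assumptions.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable, Sequence, Tuple


def hash_files(paths: Iterable[Path]) -> str:
    """SHA-256 of each file's content, fed in sorted path order into an outer SHA-256."""
    outer = hashlib.sha256()
    for path in sorted(paths):
        outer.update(hashlib.sha256(path.read_bytes()).digest())
    return outer.hexdigest()


def build_key(
    service_dir: Path, extra: Sequence[str] = ("package-lock.json",), os_name: str = "Linux"
) -> str:
    """Hypothetical build-output key: every file under src/ plus the listed extra files."""
    files = [p for p in (service_dir / "src").rglob("*") if p.is_file()]
    files += [service_dir / name for name in extra]
    return f"{os_name}-build-{service_dir.name}-{hash_files(files)}"


def demo() -> Tuple[str, str, str]:
    """Keys before any change, after editing an unhashed file, after editing source."""
    with tempfile.TemporaryDirectory() as tmp:
        svc = Path(tmp) / "catalog"
        (svc / "src").mkdir(parents=True)
        (svc / "src" / "index.ts").write_text("export const a = 1;\n")
        (svc / "package-lock.json").write_text("{}\n")
        before = build_key(svc)
        (svc / "README.md").write_text("docs\n")  # outside the hashed set: key unchanged
        unrelated = build_key(svc)
        (svc / "src" / "index.ts").write_text("export const a = 2;\n")  # hashed: key changes
        after = build_key(svc)
    return before, unrelated, after
```

Only content enters the hash, so a file outside the pattern (here `README.md`) never invalidates the cache. Any input that affects `dist/` therefore has to be listed in the key explicitly.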
Docker Build Caching
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3
# Pushing to ghcr.io also needs a docker/login-action step (not shown)
- name: Build with GitHub Actions cache
  uses: docker/build-push-action@v5
  with:
    context: services/catalog
    push: ${{ github.event_name == 'push' }}
    tags: ghcr.io/acme/catalog-service:${{ github.sha }}
    cache-from: type=gha,scope=catalog
    # mode=max exports intermediate layers too, not only those in the final image
    cache-to: type=gha,mode=max,scope=catalog
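Layer caching pays off because each layer's cache key depends on its parent layer. A change invalidates that layer and everything after it, while earlier layers are reused. The sketch below is a toy model of that chaining, not BuildKit's actual cache-key algorithm. The Dockerfile steps are hypothetical.

```python
import hashlib
from typing import List, Tuple

Step = Tuple[str, bytes]  # (instruction text, bytes of the files it reads)


def layer_keys(steps: List[Step]) -> List[str]:
    """Chain keys: each layer hashes its parent key, its instruction and its inputs."""
    keys: List[str] = []
    parent = ""
    for instruction, inputs in steps:
        h = hashlib.sha256()
        for part in (parent.encode(), instruction.encode(), inputs):
            h.update(len(part).to_bytes(8, "big"))  # length prefix keeps fields unambiguous
            h.update(part)
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def first_rebuilt_layer(cached: List[str], current: List[str]) -> int:
    """Index of the first layer that misses the cache (len(current) if none do)."""
    for i, (old, new) in enumerate(zip(cached, current)):
        if old != new:
            return i
    return min(len(cached), len(current))


def dockerfile(lockfile: bytes, source: bytes) -> List[Step]:
    """Hypothetical Node.js Dockerfile: dependencies are installed before source is copied."""
    return [
        ("FROM node:20", b""),
        ("COPY package.json package-lock.json ./", lockfile),
        ("RUN npm ci", b""),
        ("COPY src ./src", source),
        ("RUN npm run build", b""),
    ]
```

In this model, a source-only change misses from `COPY src` onward and reuses the `npm ci` layer. A lockfile change misses from the lockfile copy onward. Because `mode=max` also exports intermediate layers, those earlier layers are available to restore on the next run.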
Selective Test Execution
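test:
  needs: detect-changes
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0  # full history so change detection can diff against the base
    - name: Run affected tests
      run: |
        AFFECTED=$(bash scripts/detect-affected.sh)
        echo "Affected services: $AFFECTED"
        for svc in $AFFECTED; do
          echo "Testing $svc"
          # Subshells keep each cd local, so there is no need to return to $GITHUB_WORKSPACE
          case $svc in
            catalog)  (cd services/catalog && npm ci --prefer-offline && npm test) ;;
            checkout) (cd services/checkout && go test ./...) ;;
            payments) (cd services/payments && ./gradlew test) ;;
            *)        echo "::warning::No test command for $svc" ;;
          esac
        done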
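`scripts/detect-affected.sh` is not shown here. Such a script could work in two stages. First, it maps each changed path to a component. Second, it walks a reverse dependency graph, so that a change to a shared library also selects every service depending on it, directly or transitively. The sketch below is minimal. The `services/<name>` and `libs/<name>` layout, the `DEPENDS_ON` graph and the library names `proto` and `ledger` are all hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical graph: component -> components it depends on.
DEPENDS_ON: Dict[str, Set[str]] = {
    "catalog": set(),
    "checkout": {"proto"},
    "payments": {"proto", "ledger"},
    "ledger": {"proto"},
    "proto": set(),
}
SERVICES: Set[str] = {"catalog", "checkout", "payments"}


def component_of(path: str) -> Optional[str]:
    """services/<name>/... or libs/<name>/... -> <name>; anything else -> None."""
    parts = path.split("/")
    if len(parts) >= 3 and parts[0] in ("services", "libs"):
        return parts[1]
    return None


def affected_services(
    changed_paths: Iterable[str], depends_on: Dict[str, Set[str]], services: Set[str]
) -> List[str]:
    # Invert the graph: component -> components that depend on it.
    dependents: Dict[str, Set[str]] = {}
    for component, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(component)
    # Start from the components whose files changed, then follow dependents transitively.
    frontier = [c for c in map(component_of, changed_paths) if c is not None]
    seen: Set[str] = set(frontier)
    while frontier:
        current = frontier.pop()
        for dependent in dependents.get(current, set()):
            if dependent not in seen:
                seen.add(dependent)
                frontier.append(dependent)
    return sorted(seen & services)
```

Fed the output of something like `git diff --name-only` and joined with spaces, the result is the list the `for svc in $AFFECTED` loop consumes. Paths outside `services/` and `libs/` are silently ignored here. That is exactly the kind of incomplete graph that lets a regression slip through.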
Cache Metrics
- name: Report cache stats
  if: always()
  run: |
    # cache-hit is the string 'true' only on an exact primary-key match;
    # restore-key (partial) matches and misses both report as a miss here.
    {
      echo "## Cache Statistics"
      echo "| Layer | Status |"
      echo "|-------|--------|"
      echo "| npm | ${{ steps.npm-cache.outputs.cache-hit == 'true' && '✓ Hit' || '✗ Miss' }} |"
      echo "| build | ${{ steps.build-cache.outputs.cache-hit == 'true' && '✓ Hit' || '✗ Miss' }} |"
      # docker/build-push-action has no cache-hit output; look for CACHED steps in the build log instead
    } >> "$GITHUB_STEP_SUMMARY"
The Gate
Build time is the implicit gate. If CI takes more than 15 minutes, developers merge without waiting. Caching keeps CI under 5 minutes for single-service changes. Track cache hit rates in the pipeline dashboard (CH18).
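Hit rates are more informative when partial restores are counted separately from exact hits. `actions/cache` sets `cache-hit` to `true` only for an exact primary-key match, so a run that restored via a restore key looks like a miss in that output. The sketch below is minimal and assumes each run's outcome has already been recorded as `exact`, `partial` or `miss`. How those outcomes are collected is left open.

```python
from collections import Counter
from typing import Dict, Iterable

OUTCOMES = ("exact", "partial", "miss")


def hit_rates(outcomes: Iterable[str]) -> Dict[str, float]:
    """Fraction of runs per outcome; all zeros when there are no runs."""
    counts = Counter(outcomes)
    unknown = set(counts) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unexpected outcomes: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in OUTCOMES}
    return {name: counts[name] / total for name in OUTCOMES}
```

A rising `partial` share while `exact` stays flat usually means lockfiles are churning. A rising `miss` share points at eviction or at keys that never match.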
The Recovery
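Cache grows too large: GitHub Actions caches are limited to 10 GB per repository. When the limit is exceeded, the least recently used entries are evicted, and entries not accessed for 7 days are removed regardless. If eviction happens frequently, narrow the cache scope or exclude large directories.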
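A toy model of repository-wide LRU eviction shows why a frequently hit cap hurts: each new entry can push out an older one that another job still needs. The model assumes eviction strictly by last access. It also assumes an existing key is never overwritten, since `actions/cache` entries are immutable per key. Time-based expiry is not modeled, and sizes are arbitrary units.

```python
from collections import OrderedDict
from typing import List


class LruCacheStore:
    """Toy repository cache: a total size cap with least-recently-used eviction."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0
        self.entries: "OrderedDict[str, int]" = OrderedDict()  # key -> size, oldest first

    def restore(self, key: str) -> bool:
        """A restore counts as an access and moves the entry to the newest end."""
        if key not in self.entries:
            return False
        self.entries.move_to_end(key)
        return True

    def save(self, key: str, size: int) -> List[str]:
        """Store an entry and return the keys evicted to stay under the limit."""
        if size > self.limit:
            raise ValueError("entry larger than the whole cache")
        if key in self.entries:
            return []  # keys are immutable: the existing entry is kept as-is
        self.entries[key] = size
        self.used += size
        evicted: List[str] = []
        while self.used > self.limit:
            old_key, old_size = self.entries.popitem(last=False)
            self.used -= old_size
            evicted.append(old_key)
        return evicted
```

Restoring an entry protects it. The entry that nobody restored recently is the one that goes, even if a job on another branch needs it on its next run.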
Cache poisoning: A corrupted cache entry can make every build that restores it fail. Delete the caches with the GitHub CLI: gh cache delete --all. The next build will be slow but clean.
Selective tests miss a regression: The dependency graph was incomplete. A shared utility changed but the affected service was not detected. Add integration tests that run on all merges to main as a safety net.