Regression Detection and PR Reporting

The Symptom

A developer adds a date formatting library to the checkout page. The library weighs 14KB gzipped. Lighthouse CI still passes because the total JavaScript budget is 350KB and the checkout bundle is at 54KB. But the LCP on the checkout page increases by 180ms because the library is imported synchronously and executes during the critical rendering path. The budget check passes. The regression ships.

Budget thresholds catch large regressions. Trend detection catches the gradual ones.

The Cause

Performance budgets are binary: pass or fail. They do not track direction. A bundle growing from 54KB to 68KB is a 26% increase that stays within budget. Five such increments over five sprints push the bundle to 124KB, well past budget, but each individual change passed the gate.

Regression detection compares the current PR against the most recent passing baseline and flags any metric that moved in the wrong direction beyond a noise threshold.

The Baseline

The e-commerce platform’s checkout bundle at baseline:

Gzipped size: 54.1 kB
LCP (Lighthouse CI, throttled): 1,840ms
TBT: 180ms
CLS: 0.02

After the date formatting library addition:

Gzipped size: 68.3 kB (+14.2 kB, +26%)
LCP: 2,020ms (+180ms, +9.8%)
TBT: 220ms (+40ms, +22%)
CLS: 0.02 (unchanged)

All metrics still pass the budgets. But LCP moved 180ms in the wrong direction. Over time, five such changes would push LCP past 2,500ms.

The Fix

A custom GitHub Actions step that compares Lighthouse results against a stored baseline:

// scripts/performance-report.ts
import * as fs from "fs";

interface LighthouseResult {
  url: string;
  lcp: number;
  tbt: number;
  cls: number;
  si: number;
  totalByteWeight: number;
  scriptByteWeight: number;
}

interface RegressionReport {
  url: string;
  metric: string;
  baseline: number;
  current: number;
  delta: number;
  deltaPercent: number;
  severity: "info" | "warning" | "error";
}

const THRESHOLDS: Record<
  string,
  { warning: number; error: number; unit: string }
> = {
  lcp: { warning: 100, error: 300, unit: "ms" },
  tbt: { warning: 50, error: 150, unit: "ms" },
  cls: { warning: 0.02, error: 0.05, unit: "" },
  totalByteWeight: { warning: 20000, error: 50000, unit: "bytes" },
  scriptByteWeight: { warning: 10000, error: 30000, unit: "bytes" },
};

function computeRegressions(
  baseline: LighthouseResult[],
  current: LighthouseResult[],
): RegressionReport[] {
  const reports: RegressionReport[] = [];

  for (const cur of current) {
    const base = baseline.find((b) => b.url === cur.url);
    if (!base) continue;

    for (const [metric, threshold] of Object.entries(THRESHOLDS)) {
      const baseVal = base[metric as keyof LighthouseResult] as number;
      const curVal = cur[metric as keyof LighthouseResult] as number;
      const delta = curVal - baseVal;

      if (delta <= 0) continue; // Improvement, skip

      const deltaPercent = baseVal > 0 ? (delta / baseVal) * 100 : 0;
      let severity: "info" | "warning" | "error" = "info";

      if (delta >= threshold.error) {
        severity = "error";
      } else if (delta >= threshold.warning) {
        severity = "warning";
      }

      if (severity !== "info") {
        reports.push({
          url: cur.url,
          metric,
          baseline: baseVal,
          current: curVal,
          delta,
          deltaPercent,
          severity,
        });
      }
    }
  }

  return reports;
}

function formatReport(regressions: RegressionReport[]): string {
  if (regressions.length === 0) {
    return "## Performance Report\n\n✅ No performance regressions detected.\n";
  }

  const errors = regressions.filter((r) => r.severity === "error");
  const warnings = regressions.filter((r) => r.severity === "warning");

  let md = "## Performance Report\n\n";

  if (errors.length > 0) {
    md += "### ❌ Regressions (blocking)\n\n";
    md += "| Page | Metric | Baseline | Current | Delta |\n";
    md += "|------|--------|----------|---------|-------|\n";
    for (const r of errors) {
      const unit = THRESHOLDS[r.metric]?.unit ?? "";
      md += `| ${r.url} | ${r.metric} | ${r.baseline}${unit} | ${r.current}${unit} | +${r.delta}${unit} (+${r.deltaPercent.toFixed(1)}%) |\n`;
    }
    md += "\n";
  }

  if (warnings.length > 0) {
    md += "### ⚠️ Warnings (non-blocking)\n\n";
    md += "| Page | Metric | Baseline | Current | Delta |\n";
    md += "|------|--------|----------|---------|-------|\n";
    for (const r of warnings) {
      const unit = THRESHOLDS[r.metric]?.unit ?? "";
      md += `| ${r.url} | ${r.metric} | ${r.baseline}${unit} | ${r.current}${unit} | +${r.delta}${unit} (+${r.deltaPercent.toFixed(1)}%) |\n`;
    }
    md += "\n";
  }

  return md;
}

// Main execution
const baselinePath = process.argv[2];
const currentPath = process.argv[3];

const baseline: LighthouseResult[] = JSON.parse(
  fs.readFileSync(baselinePath, "utf-8"),
);
const current: LighthouseResult[] = JSON.parse(
  fs.readFileSync(currentPath, "utf-8"),
);

const regressions = computeRegressions(baseline, current);
const report = formatReport(regressions);

fs.writeFileSync("performance-report.md", report);

const hasErrors = regressions.some((r) => r.severity === "error");
process.exit(hasErrors ? 1 : 0);

The GitHub Actions integration:

regression-check:
  needs: lighthouse
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4

    - name: Download current results
      uses: actions/download-artifact@v4
      with:
        name: lighthouse-results
        path: current-results/

    - name: Download baseline
      uses: actions/download-artifact@v4
      with:
        name: lighthouse-baseline
        path: baseline/
      continue-on-error: true

    - name: Extract metrics
      run: |
        node scripts/extract-lighthouse-metrics.js \
          baseline/results.json > baseline-metrics.json
        node scripts/extract-lighthouse-metrics.js \
          current-results/results.json > current-metrics.json

    - name: Check for regressions
      run: npx tsx scripts/performance-report.ts baseline-metrics.json current-metrics.json

    - name: Post PR comment
      if: always() && github.event_name == 'pull_request'
      uses: marocchino/sticky-pull-request-comment@v2
      with:
        path: performance-report.md
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

    - name: Update baseline on main
      if: github.ref == 'refs/heads/main'
      run: |
        cp current-metrics.json baseline-metrics.json
        # Store baseline as artifact for future comparisons

The Proof

After deploying this pipeline, the date formatting library PR would have produced:

## Performance Report

### ⚠️ Warnings (non-blocking)

| Page | Metric | Baseline | Current | Delta |
|------|--------|----------|---------|-------|
| /checkout | lcp | 1840ms | 2020ms | +180ms (+9.8%) |
| /checkout | scriptByteWeight | 54100bytes | 68300bytes | +14200bytes (+26.2%) |

The developer sees the 180ms LCP regression and the 14.2KB script size increase before the PR merges. The warning is non-blocking (below the error threshold), but visible. The reviewer can ask: “Is a date formatting library worth 180ms of LCP on the checkout page?”

In the three months after deploying this pipeline on the e-commerce platform, it caught 7 bundle size regressions and 3 LCP regressions before they reached production. The largest was a 42KB bundle increase from an accidental full import of a charting library that would have added 800ms to the dashboard page LCP. Total developer time spent investigating and fixing these 10 regressions: approximately 4 hours. Estimated time that would have been spent diagnosing the same regressions in production: 15+ hours plus user impact.

The Trade-off

The regression detection adds complexity to the CI pipeline. Baseline management requires storing and retrieving artifacts across workflow runs. GitHub Actions artifact retention defaults to 90 days, which means baselines older than 90 days are lost and must be re-established.

False positives are the operational cost. Lighthouse CI has natural variance of ~50ms on LCP between runs. Setting the warning threshold below this variance triggers false positive warnings. The 100ms LCP warning threshold used here was determined empirically: it fires on genuine regressions and stays quiet on noise. Your variance will differ depending on the CI runner hardware and the complexity of the tested pages. Run 10 identical builds and compute the standard deviation of each metric to calibrate your thresholds.

The pipeline time cost is the other factor. The full performance pipeline adds 5 minutes to every PR. For the e-commerce platform with 15-20 PRs per day, that is 75-100 minutes of additional CI compute daily. At GitHub Actions pricing, this costs approximately $2-4 per day. Compare this to the cost of a single performance regression in production.