ship it and sleep

Threshold Tuning and the Ratchet Pattern

Chapter 48 of 66

The Failure

The team enabled Trivy with exit-code: 1 on a codebase with 200 existing vulnerabilities. Every PR failed. Developers could not merge bug fixes. The security team said “fix the vulnerabilities first.” The development team said “we need to ship features.” After a week of deadlock, someone removed the exit-code and the scanner went back to advisory mode. The vulnerabilities stayed.

The ratchet pattern resolves this: accept the current baseline, but never allow new vulnerabilities. The count can only go down, never up.

The Mechanism

The Ratchet

  1. Run a full scan and record the baseline count
  2. On each PR, run the scan and compare to baseline
  3. If the count is lower than or equal to baseline → pass
  4. If the count is higher than baseline → fail
  5. When a PR fixes vulnerabilities and the count drops, update the baseline to the new lower count

The baseline file is committed to the repository. It is the ratchet: it can tighten (count goes down) but never loosen (count goes up).
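
The decision in steps 3 to 5 can be sketched as a small shell function. This is only an illustration of the rule, not the chapter's script: the name ratchet_verdict and its three output words are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide the ratchet outcome for a single count.
# Prints "fail", "tighten", or "pass"; returns non-zero only on "fail".
ratchet_verdict() {
  local current="$1" baseline="$2"
  if (( current > baseline )); then
    echo "fail"        # count went up: block the PR
    return 1
  elif (( current < baseline )); then
    echo "tighten"     # count went down: pass and lower the baseline
  else
    echo "pass"        # unchanged: pass, baseline stays as it is
  fi
}
```

Calling ratchet_verdict 12 12 prints pass, ratchet_verdict 10 12 prints tighten, and ratchet_verdict 13 12 prints fail with a non-zero exit status.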

Ratchet vs Fixed Threshold

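| Approach            | Existing Code   | New Code         | Migration Cost           |
|---------------------|-----------------|------------------|--------------------------|
| Fixed threshold (0) | Blocks all PRs  | Blocks correctly | Must fix all first       |
| Ratchet (baseline)  | Allows existing | Blocks new       | Zero migration cost      |
| Advisory only       | No blocking     | No blocking      | Zero, but no protection  |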

The Implementation

Baseline File

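File: .security-baseline.json
HARDENED: Ratchet baseline - count can only decrease. Keep the file itself free of comment lines: strict JSON has no comment syntax, and the ratchet script parses this file with jq.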
{
  "trivy": {
    "critical": 0,
    "high": 12,
    "medium": 45,
    "lastUpdated": "2025-01-15",
    "updatedBy": "security-scan-bot"
  },
  "codeql": {
    "errors": 3,
    "warnings": 28,
    "lastUpdated": "2025-01-15"
  }
}

Ratchet Script

#!/bin/bash
# scripts/security-ratchet.sh
# HARDENED: Fail if vulnerability count increases from baseline
set -euo pipefail

BASELINE_FILE=".security-baseline.json"
SCAN_RESULTS="trivy-results.json"

# Run Trivy and get counts
trivy fs --format json --output "$SCAN_RESULTS" --severity CRITICAL,HIGH .

CURRENT_CRITICAL=$(jq '[.Results[]?.Vulnerabilities[]? | select(.Severity == "CRITICAL")] | length' "$SCAN_RESULTS")
CURRENT_HIGH=$(jq '[.Results[]?.Vulnerabilities[]? | select(.Severity == "HIGH")] | length' "$SCAN_RESULTS")

BASELINE_CRITICAL=$(jq '.trivy.critical' "$BASELINE_FILE")
BASELINE_HIGH=$(jq '.trivy.high' "$BASELINE_FILE")

echo "Critical: $CURRENT_CRITICAL (baseline: $BASELINE_CRITICAL)"
echo "High: $CURRENT_HIGH (baseline: $BASELINE_HIGH)"

FAILED=0

if [[ "$CURRENT_CRITICAL" -gt "$BASELINE_CRITICAL" ]]; then
  echo "::error::Critical vulnerabilities increased: $CURRENT_CRITICAL > $BASELINE_CRITICAL"
  FAILED=1
fi

if [[ "$CURRENT_HIGH" -gt "$BASELINE_HIGH" ]]; then
  echo "::error::High vulnerabilities increased: $CURRENT_HIGH > $BASELINE_HIGH"
  FAILED=1
fi

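# Auto-tighten: only when nothing regressed, so a drop in one severity
# can never write a higher count for the other into the baseline
if [[ "$FAILED" -eq 0 ]] && [[ "$CURRENT_CRITICAL" -lt "$BASELINE_CRITICAL" || "$CURRENT_HIGH" -lt "$BASELINE_HIGH" ]]; then
  echo "Vulnerabilities decreased. Updating baseline."
  TMP_FILE=$(mktemp)
  # Write lastUpdated as YYYY-MM-DD to match the format already in the file
  jq --argjson c "$CURRENT_CRITICAL" --argjson h "$CURRENT_HIGH" \
    '.trivy.critical = $c | .trivy.high = $h | .trivy.lastUpdated = (now | strftime("%Y-%m-%d"))' \
    "$BASELINE_FILE" > "$TMP_FILE" && mv "$TMP_FILE" "$BASELINE_FILE"

  # The updated baseline is committed by the CI bot.
  # GITHUB_ENV is only set inside GitHub Actions; discard the write elsewhere.
  echo "BASELINE_UPDATED=true" >> "${GITHUB_ENV:-/dev/null}"
fi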

exit $FAILED

CI Integration

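# .github/workflows/security.yml
# Assumes the job has `permissions: contents: write` and runs on a branch
# checkout (for example a push to main). pull_request runs check out a
# detached merge commit, where a plain `git push` fails.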
- name: Security ratchet check
  run: bash scripts/security-ratchet.sh

- name: Commit updated baseline
  if: env.BASELINE_UPDATED == 'true'
  run: |
    git config user.name "security-bot"
    git config user.email "[email protected]"
    git add .security-baseline.json
    git commit -m "chore: tighten security baseline"
    git push

Exception Workflow

When a vulnerability cannot be fixed immediately (no patch available, upstream issue):

File: .security-exceptions.json
HARDENED: Tracked exceptions with mandatory expiration. As with the baseline, keep the file itself free of comment lines so that it stays valid JSON.
{
  "exceptions": [
    {
      "cve": "CVE-2024-99999",
      "severity": "HIGH",
      "reason": "No upstream fix available. Mitigated by WAF rule.",
      "trackingIssue": "https://github.com/acme/checkout-service/issues/456",
      "addedBy": "[email protected]",
      "addedDate": "2025-01-15",
      "expiresDate": "2025-04-15",
      "reviewed": true
    }
  ]
}

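The ratchet script should account for exceptions when comparing counts: a finding covered by an unexpired exception is left out of the count before it is compared to the baseline. The script shown earlier does not read this file yet and would need extending to do so. Once an exception passes its expiresDate it no longer applies, and the vulnerability counts again.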
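
One way the exception handling might look is sketched below. It assumes the exceptions file has the shape shown above and that the scan results are Trivy JSON, where each finding carries VulnerabilityID and Severity fields. The function name count_unexcepted and its arguments are hypothetical. The sketch ignores expired entries rather than deleting them from the file, and treats an exception as still valid on its expiresDate.

```shell
#!/usr/bin/env bash
# Sketch: count findings of one severity, skipping CVEs that are covered
# by an unexpired exception. Names here are illustrative assumptions.
# Usage: count_unexcepted <results.json> <exceptions.json> <SEVERITY> [today]
count_unexcepted() {
  local results="$1" exceptions="$2" severity="$3"
  # YYYY-MM-DD dates compare correctly as plain strings
  local today="${4:-$(date -u +%F)}"

  jq -n --arg sev "$severity" --arg today "$today" \
     --slurpfile scan "$results" --slurpfile exc "$exceptions" '
    # CVE IDs whose exception has not expired yet
    [ $exc[0].exceptions[]? | select(.expiresDate >= $today) | .cve ] as $active
    | [ $scan[0].Results[]?.Vulnerabilities[]?
        | select(.Severity == $sev)
        | select(.VulnerabilityID as $id | any($active[]; . == $id) | not) ]
    | length'
}
```

In the ratchet script, CURRENT_HIGH could then be computed as count_unexcepted "$SCAN_RESULTS" .security-exceptions.json HIGH instead of the plain jq count. Passing the date as an optional fourth argument keeps the function easy to test against a fixed day.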

The Gate

The ratchet is the gate. It combines two properties:

  1. Never worse: New vulnerabilities are always blocked
  2. Eventually better: Every fix tightens the baseline permanently

Over time, the baseline converges toward zero without ever blocking existing work.

The Recovery

Baseline drift between branches: The baseline file can conflict when multiple branches fix different vulnerabilities. Use jq to take the minimum of each count during merge conflict resolution.
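
A minimal sketch of that resolution follows, assuming both sides of the conflict contain trivy.critical and trivy.high. The function name merge_baselines and the temporary file names are hypothetical. All other fields are taken from the first file.

```shell
#!/usr/bin/env bash
# Sketch: resolve a baseline conflict by keeping the lower count per field.
# Usage: merge_baselines <ours.json> <theirs.json>   (prints merged JSON)
merge_baselines() {
  jq -s '
    .[0] as $ours | .[1] as $theirs
    | $ours
    | .trivy.critical = ([$ours.trivy.critical, $theirs.trivy.critical] | min)
    | .trivy.high     = ([$ours.trivy.high,     $theirs.trivy.high]     | min)
  ' "$1" "$2"
}
```

During a conflicted merge, the two sides can be extracted with git show :2:.security-baseline.json > ours.json and git show :3:.security-baseline.json > theirs.json, then merged with merge_baselines ours.json theirs.json > .security-baseline.json. The minimum is only a starting point: if the branches fixed different vulnerabilities, the real count after the merge is lower still, so it is worth rerunning the scan on the merged code.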

Auto-tighten creates noisy commits: Move baseline updates to a scheduled job instead of per-PR. Run nightly, compare current scan to baseline, and tighten.

New service starts with high baseline: Set a policy: new services must start with a baseline of zero. The ratchet only applies to legacy services.