Knative Serving Configuration, Scale-to-Zero, and Cold Start Budget
The Failure
The team deployed the report generation service on Knative with default settings. The service used a Java 21 Spring Boot container that took 12 seconds to start. The default scale-to-zero-grace-period was 30 seconds, so after 30 seconds of no traffic the pod was terminated. A vendor would request a report, wait 15 seconds for the cold start, and then request another report 45 seconds later, triggering another cold start. The service scaled to zero between every single request.
The fix: increase scale-to-zero-pod-retention-period to match the service’s expected request interval, and optimize the container for faster startup.
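As a rough illustration of the first half of the fix, the fragment below sets a retention period on the report service. The service name and image are borrowed from the profiles later in this document, and the 2m value is an assumption (roughly 2-3x the 45-second gap described above). Treat it as a sketch to adapt, not a tested configuration.

```yaml
# Sketch: keep the last pod alive across the observed ~45s gap between requests.
# The "2m" value is an assumption (about 2-3x the gap); tune it per service.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: report-generator
spec:
  template:
    metadata:
      annotations:
        # Scale-to-zero stays enabled; only the idle window is widened.
        autoscaling.knative.dev/min-scale: "0"
        autoscaling.knative.dev/scale-to-zero-pod-retention-period: "2m"
    spec:
      containers:
        - image: ghcr.io/acme/report-generator:abc123
```

With this in place, a second request arriving within two minutes of the first should reach the warm pod instead of triggering another cold start.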
The Mechanism
Autoscaler Parameters
| Parameter | Default | Description |
|---|---|---|
| scale-to-zero-grace-period | 30s | Global only: upper bound on how long the last pod is kept while scale-from-zero routing is put in place |
| scale-to-zero-pod-retention-period | 0s | Global or per-revision: minimum time to keep the last pod alive after traffic stops |
| target | 100 | Concurrent requests per pod before scaling up |
| target-utilization-percentage | 70% | Scale up when a pod reaches this percentage of target |
| min-scale | 0 | Minimum replicas (0 enables scale-to-zero) |
| max-scale | 0 (unlimited) | Maximum replicas |
| initial-scale | 1 | Replicas on first deployment |
| scale-down-delay | 0s | Delay before scaling down after load decreases |
| metric | concurrency | Metric type: concurrency or rps |
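The cluster-wide defaults in the table can be sketched as a ConfigMap. This assumes a standard Knative Serving install, where these settings live in config-autoscaler in the knative-serving namespace. Note that the global counterparts of the per-revision target and target-utilization-percentage annotations use different key names. Verify the keys against the installed Knative version before applying anything.

```yaml
# Sketch: global autoscaler defaults, mirroring the values in the table above.
# Assumes the upstream ConfigMap name and namespace; verify against your install.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  scale-to-zero-grace-period: "30s"
  scale-to-zero-pod-retention-period: "0s"
  # Global counterpart of the per-revision "target" annotation.
  container-concurrency-target-default: "100"
  # Global counterpart of "target-utilization-percentage".
  container-concurrency-target-percentage: "70"
  min-scale: "0"
  max-scale: "0"        # 0 means no upper limit
  initial-scale: "1"
  scale-down-delay: "0s"
```

Per-revision annotations, where supported, override these values for a single service, which is usually the safer place to experiment.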
Container Startup Optimization
The cold start budget has four components:
Cold Start = Image Pull + Container Init + App Startup + Readiness Probe
Each can be optimized independently:
| Component | Optimization |
|---|---|
| Image pull | Pre-pull images (DaemonSet), use small base images, image caching |
| Container init | Avoid init containers, minimize volume mounts |
| App startup | AOT compilation, lazy loading, fast frameworks (Quarkus, Go) |
| Readiness probe | Short initialDelaySeconds, fast health endpoint |
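For the readiness probe row, a minimal sketch might look like the fragment below. The /health path matches the endpoint used in the measurement commands later in this document. The probe timings are illustrative assumptions, not recommended values, and the port is omitted so that the probe uses the container's serving port.

```yaml
# Sketch: a fast readiness probe so the pod receives traffic soon after startup.
# Timings are assumptions for illustration; tune them to the real startup time.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: report-generator
spec:
  template:
    spec:
      containers:
        - image: ghcr.io/acme/report-generator:abc123
          readinessProbe:
            httpGet:
              path: /health        # should return quickly without heavy work
            initialDelaySeconds: 0 # start probing immediately
            periodSeconds: 1
            failureThreshold: 3
```

A probe like this only helps if the health endpoint itself is cheap. An endpoint that waits on downstream dependencies adds its latency to every cold start.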
The Implementation
Service Profile Configurations
# Profile: Low-traffic API (product import, report generation)
# HARDENED: Optimized for infrequent traffic with acceptable cold start
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: product-import
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "0"
        autoscaling.knative.dev/max-scale: "5"
        autoscaling.knative.dev/target: "10"
        autoscaling.knative.dev/scale-to-zero-pod-retention-period: "15m"
        autoscaling.knative.dev/scale-down-delay: "5m"
    spec:
      containerConcurrency: 10
      timeoutSeconds: 300
      containers:
        - image: ghcr.io/acme/product-import:abc123
# Profile: Scheduled batch job (nightly reports)
# HARDENED: Scale-to-zero quickly, accept cold start
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: report-generator
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "0"
        autoscaling.knative.dev/max-scale: "3"
        autoscaling.knative.dev/target: "1"
        autoscaling.knative.dev/scale-to-zero-pod-retention-period: "2m"
    spec:
      containerConcurrency: 1
      timeoutSeconds: 600
      containers:
        - image: ghcr.io/acme/report-generator:abc123
Measuring Cold Start
# Measure cold start: ensure service is at zero, then time first request
# Step 1: Verify scale-to-zero
kubectl get pods -n production -l serving.knative.dev/service=product-import
# No pods should be listed
# Step 2: Time the first request
time curl -s -o /dev/null -w "%{http_code} %{time_total}s\n" \
  https://product-import.production.example.com/health
# Step 3: Check pod startup events
kubectl get events -n production --sort-by='.lastTimestamp' \
  --field-selector reason=Started | tail -5
Multi-Stage Build Optimization
# HARDENED: Multi-stage build for minimal cold start
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /server .
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=builder /server /server
EXPOSE 8080
ENTRYPOINT ["/server"]
Revision Management
# List revisions for a Knative Service
kubectl get revisions -n production -l serving.knative.dev/service=product-import
# Pin traffic to a specific revision (rollback)
kubectl patch ksvc product-import -n production --type merge -p '
spec:
  traffic:
    - revisionName: product-import-00005
      percent: 100
'
# Split traffic between revisions (canary)
kubectl patch ksvc product-import -n production --type merge -p '
spec:
  traffic:
    - revisionName: product-import-00005
      percent: 90
    - revisionName: product-import-00006
      percent: 10
'
The Gate
Cold start budget is the gate. Define a maximum acceptable cold start time for each service profile. If the cold start exceeds the budget after container optimization, the service should not use scale-to-zero. Set min-scale: 1.
| Service Profile | Cold Start Budget | Action if Exceeded |
|---|---|---|
| Background processor | 30s | Acceptable, no change |
| Internal API | 5s | Optimize container or min-scale: 1 |
| User-facing API | 2s | min-scale: 1 or use standard Deployment |
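When a service fails the gate, the min-scale: 1 action can be sketched as follows. The service name storefront-api and its image are hypothetical, introduced only for illustration. The max-scale and target values are likewise assumptions.

```yaml
# Sketch: a user-facing API that exceeded its 2s cold start budget.
# "storefront-api" and its image are hypothetical names for illustration.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: storefront-api
spec:
  template:
    metadata:
      annotations:
        # One warm pod at all times; scale-to-zero is disabled for this service.
        autoscaling.knative.dev/min-scale: "1"
        autoscaling.knative.dev/max-scale: "10"   # assumed ceiling
        autoscaling.knative.dev/target: "50"      # assumed per-pod concurrency
    spec:
      containers:
        - image: ghcr.io/acme/storefront-api:abc123
```

The trade-off is the steady cost of one idle pod in exchange for removing the cold start from the request path.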
The Recovery
Service keeps scaling to zero between requests: Increase scale-to-zero-pod-retention-period. Set it to 2-3x the expected gap between requests.
Cold start exceeds budget after optimization: Set min-scale: 1. The service keeps one warm pod at all times. You lose scale-to-zero but gain consistent latency.
Old revisions consume resources: Knative retains old revisions until its garbage collector removes them. Tune the retention settings in the config-gc ConfigMap (for example, max-non-active-revisions) to limit how many are kept. Alternatively, clean up manually with kubectl delete revision.
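In upstream Knative, revision retention is governed by the config-gc ConfigMap in the knative-serving namespace. The fragment below is a sketch of tightening it. The values are illustrative assumptions rather than recommendations, and the keys should be checked against the installed Knative version.

```yaml
# Sketch: limit how many non-active revisions Knative retains.
# Values are illustrative assumptions; verify keys against your Knative version.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-gc
  namespace: knative-serving
data:
  # Never collect revisions younger than this.
  retain-since-create-time: "48h"
  # Never collect revisions that served traffic more recently than this.
  retain-since-last-active-time: "15h"
  # Always keep at least this many non-active revisions (useful for rollback).
  min-non-active-revisions: "2"
  # Collect the oldest non-active revisions beyond this count.
  max-non-active-revisions: "5"
```

Keeping a small minimum of non-active revisions preserves the rollback path shown under Revision Management.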