KEDA, Event-Driven Scaling, and the Metrics That Matter

The Symptom

The trip analytics consumer processes driver location updates, fare calculations, and ride completions. Each event triggers an aggregation pipeline that updates real-time dashboards. During normal hours, 2 consumer pods handle 400 events/second with 50ms processing time per event. Consumer lag hovers around 100 messages.

Friday at 6:15 PM, trip volume more than triples. The event production rate jumps from 400 to 1,400 events/second. The 2 consumers still process 400 events/second total. At peak the deficit is 1,000 events/second, which adds 60,000 messages of lag every minute. By 6:47 PM, lag reaches 847,000 messages.
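
The deficit arithmetic can be sketched in a few lines of Python. This is a minimal model that assumes constant production and consumption rates over the interval; the function name and parameters are illustrative, not part of any tool.

```python
def lag_after(seconds, produce_eps, consume_eps, initial_lag=0):
    """Estimate consumer lag after `seconds` at constant rates.

    Lag grows by (produce_eps - consume_eps) every second and can
    never drop below zero.
    """
    return max(0, initial_lag + (produce_eps - consume_eps) * seconds)


# Peak rates from the incident: 1,400 eps produced, 400 eps consumed.
print(lag_after(60, 1400, 400))  # 60000
```

In a real surge the production rate ramps up rather than stepping, so observed lag will differ from this straight-line estimate.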

The real-time dashboard shows data from 32 minutes ago. The operations team sees surge pricing decisions based on stale demand data. Drivers are dispatched to areas where demand peaked half an hour ago. Riders in currently high-demand areas see no surge pricing because the analytics pipeline has not processed the current data.

There is no autoscaler watching this. HPA does not know Kafka exists. The consumer pods’ CPU is at 35%. Memory is at 28%. From Kubernetes’ perspective, everything is fine.

The Cause

HPA scales on Kubernetes-native metrics: CPU, memory, and custom metrics exposed via the metrics API. Kafka consumer lag is not a Kubernetes metric. It lives in the Kafka broker’s consumer group coordinator.

KEDA bridges this gap. It is a Kubernetes operator that creates and manages HPA resources based on external event sources. KEDA’s architecture:

Kafka Broker                KEDA                    Kubernetes
+-----------+     query     +------------+   create   +-----+
| Consumer  | <------------ | ScaledObj  | ---------> | HPA |
| Group Lag |               | Controller |            +-----+
+-----------+               +------------+               |
                                  |                       v
                            +----------+          +-----------+
                            | Metrics  |          | Deployment|
                            | Adapter  |          | Scale     |
                            +----------+          +-----------+

KEDA polls the external metric source (Kafka, Prometheus, RabbitMQ, and 50+ others) and exposes the metric to Kubernetes as an external metric. It then creates or patches an HPA resource that targets the deployment and uses the external metric for scaling decisions.

The scaling formula for the Kafka trigger:

desiredReplicas = ceil(currentLag / lagThreshold)

With a current lag of 847,000 and a lagThreshold of 1,000:

desiredReplicas = ceil(847000 / 1000) = 847

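Capped at maxReplicaCount: 20 replicas. KEDA scales to 20 consumers. At the baseline rate of ~200 events/second per pod (2 pods handle 400), and assuming throughput scales linearly, combined capacity is ~4,000 events/second, well above the 1,400 events/second production rate. Consumption now outpaces production and lag starts decreasing.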

One critical detail: Kafka partitions limit parallelism. If the topic has 16 partitions, scaling to 20 consumers wastes 4 pods: they sit idle because there are no partitions left to assign to them. Keep maxReplicaCount at or below the partition count.
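
The formula and its two caps can be combined in a short Python sketch. It is a simplification: the real HPA also applies a tolerance band and the behavior policies, and the function and parameter names here are illustrative.

```python
import math


def desired_replicas(lag, lag_threshold, min_replicas, max_replicas,
                     partitions=None):
    """Replica count implied by lag, clamped to the configured bounds.

    If `partitions` is given, the result is also capped at the partition
    count, since extra consumers would receive no assignments.
    """
    wanted = math.ceil(lag / lag_threshold)
    capped = min(wanted, max_replicas)
    if partitions is not None:
        capped = min(capped, partitions)
    return max(min_replicas, capped)


print(desired_replicas(847_000, 1_000, 2, 20))                 # 20
print(desired_replicas(847_000, 1_000, 2, 20, partitions=16))  # 16
```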

The Baseline

Current state without KEDA:

Metric                          Normal Hours    Friday Peak
Event production rate           400 eps         1,400 eps
Consumer replicas               2               2
Consumer throughput             400 eps         400 eps
Consumer lag                    ~100            847,000
Dashboard data staleness        <1s             32 min
Surge pricing accuracy          >95%            <40%

Target state with KEDA:

Metric                          Normal Hours    Friday Peak
Event production rate           400 eps         1,400 eps
Consumer replicas               2               15
Consumer throughput             400 eps         1,400 eps
Consumer lag                    ~100            <1,000
Dashboard data staleness        <1s             <3s
Surge pricing accuracy          >95%            >95%

The Fix

KEDA Kafka trigger for the trip analytics consumer

Install KEDA in the cluster:

helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace keda --create-namespace

Create the ScaledObject:

# SCALED: KEDA ScaledObject for trip analytics consumer
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: trip-analytics-consumer
  namespace: ridehailing
spec:
  scaleTargetRef:
    name: trip-analytics-consumer
  minReplicaCount: 2
  maxReplicaCount: 20
  cooldownPeriod: 120
  pollingInterval: 15
  advanced:
    restoreToOriginalReplicaCount: false
    horizontalPodAutoscalerConfig:
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 0
          policies:
            - type: Pods
              value: 5
              periodSeconds: 30
        scaleDown:
          stabilizationWindowSeconds: 120
          policies:
            - type: Percent
              value: 25
              periodSeconds: 60
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka-headless.kafka:9092
        consumerGroup: trip-analytics
        topic: trip-events
        lagThreshold: "1000"
        offsetResetPolicy: earliest
        allowIdleConsumers: "false"
        excludePersistentLag: "false"

Key configuration choices:

pollingInterval: 15 checks Kafka lag every 15 seconds. Faster polling (5s) causes flapping when lag oscillates around the threshold. Slower polling (60s) delays scaling by a full minute.

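cooldownPeriod: 120 is the time KEDA waits after the last trigger reported active before scaling the deployment to zero. With minReplicaCount: 2 this consumer never scales to zero, so the setting that actually delays scale-down here is the HPA scaleDown stabilizationWindowSeconds: 120. Friday surges have 5-10 minute lulls between peaks. Scaling down during a lull and back up during the next peak wastes startup time and increases lag.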

allowIdleConsumers: "false" prevents scaling beyond the partition count. With 20 partitions on the trip-events topic, this caps effective scaling at 20. If the topic had 16 partitions, KEDA would stop at 16 replicas even though maxReplicaCount is 20. With allowIdleConsumers: "true", the extra 4 pods would start and sit idle with no partition assignments.
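
The scaleUp policy in the ScaledObject above (type: Pods, value: 5, periodSeconds: 30) limits how quickly the deployment can reach its ceiling. A rough Python sketch of that limit follows. It assumes the HPA wants the maximum from the first decision onward and ignores pod startup time and the HPA sync interval; the function name is illustrative.

```python
def scale_up_steps(start, target, pods_per_period=5, period_seconds=30):
    """Return (elapsed_seconds, replicas) after each scale-up decision.

    Each decision may add at most `pods_per_period` pods, and decisions
    are spaced `period_seconds` apart. The first one happens at t=0.
    """
    steps = []
    replicas, elapsed = start, 0
    while replicas < target:
        replicas = min(target, replicas + pods_per_period)
        steps.append((elapsed, replicas))
        elapsed += period_seconds
    return steps


print(scale_up_steps(2, 20))
# [(0, 7), (30, 12), (60, 17), (90, 20)]
```

Under these assumptions, going from 2 to 20 replicas takes about 90 seconds after the first scale decision, on top of the polling delay and pod startup.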

KEDA Prometheus trigger for the driver matching service

The driver matching service receives HTTP requests to find available drivers for a ride request. During surge, pending match requests queue up. A Prometheus metric tracks the queue depth:

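// SCALED: Prometheus metric for pending match requests
import io.micrometer.core.instrument.MeterRegistry;
import java.util.concurrent.atomic.AtomicInteger;
import org.springframework.stereotype.Component;

@Component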
public class MatchingMetrics {

    private final AtomicInteger pendingMatches;
    private final MeterRegistry registry;

    public MatchingMetrics(MeterRegistry registry) {
        this.registry = registry;
        this.pendingMatches = registry.gauge(
            "matching_pending_requests",
            new AtomicInteger(0)
        );
    }

    public void incrementPending() {
        pendingMatches.incrementAndGet();
    }

    public void decrementPending() {
        pendingMatches.decrementAndGet();
    }
}
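
The gauge only reflects queue depth if every increment is paired with a decrement, including when a match attempt fails. A minimal sketch of that pairing, written in Python for brevity; the PendingGauge class and its track() helper are hypothetical and not part of the matching service.

```python
import threading
from contextlib import contextmanager


class PendingGauge:
    """Thread-safe counter of in-flight requests (hypothetical helper)."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    @property
    def value(self):
        with self._lock:
            return self._value

    def _add(self, delta):
        with self._lock:
            self._value += delta

    @contextmanager
    def track(self):
        """Count a request as pending for the duration of the block."""
        self._add(1)
        try:
            yield
        finally:
            # Runs on success and on exception, so the gauge cannot leak.
            self._add(-1)


gauge = PendingGauge()
with gauge.track():
    print(gauge.value)  # 1
print(gauge.value)      # 0
```

In the Java class above, the equivalent is calling decrementPending() in a finally block around the matching logic.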

# SCALED: KEDA Prometheus trigger for matching service
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: driver-matching-scaler
  namespace: ridehailing
spec:
  scaleTargetRef:
    name: driver-matching
  minReplicaCount: 3
  maxReplicaCount: 30
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        metricName: matching_pending_total
        query: |
          sum(matching_pending_requests{namespace="ridehailing"})
        threshold: "100"
        activationThreshold: "10"

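activationThreshold: "10" sets the level below which KEDA treats the trigger as inactive, which matters only for scaling to and from zero. With minReplicaCount: 3 the scaler is always active, so this value takes effect only if minReplicaCount is later lowered to 0. The threshold of 100 means KEDA targets one replica per 100 pending requests. At 500 pending requests: ceil(500/100) = 5 replicas.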

Scale-to-zero for batch workloads

The nightly fare reconciliation job consumes events from a fare-reconciliation topic. During the day, the topic is empty. At midnight, a batch job publishes the day’s fare adjustments. Running consumer pods 24 hours a day for a workload that has events for 45 minutes is wasteful.

# SCALED: Scale-to-zero for fare reconciliation consumer
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: fare-reconciliation-consumer
  namespace: ridehailing
spec:
  scaleTargetRef:
    name: fare-reconciliation
  minReplicaCount: 0
  maxReplicaCount: 10
  cooldownPeriod: 300
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka-headless.kafka:9092
        consumerGroup: fare-reconciliation
        topic: fare-reconciliation-events
        lagThreshold: "500"
        activationLagThreshold: "1"

minReplicaCount: 0 enables scale-to-zero. When no messages are pending, KEDA scales the deployment to 0 replicas. Zero pods running, zero resources consumed.

activationLagThreshold: "1" controls the first scale-up from zero: KEDA activates the scaler once lag rises above this value, so the deployment goes from 0 to 1 replica within a polling interval of the batch producer starting to publish. From there, normal lag-based scaling kicks in: at 5,000 pending messages, KEDA scales to ceil(5000/500) = 10 replicas.
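
The two phases, activation from zero and then lag-based scaling, can be sketched together in Python. This sketch assumes the scaler counts as active only when lag is strictly above the activation value, which is worth checking against the KEDA version in use. The function name is illustrative.

```python
import math


def replicas_for_lag(lag, lag_threshold, activation_threshold,
                     max_replicas, min_replicas=0):
    """Replica count for a scale-to-zero consumer.

    Phase 1 (activation): with min_replicas == 0, stay at zero until lag
    exceeds the activation threshold (assumed strict comparison).
    Phase 2 (scaling): one replica per `lag_threshold` messages, clamped.
    """
    if min_replicas == 0 and lag <= activation_threshold:
        return 0
    wanted = math.ceil(lag / lag_threshold)
    return max(1, min_replicas, min(wanted, max_replicas))


# Values from the fare reconciliation ScaledObject.
print(replicas_for_lag(0, 500, 1, 10))      # 0
print(replicas_for_lag(200, 500, 1, 10))    # 1
print(replicas_for_lag(5_000, 500, 1, 10))  # 10
```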

The cold-start penalty: going from 0 to 1 takes 30-60 seconds (image pull, JVM startup, Kafka consumer group rebalance). For a nightly batch job, 60 seconds of startup delay is acceptable. For a latency-sensitive service, scale-to-zero is the wrong choice.

Combining HPA and KEDA

The ride-hailing platform uses both, alongside VPA for one memory-bound service:

Service                    Scaler    Metric                   Reason
rider-api                  HPA       HTTP requests/sec        Request-driven, synchronous
trip-analytics-consumer    KEDA      Kafka consumer lag        Event-driven, async
driver-matching            KEDA      Prometheus pending count  Queue depth
fare-reconciliation        KEDA      Kafka lag + zero-scale   Batch, cost optimization
surge-pricing-calc         VPA       Memory recommendation     Memory-bound, stateful

HPA and KEDA do not conflict because they target different deployments. Within a single deployment, use only one: KEDA creates its own HPA resource, so adding a hand-written HPA to the same deployment leaves two HPA resources fighting over the replica count.

Locust: Friday evening scaling timeline

# SCALED: Locust simulating Friday evening event surge
from locust import HttpUser, task, between
import time

class AnalyticsEventProducer(HttpUser):
    """Simulates Kafka producer load by sending events via HTTP ingestion endpoint"""
    wait_time = between(0.05, 0.2)

    @task(5)
    def driver_location_update(self):
        self.client.post("/api/events/driver-location", json={
            "driver_id": f"driver-{self.environment.runner.user_count % 5000}",
            "lat": 40.7128 + (self.environment.runner.user_count % 100) * 0.001,
            "lng": -74.0060 + (self.environment.runner.user_count % 100) * 0.001,
            "timestamp": int(time.time() * 1000)
        })

    @task(3)
    def trip_completion(self):
        self.client.post("/api/events/trip-complete", json={
            "trip_id": f"trip-{int(time.time() * 1000)}",
            "fare": 24.50,
            "distance_km": 8.3,
            "duration_min": 22
        })

    @task(1)
    def surge_update(self):
        self.client.post("/api/events/surge-update", json={
            "zone_id": f"zone-{self.environment.runner.user_count % 200}",
            "multiplier": 1.8,
            "demand_score": 85
        })

Run the ramp:

locust -f locust_keda_test.py \
  --host=https://event-ingestion.ridehailing.internal \
  --users 5000 \
  --spawn-rate 100 \
  --run-time 600s \
  --headless \
  --csv=keda_scaling_test

Expected scaling timeline:

Time    Event Rate    Consumer Lag    KEDA Replicas    Action
T+0     400 eps       100             2                Baseline
T+30    800 eps       12,000          2                Lag building
T+45    1,000 eps     24,000          2→5              KEDA first scale
T+60    1,200 eps     35,000          5→8              KEDA second scale
T+90    1,400 eps     28,000          8→12             Lag stabilizing
T+120   1,400 eps     8,000           12→15            Catching up
T+180   1,400 eps     980             15               Steady state
T+360   800 eps       200             15               Cool-down wait
T+480   400 eps       100             15→8             Scale-down starts
T+600   400 eps       80              8→4              Scale-down continues

KEDA’s first scale event fires at T+45, once lag has crossed the 1,000-message threshold. By T+180, 15 consumers have caught up with the 1,400 eps production rate and lag stabilizes below 1,000 messages. The scale-down stabilization window (120s) prevents premature scale-down during brief traffic dips.

The Proof

After deploying KEDA for the analytics pipeline:

Metric                    Before KEDA    After KEDA     Delta
Peak consumer lag          847,000        980            -99.9%
Dashboard staleness (peak) 32 min         <3s            -99.8%
Surge pricing accuracy     <40%           >95%           +137%
Consumer pods (peak)       2              15             +650%
Consumer pods (off-peak)   2              2              No change
Fare recon pods (idle)     2 (24/7)       0              -100%
Monthly compute cost       $840           $620           -26%
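
The Delta column is plain percentage change from the before value. A quick Python check follows, using the bound values from the table as the inputs (40% and 95% for accuracy, 3 seconds for staleness), so the results are approximations wherever the table states a bound.

```python
def pct_change(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100


rows = {
    "Peak consumer lag": (847_000, 980),
    "Dashboard staleness (s)": (32 * 60, 3),
    "Surge pricing accuracy (%)": (40, 95),
    "Consumer pods (peak)": (2, 15),
    "Monthly compute cost ($)": (840, 620),
}

for name, (before, after) in rows.items():
    print(f"{name}: {pct_change(before, after):+.1f}%")
```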

The compute cost decreased despite running more pods during peak. The savings come from scale-to-zero on the fare reconciliation consumers, which eliminated 2 pods running 23 hours/day with no work. The extra analytics consumers run only during surges and scale back to 2 pods off-peak, so peak capacity is paid for by the hour rather than as a fixed over-provisioned fleet.

The dashboard now shows real-time data during Friday surge. Surge pricing reacts to current demand, not 32-minute-old demand. Drivers receive dispatch to areas where riders are requesting rides now, not where they were requesting rides half an hour ago.