Building the Locust Baseline for the Ride-Hailing Platform

The Symptom

The team knows the system is slow under load. “Slow” is not actionable. Slow where? Slow for whom? Slow compared to what? Without a repeatable load test that models real user behavior, performance conversations devolve into anecdotes.

The Cause

No load test exists. Performance is measured by the absence of complaints. When complaints arrive, the team profiles one request in dev with 50 rows in the database, finds nothing wrong, and closes the ticket. The production database has 12 million rows. The dev environment has never simulated more than 3 concurrent users.

The Baseline

This section builds the complete Locust test suite that the rest of the book references. Every file lives in load-tests/ at the repository root.

Project Structure

Figure: Load test project structure, showing the directory layout with scenario files, dependencies, Docker configuration, pass/fail thresholds, and output directory.

The baseline_locustfile.py defines all user scenarios (rider, driver, admin), while thresholds.json codifies the pass/fail criteria that CI checks after each run. The docker-compose.locust.yml file lets you run Locust alongside the platform with a single docker compose up, and the results/ directory collects CSV output for trend analysis across runs.
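
As a rough illustration of how a CI step might apply thresholds.json to a run, here is a minimal sketch. The JSON schema (p99_ms, max_fail_pct) and the function name check_thresholds are assumptions made for this example, not the book's actual file format. The column names follow the stats CSV that Locust writes with --csv, and the sample rows are made-up values rather than measurements.

```python
import csv
import io
import json

# Hypothetical thresholds.json schema (an assumption, not the book's actual file):
# endpoint name -> maximum p99 latency in ms and maximum failure percentage.
THRESHOLDS_JSON = """
{
  "/api/drivers/nearby": {"p99_ms": 500, "max_fail_pct": 1.0},
  "/api/trips/history": {"p99_ms": 500, "max_fail_pct": 1.0}
}
"""

# Illustrative rows shaped like Locust's <prefix>_stats.csv, trimmed to the
# columns used here. The values are made up for the example.
SAMPLE_STATS_CSV = """Type,Name,Request Count,Failure Count,99%
GET,/api/drivers/nearby,1000,0,420
GET,/api/trips/history,1000,15,2800
"""


def check_thresholds(stats_file, thresholds):
    """Return a list of violation messages; an empty list means the run passed."""
    violations = []
    for row in csv.DictReader(stats_file):
        limits = thresholds.get(row["Name"])
        if limits is None:
            continue  # no threshold defined for this endpoint
        requests = int(row["Request Count"])
        failures = int(row["Failure Count"])
        fail_pct = 100.0 * failures / requests if requests else 0.0
        p99_ms = float(row["99%"])
        if p99_ms > limits["p99_ms"]:
            violations.append(
                f"{row['Name']}: p99 {p99_ms:.0f}ms exceeds {limits['p99_ms']}ms"
            )
        if fail_pct > limits["max_fail_pct"]:
            violations.append(
                f"{row['Name']}: failure rate {fail_pct:.1f}% exceeds "
                f"{limits['max_fail_pct']}%"
            )
    return violations


violations = check_thresholds(
    io.StringIO(SAMPLE_STATS_CSV), json.loads(THRESHOLDS_JSON)
)
for message in violations:
    print("FAIL:", message)
```

In a real pipeline the two inputs would be opened from load-tests/thresholds.json and the CSV under load-tests/results/, and the step would exit non-zero when the list is not empty.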

The User Behavior Model

Real traffic is not uniform. During peak hours, the ride-hailing platform sees:

60% rider traffic (searching, estimating fares, viewing history)
30% driver traffic (location updates, checking ride requests)
10% admin/analytics traffic (zone stats, dashboards)

Riders are impatient. They search for drivers, request a fare estimate, and if it takes too long, they leave. Drivers send location updates every 3-5 seconds. Admin queries are infrequent but expensive.

# load-tests/baseline_locustfile.py
import random

from locust import HttpUser, task, between, tag

# Realistic coordinates for the ride-hailing service area
NYC_LOCATIONS = [
    (40.7128, -74.0060),   # Lower Manhattan
    (40.7580, -73.9855),   # Midtown
    (40.7831, -73.9712),   # Upper West Side
    (40.7484, -73.9857),   # Empire State area
    (40.6892, -74.0445),   # Statue of Liberty area
    (40.7061, -74.0087),   # Financial District
    (40.7282, -73.7949),   # Queens
    (40.6501, -73.9496),   # Brooklyn
]

RIDER_IDS = [f"rider-{i}" for i in range(1, 10001)]
DRIVER_IDS = [f"driver-{i}" for i in range(1, 5001)]


class RiderUser(HttpUser):
    weight = 6
    wait_time = between(1, 3)

    def on_start(self):
        self.rider_id = random.choice(RIDER_IDS)
        self.pickup = random.choice(NYC_LOCATIONS)
        self.dropoff = random.choice(NYC_LOCATIONS)
        while self.dropoff == self.pickup:
            self.dropoff = random.choice(NYC_LOCATIONS)

    @tag("rider", "read")
    @task(3)
    def search_drivers(self):
        self.client.get(
            "/api/drivers/nearby",
            params={
                "lat": self.pickup[0],
                "lng": self.pickup[1],
                "radius_km": 5
            },
            name="/api/drivers/nearby"
        )

    @tag("rider", "read")
    @task(2)
    def request_fare_estimate(self):
        self.client.post(
            "/api/fares/estimate",
            json={
                "pickup_lat": self.pickup[0],
                "pickup_lng": self.pickup[1],
                "dropoff_lat": self.dropoff[0],
                "dropoff_lng": self.dropoff[1]
            },
            name="/api/fares/estimate"
        )

    @tag("rider", "write")
    @task(1)
    def request_ride(self):
        self.client.post(
            "/api/rides/request",
            json={
                "rider_id": self.rider_id,
                "pickup_lat": self.pickup[0],
                "pickup_lng": self.pickup[1],
                "dropoff_lat": self.dropoff[0],
                "dropoff_lng": self.dropoff[1]
            },
            name="/api/rides/request"
        )

    @tag("rider", "read")
    @task(1)
    def view_trip_history(self):
        self.client.get(
            "/api/trips/history",
            headers={"X-User-Id": self.rider_id},
            name="/api/trips/history"
        )


class DriverUser(HttpUser):
    weight = 3
    wait_time = between(2, 5)

    def on_start(self):
        self.driver_id = random.choice(DRIVER_IDS)
        base = random.choice(NYC_LOCATIONS)
        self.lat = base[0]
        self.lng = base[1]

    @tag("driver", "write")
    @task(5)
    def update_location(self):
        # Simulate movement
        self.lat += random.uniform(-0.001, 0.001)
        self.lng += random.uniform(-0.001, 0.001)
        self.client.post(
            "/api/drivers/location",
            json={
                "driver_id": self.driver_id,
                "lat": self.lat,
                "lng": self.lng,
                "heading": random.randint(0, 359),
                "speed_kmh": random.randint(0, 60)
            },
            name="/api/drivers/location"
        )

    @tag("driver", "read")
    @task(1)
    def check_ride_requests(self):
        self.client.get(
            "/api/drivers/requests",
            headers={"X-Driver-Id": self.driver_id},
            name="/api/drivers/requests"
        )


class AdminUser(HttpUser):
    weight = 1
    wait_time = between(5, 15)

    @tag("admin", "read")
    @task(1)
    def view_zone_stats(self):
        self.client.get(
            "/api/admin/zones/stats",
            name="/api/admin/zones/stats"
        )

    @tag("admin", "read")
    @task(1)
    def view_surge_map(self):
        self.client.get(
            "/api/admin/surge/current",
            name="/api/admin/surge/current"
        )
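
The weight attributes and @task(n) values in the locustfile are relative, so it can help to turn them into concrete numbers. The sketch below does that arithmetic for the 200-user run; the helper names shares and expected_user_counts are made up for this example, and the rounding is a simplification of how Locust actually distributes users when the total does not divide evenly.

```python
# Class weights and rider task weights copied from the locustfile above.
USER_WEIGHTS = {"RiderUser": 6, "DriverUser": 3, "AdminUser": 1}
RIDER_TASK_WEIGHTS = {
    "search_drivers": 3,
    "request_fare_estimate": 2,
    "request_ride": 1,
    "view_trip_history": 1,
}


def shares(weights):
    """Convert relative weights into fractions that sum to 1."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def expected_user_counts(weights, total_users):
    """Approximate number of simulated users per class for a given total."""
    return {
        name: int(round(total_users * share))
        for name, share in shares(weights).items()
    }


user_counts = expected_user_counts(USER_WEIGHTS, 200)
task_shares = shares(RIDER_TASK_WEIGHTS)

print(user_counts)  # {'RiderUser': 120, 'DriverUser': 60, 'AdminUser': 20}
for name, share in task_shares.items():
    print(f"{name}: {share:.0%} of rider task picks")
```

Note that class weights set the mix of users, not the mix of requests. The share of requests each class produces also depends on its wait_time and on how long its requests take.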

Running the Baseline

# Start the ride-hailing platform
docker compose -f docker-compose.yml up -d

# Wait for health
until curl -s http://localhost:8080/actuator/health | grep -q '"status":"UP"'; do
    sleep 2
done

# Run Locust baseline: 200 users, 5 minutes
locust -f load-tests/baseline_locustfile.py \
    --host=http://localhost:8080 \
    --users 200 \
    --spawn-rate 10 \
    --run-time 5m \
    --headless \
    --csv=load-tests/results/baseline \
    --html=load-tests/results/baseline.html

Baseline Results

The unoptimized platform produces these numbers (all latency columns are in milliseconds):

Name                       # reqs  Avg   Med   Min    Max    p95    p99    RPS   Fail%
/api/drivers/nearby         1842   145    98    12   4210    420   2100   6.14   0.0%
/api/fares/estimate         1228   210   130    18   8400    890   4200   4.09   0.2%
/api/rides/request           614   320   180    25  12000   1800   6500   2.05   0.8%
/api/trips/history           614   680   450    45  12000   2800   8400   2.05   1.1%
/api/drivers/location       1535    45    32     8    980    180    650   5.12   0.0%
/api/drivers/requests        307    62    40    10   1200    250    800   1.02   0.0%
/api/admin/zones/stats       102   890   600    80  15000   4200  12000   0.34   2.9%
/api/admin/surge/current     102   420   280    40   8000   2100   6000   0.34   1.0%
Aggregated                  6344   224   110     8  15000   1200   4800  21.15   0.4%

This table is the starting line. Save it. Print it. Pin it to the team’s wall. Every chapter that follows will change one or more of these numbers and show the new table for comparison.

What to Watch in the Output

Four things matter in every Locust result:

p99 latency per endpoint. This is the user experience for your unluckiest 1%. If p99 is above your SLO (500ms for rider-facing endpoints), the chapter has work to do.
Failure rate. A non-zero failure rate under moderate load means the system is not handling load gracefully. It is dropping requests, timing out, or returning errors. The trip history endpoint’s 1.1% failure rate at only 200 users is a problem.
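RPS (requests per second). This is the throughput the test achieved. Because simulated users pause between tasks, RPS at a fixed user count is bounded by the user count and wait times as well as by server speed, so it is not a capacity ceiling. Compare it against your capacity target. If Friday evening peak is 5,000 RPS and the baseline produces 21 RPS with 200 users, you need to understand the scaling behavior before production proves it for you.
The gap between median and p99. A large gap (median 110ms, p99 4,800ms) indicates a heavy tail and often a bimodal distribution: some requests take a fundamentally different code path. Find that path.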
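
The first and last checks can be applied mechanically to the baseline table. The sketch below hard-codes the median, p99, and failure columns from that table, treats the four endpoints tagged "rider" in the locustfile as rider-facing, and uses the 500ms SLO mentioned above. The names EndpointStats, slo_violations, and tail_ratio are illustrative helpers, not part of Locust.

```python
from typing import NamedTuple


class EndpointStats(NamedTuple):
    name: str
    median_ms: int
    p99_ms: int
    fail_pct: float


# Median, p99, and failure rate copied from the baseline results table.
BASELINE = [
    EndpointStats("/api/drivers/nearby", 98, 2100, 0.0),
    EndpointStats("/api/fares/estimate", 130, 4200, 0.2),
    EndpointStats("/api/rides/request", 180, 6500, 0.8),
    EndpointStats("/api/trips/history", 450, 8400, 1.1),
    EndpointStats("/api/drivers/location", 32, 650, 0.0),
    EndpointStats("/api/drivers/requests", 40, 800, 0.0),
    EndpointStats("/api/admin/zones/stats", 600, 12000, 2.9),
    EndpointStats("/api/admin/surge/current", 280, 6000, 1.0),
]

RIDER_SLO_MS = 500
# Endpoints exercised by RiderUser in the locustfile.
RIDER_ENDPOINTS = {
    "/api/drivers/nearby",
    "/api/fares/estimate",
    "/api/rides/request",
    "/api/trips/history",
}


def slo_violations(rows, endpoints, slo_ms):
    """Names of the given endpoints whose p99 exceeds the SLO."""
    return [r.name for r in rows if r.name in endpoints and r.p99_ms > slo_ms]


def tail_ratio(row):
    """How many times slower the p99 request is than the median request."""
    return row.p99_ms / row.median_ms


violators = slo_violations(BASELINE, RIDER_ENDPOINTS, RIDER_SLO_MS)
worst_tail = max(BASELINE, key=tail_ratio)

print("Rider endpoints over SLO:", violators)
print(f"Widest tail: {worst_tail.name} ({tail_ratio(worst_tail):.1f}x median)")
```

On the baseline numbers every rider-facing endpoint misses the SLO, and /api/rides/request has the widest gap between median and p99, which makes it a reasonable first candidate for tracing.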

The Fix

The baseline itself is the fix for this section. The team now has repeatable, quantified evidence of how the system behaves under load. The next step is not optimization. The next step is Chapter 2: tracing a single request through every layer to understand where the milliseconds go.

The Proof

The proof is the table above. Run it yourself against your system. The numbers will be different, but the shape will be familiar: a few endpoints that are fast, a few that are unacceptable, and an average that hides the problem.

Run it three times. If the numbers vary by more than 15%, you have a reproducibility problem: noisy neighbor VMs, GC variance, or insufficient warm-up time. Fix the test environment before trusting the results. For this book, all results are the median of three consecutive runs with a 60-second warm-up period excluded from the statistics.
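
One way to make the three-run rule concrete is sketched below. It assumes "vary by more than 15%" means the spread between the highest and lowest run, relative to the median run; that definition, the helper names, and the sample p99 values are all assumptions for illustration, not measurements from the platform.

```python
from statistics import median


def run_spread(values):
    """Spread between the highest and lowest run, relative to the median run."""
    return (max(values) - min(values)) / median(values)


def summarize(values, max_spread=0.15):
    """Median of the runs plus a flag for whether they agree closely enough."""
    spread = run_spread(values)
    return {
        "median": median(values),
        "spread": spread,
        "reproducible": spread <= max_spread,
    }


# Illustrative p99 values (ms) for one endpoint across three runs.
stable = summarize([2050, 2100, 2180])
noisy = summarize([1500, 2100, 2900])

print(stable)  # spread is about 6%, so the median of 2100 can be reported
print(noisy)   # spread is about 67%, so fix the environment first
```

Applying the same check to each endpoint and each reported column (median, p95, p99, RPS) shows whether the whole table is stable or only parts of it.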

Prometheus Exporter for Locust

To feed Locust results into Grafana alongside application metrics:

# load-tests/locust_prometheus.py
from prometheus_client import start_http_server, Histogram, Counter
from locust import events

REQUEST_LATENCY = Histogram(
    "locust_request_duration_seconds",
    "Locust request latency",
    ["method", "name", "status"],
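    # Upper buckets extend past 10s because baseline p99 values reach 12s;
    # quantiles that land above the last finite bucket are capped at its bound.
    buckets=[0.01, 0.025, 0.05, 0.1, 0.2, 0.5, 1.0, 2.5, 5.0, 10.0, 15.0, 30.0]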
)

REQUEST_COUNT = Counter(
    "locust_requests_total",
    "Total Locust requests",
    ["method", "name", "status"]
)


@events.request.add_listener
def on_request(request_type, name, response_time, response_length,
               response, exception, context, **kwargs):
    status = "failure" if exception else "success"
    REQUEST_LATENCY.labels(
        method=request_type, name=name, status=status
    ).observe(response_time / 1000.0)
    REQUEST_COUNT.labels(
        method=request_type, name=name, status=status
    ).inc()


@events.init.add_listener
def on_init(environment, **kwargs):
    start_http_server(9646)

Add this to the Locust invocation:

locust -f load-tests/baseline_locustfile.py,load-tests/locust_prometheus.py \
    --host=http://localhost:8080 \
    --users 200 \
    --spawn-rate 10 \
    --run-time 5m \
    --headless

Now Grafana can show Locust’s external measurements alongside the application’s internal Prometheus metrics on the same time axis. When the Locust p99 spikes, look for the application metric (connection pool wait time, Redis command latency, PostgreSQL query duration) that spikes at the same moment. That correlation is usually the fastest pointer to the bottleneck.