From Database to Response: The Second Half and Where It Stalls

The Symptom

The trip history endpoint averages 680ms at 200 concurrent users. The PostgreSQL query takes 45ms according to pg_stat_statements. The application logs show the Hibernate/R2DBC query completing in 48ms. Where are the other 632ms?

The Cause

The missing time is in three places: connection pool wait, response serialization, and TCP transfer. Under load, connection pool wait dominates.

Connection Pool Wait vs Query Execution

HikariCP (for JDBC) and R2DBC pool both have a maximum number of connections. When all connections are in use, new requests wait in a queue. The wait time is not logged as query time. It is not logged at all unless you instrument it.

// BOTTLENECK: The time gap between "request arrives" and "query starts"
// This gap grows under load and is invisible without metrics

// With HikariCP, the connection acquisition time is a separate metric:
// hikaricp.connections.acquire (Timer)
// hikaricp.connections.active (Gauge)
// hikaricp.connections.pending (Gauge)

// When pending > 0, requests are queuing for connections

At 200 concurrent users with a connection pool of 10:

Query execution time: 45ms
Connection wait time (pool exhausted): 400-1,200ms
Time the request “wastes” waiting for a database connection it could use: 90% of total latency

The Prometheus query that reveals this:

# Connection acquisition time at p99
histogram_quantile(0.99,
  sum(rate(hikaricp_connections_acquire_seconds_bucket[5m])) by (le)
)
# Result during load: 1.2 seconds

# Active connections (should not be pinned at max)
hikaricp_connections_active{pool="HikariPool-1"}
# Result during load: 10 (constantly at max, all connections busy)

# Pending threads waiting for a connection
hikaricp_connections_pending{pool="HikariPool-1"}
# Result during load: 45 (45 requests queued, waiting)

Redis Cache Hit vs Cache Miss Path

The fare estimate endpoint has two latency profiles:

Cache hit path:
  Redis GET           → 2ms
  JSON serialization  → 1ms
  Network return      → 3ms
  Total               → 6ms

Cache miss path:
  Redis GET (miss)    → 2ms
  PostgreSQL query    → 35ms
  Surge HTTP call     → 50ms (parallel with PG if reactive)
  Compute fare        → 5ms
  Redis SET           → 2ms
  JSON serialization  → 1ms
  Network return      → 3ms
  Total (reactive)    → 58ms
  Total (blocking)    → 98ms

Under sustained load with a 60-second cache TTL and 8 pricing zones, approximately 0.13 cache misses per second per zone occur. With 8 zones, that is roughly 1 cache miss per second. At 2,000 RPS, 99.95% of requests take the 6ms path. The p99 is still dominated by cache misses because the 0.05% of requests that miss the cache bunch up when multiple zone caches expire simultaneously.

Response Serialization

The trip history endpoint returns a list of past trips. Each trip includes pickup/dropoff coordinates, fare breakdown, driver info, and timestamps. A rider with 200 past trips receives a JSON response of approximately 180KB.

// BOTTLENECK: Returning entire trip history as one JSON payload
@GetMapping("/api/trips/history")
public Flux<Trip> getTripHistory(
        @RequestHeader("X-User-Id") String userId) {
    return tripRepository.findByRiderId(userId)
        .sort(Comparator.comparing(Trip::completedAt).reversed());
}
// Generated SQL:
// SELECT * FROM trips WHERE rider_id = ? ORDER BY completed_at DESC
// Returns all 200 trips with all columns

Serialization of 200 Trip objects with Jackson takes 12-25ms depending on GC pressure. The response body of 180KB takes 15-45ms to transmit over a typical mobile connection. At 3G speeds (1 Mbps), 180KB takes 1.4 seconds.

// SCALED: Paginated response with sparse fields
@GetMapping("/api/trips/history")
public Mono<TripHistoryResponse> getTripHistory(
        @RequestHeader("X-User-Id") String userId,
        @RequestParam(defaultValue = "0") int page,
        @RequestParam(defaultValue = "20") int size) {
    return tripRepository.findByRiderId(userId, PageRequest.of(page, size))
        .map(trip -> new TripSummary(
            trip.id(),
            trip.pickupAddress(),
            trip.dropoffAddress(),
            trip.fareTotal(),
            trip.completedAt()
        ))
        .collectList()
        .map(trips -> new TripHistoryResponse(trips, page, size));
}
// Generated SQL:
// SELECT id, pickup_address, dropoff_address, fare_total, completed_at
// FROM trips WHERE rider_id = ? ORDER BY completed_at DESC
// LIMIT 20 OFFSET 0
// Returns 20 trips with 5 columns instead of 200 trips with 15 columns

The paginated response is approximately 4KB. Serialization takes < 1ms. Transfer takes < 10ms on 3G.

The Baseline

Locust test isolating the trip history endpoint with pagination disabled vs enabled:

# load-tests/trip_history_locustfile.py
from locust import HttpUser, task, between

class TripHistoryUser(HttpUser):
    wait_time = between(0.5, 1.5)

    @task
    def view_full_history(self):
        """Unpaginated: returns all trips."""
        self.client.get(
            "/api/trips/history",
            headers={"X-User-Id": "rider-heavy-user"},
            name="/api/trips/history [full]"
        )

    @task
    def view_paginated_history(self):
        """Paginated: returns 20 trips."""
        self.client.get(
            "/api/trips/history",
            params={"page": 0, "size": 20},
            headers={"X-User-Id": "rider-heavy-user"},
            name="/api/trips/history [page]"
        )

Results at 200 concurrent users:

Name                           p50    p95     p99     RPS   Fail%
/api/trips/history [full]      450   2800    8400    2.05   1.1%
/api/trips/history [page]       35    120     280   18.40   0.0%

The Fix

Three changes, applied together:

Paginate the response (shown above): reduces payload from 180KB to 4KB
Select only needed columns: reduces query I/O from 15 columns to 5
Enable response compression: reduces 4KB to ~1.2KB over the wire

# application.yml
server:
  compression:
    enabled: true
    mime-types: application/json
    min-response-size: 1024

The Proof

After all three changes:

Name                           p50    p95     p99     RPS   Fail%
/api/trips/history [before]    450   2800    8400    2.05   1.1%
/api/trips/history [after]      28     85     180   24.60   0.0%

Delta:
  p50:  450ms → 28ms   (16x improvement)
  p99: 8400ms → 180ms  (46x improvement)
  RPS: 2.05 → 24.60    (12x throughput)
  Fail: 1.1% → 0.0%

The trip history endpoint went from the worst-performing endpoint in the baseline to one of the fastest. The fix was not a database optimization or a caching layer. It was returning less data. The cheapest performance optimization is not doing work that nobody asked for.

Bottleneck Migration Under Load

To demonstrate how bottlenecks migrate, run the step-load Locust test from Chapter 2’s main section and observe which Prometheus metric degrades at each step:

Users	Bottleneck	Evidence
50	PostgreSQL query	`pg_stat_statements.mean_exec_time` is 50% of total latency
100	Mix of query + pool wait	`hikaricp_connections_pending` starts showing non-zero values
200	Connection pool	`hikaricp_connections_acquire` p99 exceeds 500ms
300	Connection pool + event loop	`reactor_scheduler_tasks_pending` starts climbing
500	Everything queuing	Load balancer queue depth > 100, all downstream metrics degraded

The migration pattern is predictable: the component with the smallest capacity saturates first. For the ride-hailing platform, that is the PostgreSQL connection pool (10 connections). Chapter 4 fixes this. But fixing it just moves the bottleneck to the next constrained resource. Scaling is not about removing bottlenecks. It is about ensuring no single bottleneck dominates and that each layer degrades gracefully under load.