From Database to Response: The Second Half and Where It Stalls
From Database to Response: The Second Half and Where It Stalls
The Symptom
The trip history endpoint averages 680ms at 200 concurrent users. The PostgreSQL query takes 45ms according to pg_stat_statements. The application logs show the Hibernate/R2DBC query completing in 48ms. Where are the other 632ms?
The Cause
The missing time is in three places: connection pool wait, response serialization, and TCP transfer. Under load, connection pool wait dominates.
Connection Pool Wait vs Query Execution
HikariCP (for JDBC) and R2DBC pool both have a maximum number of connections. When all connections are in use, new requests wait in a queue. The wait time is not logged as query time. It is not logged at all unless you instrument it.
// BOTTLENECK: The time gap between "request arrives" and "query starts"
// This gap grows under load and is invisible without metrics
// With HikariCP, the connection acquisition time is a separate metric:
// hikaricp.connections.acquire (Timer)
// hikaricp.connections.active (Gauge)
// hikaricp.connections.pending (Gauge)
// When pending > 0, requests are queuing for connections
At 200 concurrent users with a connection pool of 10:
- Query execution time: 45ms
- Connection wait time (pool exhausted): 400-1,200ms
- Time the request “wastes” waiting for a database connection it could use: 90% of total latency
The Prometheus query that reveals this:
# Connection acquisition time at p99
histogram_quantile(0.99,
sum(rate(hikaricp_connections_acquire_seconds_bucket[5m])) by (le)
)
# Result during load: 1.2 seconds
# Active connections (should not be pinned at max)
hikaricp_connections_active{pool="HikariPool-1"}
# Result during load: 10 (constantly at max, all connections busy)
# Pending threads waiting for a connection
hikaricp_connections_pending{pool="HikariPool-1"}
# Result during load: 45 (45 requests queued, waiting)
Redis Cache Hit vs Cache Miss Path
The fare estimate endpoint has two latency profiles:
Cache hit path:
Redis GET → 2ms
JSON serialization → 1ms
Network return → 3ms
Total → 6ms
Cache miss path:
Redis GET (miss) → 2ms
PostgreSQL query → 35ms
Surge HTTP call → 50ms (parallel with PG if reactive)
Compute fare → 5ms
Redis SET → 2ms
JSON serialization → 1ms
Network return → 3ms
Total (reactive) → 58ms
Total (blocking) → 98ms
Under sustained load with a 60-second cache TTL and 8 pricing zones, approximately 0.13 cache misses per second per zone occur. With 8 zones, that is roughly 1 cache miss per second. At 2,000 RPS, 99.95% of requests take the 6ms path. The p99 is still dominated by cache misses because the 0.05% of requests that miss the cache bunch up when multiple zone caches expire simultaneously.
Response Serialization
The trip history endpoint returns a list of past trips. Each trip includes pickup/dropoff coordinates, fare breakdown, driver info, and timestamps. A rider with 200 past trips receives a JSON response of approximately 180KB.
// BOTTLENECK: Returning entire trip history as one JSON payload
@GetMapping("/api/trips/history")
public Flux<Trip> getTripHistory(
@RequestHeader("X-User-Id") String userId) {
return tripRepository.findByRiderId(userId)
.sort(Comparator.comparing(Trip::completedAt).reversed());
}
// Generated SQL:
// SELECT * FROM trips WHERE rider_id = ? ORDER BY completed_at DESC
// Returns all 200 trips with all columns
Serialization of 200 Trip objects with Jackson takes 12-25ms depending on GC pressure. The response body of 180KB takes 15-45ms to transmit over a typical mobile connection. At 3G speeds (1 Mbps), 180KB takes 1.4 seconds.
// SCALED: Paginated response with sparse fields
@GetMapping("/api/trips/history")
public Mono<TripHistoryResponse> getTripHistory(
@RequestHeader("X-User-Id") String userId,
@RequestParam(defaultValue = "0") int page,
@RequestParam(defaultValue = "20") int size) {
return tripRepository.findByRiderId(userId, PageRequest.of(page, size))
.map(trip -> new TripSummary(
trip.id(),
trip.pickupAddress(),
trip.dropoffAddress(),
trip.fareTotal(),
trip.completedAt()
))
.collectList()
.map(trips -> new TripHistoryResponse(trips, page, size));
}
// Generated SQL:
// SELECT id, pickup_address, dropoff_address, fare_total, completed_at
// FROM trips WHERE rider_id = ? ORDER BY completed_at DESC
// LIMIT 20 OFFSET 0
// Returns 20 trips with 5 columns instead of 200 trips with 15 columns
The paginated response is approximately 4KB. Serialization takes < 1ms. Transfer takes < 10ms on 3G.
The Baseline
Locust test isolating the trip history endpoint with pagination disabled vs enabled:
# load-tests/trip_history_locustfile.py
from locust import HttpUser, task, between
class TripHistoryUser(HttpUser):
wait_time = between(0.5, 1.5)
@task
def view_full_history(self):
"""Unpaginated: returns all trips."""
self.client.get(
"/api/trips/history",
headers={"X-User-Id": "rider-heavy-user"},
name="/api/trips/history [full]"
)
@task
def view_paginated_history(self):
"""Paginated: returns 20 trips."""
self.client.get(
"/api/trips/history",
params={"page": 0, "size": 20},
headers={"X-User-Id": "rider-heavy-user"},
name="/api/trips/history [page]"
)
Results at 200 concurrent users:
Name p50 p95 p99 RPS Fail%
/api/trips/history [full] 450 2800 8400 2.05 1.1%
/api/trips/history [page] 35 120 280 18.40 0.0%
The Fix
Three changes, applied together:
- Paginate the response (shown above): reduces payload from 180KB to 4KB
- Select only needed columns: reduces query I/O from 15 columns to 5
- Enable response compression: reduces 4KB to ~1.2KB over the wire
# application.yml
server:
compression:
enabled: true
mime-types: application/json
min-response-size: 1024
The Proof
After all three changes:
Name p50 p95 p99 RPS Fail%
/api/trips/history [before] 450 2800 8400 2.05 1.1%
/api/trips/history [after] 28 85 180 24.60 0.0%
Delta:
p50: 450ms → 28ms (16x improvement)
p99: 8400ms → 180ms (46x improvement)
RPS: 2.05 → 24.60 (12x throughput)
Fail: 1.1% → 0.0%
The trip history endpoint went from the worst-performing endpoint in the baseline to one of the fastest. The fix was not a database optimization or a caching layer. It was returning less data. The cheapest performance optimization is not doing work that nobody asked for.
Bottleneck Migration Under Load
To demonstrate how bottlenecks migrate, run the step-load Locust test from Chapter 2’s main section and observe which Prometheus metric degrades at each step:
| Users | Bottleneck | Evidence |
|---|---|---|
| 50 | PostgreSQL query | pg_stat_statements.mean_exec_time is 50% of total latency |
| 100 | Mix of query + pool wait | hikaricp_connections_pending starts showing non-zero values |
| 200 | Connection pool | hikaricp_connections_acquire p99 exceeds 500ms |
| 300 | Connection pool + event loop | reactor_scheduler_tasks_pending starts climbing |
| 500 | Everything queuing | Load balancer queue depth > 100, all downstream metrics degraded |
The migration pattern is predictable: the component with the smallest capacity saturates first. For the ride-hailing platform, that is the PostgreSQL connection pool (10 connections). Chapter 4 fixes this. But fixing it just moves the bottleneck to the next constrained resource. Scaling is not about removing bottlenecks. It is about ensuring no single bottleneck dominates and that each layer degrades gracefully under load.