Load Balancer and Reverse Proxy Performance: Where Latency Hides Before It Reaches Your Code
The content platform sits behind Nginx. Every request traverses it before reaching the Java application. In the best case, Nginx adds about 0.1ms of latency. In the worst case (full request buffering, TLS termination, header inspection, access logging), it adds 5-18ms. During deployments, connection draining and health check transitions create P99 spikes of 500ms or more.
This chapter measures where latency hides in the proxy layer, compares the overhead of different proxy architectures, and demonstrates configuration that minimizes added latency while preserving the operational benefits of having a proxy.
The Request Path Through the Proxy
A request to the content platform traverses these stages:
Client → DNS → Load Balancer (TLS termination)
→ Backend selection (routing decision)
→ Connection to backend (keep-alive pool)
→ Request forwarding (buffered or streamed)
→ Response from backend
→ Response forwarding to client (buffered or streamed)
Latency added at each stage:
TLS termination: 0.1-0.5ms (session resumption vs full handshake)
Routing decision: 0.01-0.1ms (regex path match vs exact match)
Backend connection: 0ms (keep-alive) or 1-2ms (new connection)
Request buffering: 0-5ms (depends on body size and buffer config)
Response buffering: 0-10ms (depends on response size and buffer config)
Access logging: 0.01-0.05ms (async logging)
Header injection: 0.005ms (X-Forwarded-For, X-Request-ID)
──────────────────────────────────────────────────────────────
Total (best case): ~0.13ms
Total (worst case): ~18ms
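The two totals are simply the sums of the per-stage bounds. A minimal Java sketch of that arithmetic follows; the stage figures are copied from the list above, while the class, record, and method names are illustrative and not part of the platform's code.

```java
import java.util.List;

// Illustrative only: sums the per-stage latency bounds listed above.
class ProxyLatencyBudget {

    // One proxy stage with its best-case and worst-case added latency in milliseconds.
    record Stage(String name, double bestMs, double worstMs) {}

    static final List<Stage> STAGES = List.of(
        new Stage("TLS termination", 0.1, 0.5),
        new Stage("Routing decision", 0.01, 0.1),
        new Stage("Backend connection", 0.0, 2.0),
        new Stage("Request buffering", 0.0, 5.0),
        new Stage("Response buffering", 0.0, 10.0),
        new Stage("Access logging", 0.01, 0.05),
        new Stage("Header injection", 0.005, 0.005));

    // Sum of every stage's lower bound.
    static double bestCaseMs() {
        return STAGES.stream().mapToDouble(Stage::bestMs).sum();
    }

    // Sum of every stage's upper bound.
    static double worstCaseMs() {
        return STAGES.stream().mapToDouble(Stage::worstMs).sum();
    }

    public static void main(String[] args) {
        System.out.printf("best: %.3f ms, worst: %.3f ms%n", bestCaseMs(), worstCaseMs());
    }
}
```

Response buffering and request buffering account for most of the worst-case sum, which is why the buffering configuration gets its own section later in the chapter.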
Proxy Comparison for the Content Platform
The content platform evaluated three proxies for the reverse proxy layer:
┌────────────────────┬──────────┬──────────┬──────────┐
│ Metric             │ Nginx    │ Envoy    │ HAProxy  │
├────────────────────┼──────────┼──────────┼──────────┤
│ P50 added latency  │ 0.12ms   │ 0.18ms   │ 0.08ms   │
│ P99 added latency  │ 0.45ms   │ 0.82ms   │ 0.31ms   │
│ Max RPS (1 core)   │ 52,000   │ 38,000   │ 61,000   │
│ Memory (10K conns) │ 12MB     │ 45MB     │ 8MB      │
│ Config reload      │ 0.5s     │ 0ms(hot) │ 0.1s     │
│ HTTP/2 upstream    │ Partial  │ Full     │ No       │
│ gRPC support       │ Limited  │ Native   │ No       │
│ Observability      │ Basic    │ Rich     │ Moderate │
│ L4 mode            │ stream{} │ listener │ mode tcp │
└────────────────────┴──────────┴──────────┴──────────┘
Content platform choice: Nginx
- Low memory footprint: 12MB at 10K connections, close to HAProxy's 8MB and well under Envoy's 45MB (running on cost-sensitive infrastructure)
- Mature TLS termination with session ticket support
- P99 latency acceptable for content serving workload
- Static configuration fits deployment model (no dynamic discovery)
Measuring Proxy Overhead
# Benchmark: Direct to application vs through Nginx
# Tool: wrk2 (constant throughput mode, latency-focused)
# Direct to application (bypass proxy):
wrk2 -t4 -c100 -d60s -R10000 --latency http://localhost:8080/api/articles/12345
# Result:
# P50: 11.2ms
# P99: 28.4ms
# P99.9: 42.1ms
# Through Nginx (same machine, Unix socket to backend):
wrk2 -t4 -c100 -d60s -R10000 --latency http://localhost:80/api/articles/12345
# Result:
# P50: 11.4ms (+0.2ms)
# P99: 29.1ms (+0.7ms)
# P99.9: 45.8ms (+3.7ms)
# Through Nginx (same machine, TCP to backend):
wrk2 -t4 -c100 -d60s -R10000 --latency http://localhost:80/api/articles/12345
# Result:
# P50: 11.5ms (+0.3ms)
# P99: 30.2ms (+1.8ms)
# P99.9: 51.3ms (+9.2ms)
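The Unix socket backend reduces proxy overhead by eliminating the TCP stack for proxy-to-backend communication. In the runs above, it cut the added P99 from 1.8ms to 0.7ms and the added P99.9 from 9.2ms to 3.7ms. For same-host deployments this is a cheap optimization, since it only requires the application to listen on a socket file: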
# FAST: Unix socket upstream (eliminates TCP overhead to backend)
upstream article_backend {
server unix:/var/run/article-service.sock;
keepalive 64;
}
# SLOW: TCP upstream on same host (unnecessary TCP stack traversal)
upstream article_backend_tcp {
server 127.0.0.1:8080;
keepalive 64;
}
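The upstream above only works if the application actually listens on that socket file. How the article service does this depends on its server framework, which the chapter does not specify, so the following is a JDK-level sketch under stated assumptions: Java 16 or newer, a hypothetical class name, and a short temporary path (`svc.sock`) standing in for `/var/run/article-service.sock`. It binds a listener to a Unix domain socket and pushes one message through it the way a proxy would.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch only: a JDK-level listener on a Unix domain socket (requires Java 16+).
class UnixSocketBackendSketch {

    // Binds a listening channel to socketPath, removing a stale socket file first.
    static ServerSocketChannel listen(Path socketPath) throws IOException {
        Files.deleteIfExists(socketPath);
        ServerSocketChannel server = ServerSocketChannel.open(StandardProtocolFamily.UNIX);
        server.bind(UnixDomainSocketAddress.of(socketPath));
        return server;
    }

    // Connects the way a proxy would, sends payload, and returns what the listener read.
    static String roundTrip(Path socketPath, String payload) throws IOException {
        try (ServerSocketChannel server = listen(socketPath);
             SocketChannel client = SocketChannel.open(StandardProtocolFamily.UNIX)) {
            client.connect(UnixDomainSocketAddress.of(socketPath));
            client.write(ByteBuffer.wrap(payload.getBytes(StandardCharsets.UTF_8)));
            client.shutdownOutput(); // Signal end of input so the reader sees EOF
            try (SocketChannel accepted = server.accept()) {
                ByteArrayOutputStream received = new ByteArrayOutputStream();
                ByteBuffer buf = ByteBuffer.allocate(256);
                while (accepted.read(buf) != -1) {
                    buf.flip();
                    received.write(buf.array(), 0, buf.limit());
                    buf.clear();
                }
                return received.toString(StandardCharsets.UTF_8);
            }
        } finally {
            Files.deleteIfExists(socketPath); // Do not leave the socket file behind
        }
    }

    // Runs the round trip against a socket file in a fresh temporary directory.
    static String selfCheck(String payload) {
        try {
            Path dir = Files.createTempDirectory("sock");
            return roundTrip(dir.resolve("svc.sock"), payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selfCheck("GET /health/live"));
    }
}
```

In a real deployment the Nginx worker user also needs read and write permission on the socket file, and the path must stay short because Unix socket paths are limited to roughly 100 bytes on common platforms.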
TLS Termination Cost
TLS termination is the proxy’s most CPU-intensive operation. The cost depends on cipher suite, key size, and session resumption:
# Content platform Nginx TLS configuration (optimized for latency):
ssl_protocols TLSv1.3;
ssl_prefer_server_ciphers off; # Let client choose (modern clients pick fast ciphers)
# Session resumption eliminates handshake CPU cost for returning clients:
ssl_session_cache shared:SSL:50m; # Cache 200K sessions (~250 bytes each)
ssl_session_timeout 1h; # Sessions valid for 1 hour
ssl_session_tickets on; # Enable stateless resumption
ssl_session_ticket_key /etc/nginx/ticket.key; # Rotate daily via cron
# OCSP stapling: avoids client-side OCSP lookup (saves 50-200ms for client)
ssl_stapling on;
ssl_stapling_verify on;
resolver 8.8.8.8 valid=300s;
# Early data (0-RTT TLS 1.3): reduces latency by 1 RTT for returning clients
ssl_early_data on;
# WARNING: 0-RTT data is replayable. Only safe for idempotent requests.
# Upstream must check Early-Data header and reject non-idempotent requests:
proxy_set_header Early-Data $ssl_early_data;
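// Backend: Reject requests sent as TLS early data unless the method is GET or HEAD
import org.springframework.http.HttpMethod;
import org.springframework.http.HttpStatus;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import org.springframework.web.server.WebFilter;
import org.springframework.web.server.WebFilterChain;
import reactor.core.publisher.Mono;

@Component
public class EarlyDataFilter implements WebFilter {
    @Override
    public Mono<Void> filter(ServerWebExchange exchange, WebFilterChain chain) {
        // Nginx sets Early-Data to "1" only while the request arrived as 0-RTT data
        String earlyData = exchange.getRequest().getHeaders().getFirst("Early-Data");
        if ("1".equals(earlyData)) {
            // equals() on the constant keeps this null-safe for unrecognized methods
            HttpMethod method = exchange.getRequest().getMethod();
            if (!HttpMethod.GET.equals(method) && !HttpMethod.HEAD.equals(method)) {
                // Reject: a replayed request could repeat a side effect
                exchange.getResponse().setStatusCode(HttpStatus.TOO_EARLY); // 425
                return exchange.getResponse().setComplete();
            }
        }
        return chain.filter(exchange);
    }
}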
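The decision the filter makes can be isolated from Spring and checked on its own. The sketch below is a framework-free restatement of that rule, not the platform's code: the class and method names are hypothetical, and it assumes that GET and HEAD handlers have no side effects, which is what makes replaying them harmless.

```java
import java.util.Set;

// Framework-free restatement of the early-data rule, for illustration.
class EarlyDataPolicy {

    // Assumption: handlers for these methods have no side effects, so a replay is harmless.
    private static final Set<String> REPLAY_SAFE = Set.of("GET", "HEAD");

    // Returns true when the request should be answered with 425 Too Early.
    // earlyDataHeader is the value Nginx forwards from $ssl_early_data ("1" or empty).
    static boolean shouldReject(String earlyDataHeader, String method) {
        boolean sentAsEarlyData = "1".equals(earlyDataHeader);
        // Set.of(...) rejects null lookups, so treat a missing method as unsafe explicitly.
        return sentAsEarlyData && (method == null || !REPLAY_SAFE.contains(method));
    }

    public static void main(String[] args) {
        System.out.println("POST as early data rejected: " + shouldReject("1", "POST"));
        System.out.println("GET as early data rejected: " + shouldReject("1", "GET"));
        System.out.println("POST after handshake rejected: " + shouldReject("", "POST"));
    }
}
```

A client that receives 425 is expected to retry the request once the handshake has completed, so rejecting here costs a well-behaved client one extra round trip rather than a failed request.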
TLS Benchmark
TLS handshake cost per connection (Nginx, single core):
Full handshake (RSA 2048): 0.8ms / 1,250 handshakes/s/core
Full handshake (ECDHE P-256): 0.3ms / 3,300 handshakes/s/core
Resumed (session ticket): 0.05ms / 20,000 resumes/s/core
0-RTT (TLS 1.3 early data): about the same CPU as a resumed handshake (the saving is a network round trip, not server-side crypto)
Content platform traffic:
New connections: ~500/s (unique visitors)
Resumed connections: ~4,500/s (returning within 1h)
Total TLS operations: 5,000/s
CPU for TLS (with session resumption):
500 * 0.3ms + 4,500 * 0.05ms = 150ms + 225ms = 375ms of CPU per second
= 37.5% of one core dedicated to TLS
Without resumption: 5,000 * 0.3ms = 1,500ms = 150% of one core (needs 2 cores)
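The CPU budget above is a weighted sum, and it is worth keeping as a small calculation so it can be rerun when the traffic mix changes. A minimal sketch, using the per-handshake costs from the benchmark (ECDHE P-256 full handshake and session-ticket resumption); the class and method names are illustrative.

```java
// Illustrative calculator for the TLS CPU budget described above.
class TlsCpuBudget {

    static final double FULL_HANDSHAKE_MS = 0.3;  // ECDHE P-256 full handshake
    static final double RESUMED_MS = 0.05;        // Session ticket resumption

    // CPU milliseconds spent on TLS handshakes per wall-clock second.
    static double cpuMsPerSecond(double fullPerSecond, double resumedPerSecond) {
        return fullPerSecond * FULL_HANDSHAKE_MS + resumedPerSecond * RESUMED_MS;
    }

    // Whole cores needed to absorb that much CPU time (1000 ms of CPU per core per second).
    static int coresNeeded(double cpuMsPerSecond) {
        return (int) Math.ceil(cpuMsPerSecond / 1000.0);
    }

    public static void main(String[] args) {
        double withResumption = cpuMsPerSecond(500, 4500);
        double withoutResumption = cpuMsPerSecond(5000, 0);
        System.out.printf("with resumption: %.1f ms/s, cores: %d%n",
            withResumption, coresNeeded(withResumption));
        System.out.printf("without resumption: %.1f ms/s, cores: %d%n",
            withoutResumption, coresNeeded(withoutResumption));
    }
}
```

The result is sensitive to the resumption ratio: every returning client that falls outside the session timeout moves from the 0.05ms column to the 0.3ms column.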
Request and Response Buffering
Nginx buffers requests and responses by default. This protects the backend from slow clients but adds latency:
# SLOW: Full request buffering (default)
# Nginx reads entire request body before forwarding to backend.
# For a 1MB image upload at 1Mbps client speed: 8 seconds of buffering
# Backend does not see the request until buffering completes.
proxy_request_buffering on; # Default: on
proxy_buffering on; # Default: on
# FAST: Streaming mode (forward bytes as they arrive)
# Backend receives first bytes immediately.
# Trade-off: backend connection is held longer (occupied during slow upload)
location /api/articles {
proxy_request_buffering off; # Stream request to backend immediately
proxy_buffering off; # Stream response to client immediately
proxy_pass http://article_backend;
}
# BALANCED: Buffer both requests and responses, with buffer sizes tuned to the payloads
# Best for the content platform:
# - Requests are small (JSON, < 10KB) → buffering adds < 0.1ms
# - Responses vary (article JSON: 5-50KB) → buffering protects backend
location /api/ {
    proxy_request_buffering on;
    proxy_buffering on;
    proxy_buffer_size 4k; # First part of response (headers + start)
    proxy_buffers 8 8k; # Buffer up to 64KB in memory
    proxy_busy_buffers_size 16k; # Max buffer space busy sending to the client while the response is still being read
    proxy_pass http://article_backend;
}
# For large responses (full article HTML, 100KB+):
location /api/articles/full {
proxy_buffering on;
proxy_buffer_size 4k;
proxy_buffers 16 16k; # Buffer up to 256KB
proxy_max_temp_file_size 0; # Never buffer to disk (latency death)
proxy_pass http://article_backend;
}
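Two pieces of arithmetic sit behind the comments in these blocks: how long a slow client keeps a buffered upload away from the backend, and whether a response fits in the memory that `proxy_buffers` provides. A small sketch of both, with hypothetical class and method names; it uses decimal megabytes for the upload example and ignores the separate `proxy_buffer_size` buffer, so the capacity figure is slightly conservative.

```java
// Illustrative arithmetic for the buffering comments above.
class BufferingMath {

    // Seconds a client needs to upload bodyBytes over a link of bitsPerSecond.
    // With request buffering on, the backend sees nothing until this time has passed.
    static double uploadSeconds(long bodyBytes, long bitsPerSecond) {
        return bodyBytes * 8.0 / bitsPerSecond;
    }

    // In-memory capacity of "proxy_buffers <count> <sizeKb>k" in bytes.
    static long proxyBuffersBytes(int count, int sizeKb) {
        return (long) count * sizeKb * 1024;
    }

    // True when a response of responseBytes fits in those buffers without spilling.
    static boolean fitsInMemory(long responseBytes, int count, int sizeKb) {
        return responseBytes <= proxyBuffersBytes(count, sizeKb);
    }

    public static void main(String[] args) {
        System.out.println("1MB at 1Mbps: " + uploadSeconds(1_000_000, 1_000_000) + " s");
        System.out.println("8 x 8k buffers: " + proxyBuffersBytes(8, 8) + " bytes");
        System.out.println("16 x 16k buffers: " + proxyBuffersBytes(16, 16) + " bytes");
        System.out.println("100KB fits in 8 x 8k: " + fitsInMemory(100 * 1024, 8, 8));
        System.out.println("100KB fits in 16 x 16k: " + fitsInMemory(100 * 1024, 16, 16));
    }
}
```

The second check is the reason the large-response location raises `proxy_buffers`: with temp files disabled, a response that does not fit in memory can only be read from the backend as fast as the client drains it.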
Buffering Latency Measurement
Request buffering impact (content platform article create endpoint):
Request body: 5KB JSON
With buffering ON (default):
Nginx receives full body: 0.05ms (fast client, same datacenter)
Nginx forwards to backend: 0.01ms
Total added: 0.06ms (negligible for small payloads)
With buffering OFF (streaming):
Nginx forwards immediately: 0.01ms
Total added: 0.01ms
Difference: 0.05ms (not worth optimizing for small payloads)
Response buffering impact (article list, 37KB response):
With buffering ON:
Backend sends response: 0.8ms (backend to Nginx buffer)
Nginx sends to client: starts as soon as response data arrives; up to proxy_busy_buffers_size can be in flight to the client while Nginx keeps reading from the backend
Backend connection freed: 0.8ms after response starts
Client receives: depends on client bandwidth
Total backend connection time: 0.8ms
With buffering OFF:
Backend sends response: 0.8ms
Client receives: depends on client bandwidth (could be 200ms on 3G)
Backend connection freed: when client finishes receiving (up to 200ms!)
Total backend connection time: up to 200ms
Buffering protects backend connections from slow clients.
For the content platform: keep response buffering ON.
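The cost of holding backend connections longer can be estimated with Little's law: average connections in use equals request rate multiplied by the time each connection is held. The sketch below applies it to the two hold times measured above. The 10,000 RPS figure is taken from the wrk2 runs earlier in the chapter, and treating every client as a slow 3G client is a deliberate worst-case assumption; the class name is illustrative.

```java
// Illustrative use of Little's law for backend connection occupancy.
class BackendOccupancy {

    // Average backend connections in use = arrival rate x time each connection is held.
    static double connectionsInUse(double requestsPerSecond, double holdMillis) {
        return requestsPerSecond * holdMillis / 1000.0;
    }

    public static void main(String[] args) {
        double rps = 10_000; // Rate used in the wrk2 benchmarks
        System.out.println("buffering on (0.8ms hold): " + connectionsInUse(rps, 0.8));
        System.out.println("buffering off (200ms hold): " + connectionsInUse(rps, 200));
    }
}
```

Even if only a fraction of clients are slow, the buffered case needs a handful of backend connections for response transfer while the unbuffered case can need hundreds, which is the practical meaning of "buffering protects backend connections".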
Keepalive to Backend
Nginx can maintain persistent connections to backend servers, avoiding per-request TCP handshakes:
upstream article_backend {
server 10.0.2.10:8080;
server 10.0.2.11:8080;
server 10.0.2.12:8080;
# FAST: Persistent connections to backend
keepalive 64; # Keep 64 idle connections per worker
keepalive_requests 10000; # Max requests per connection before close
keepalive_time 10m; # Max connection lifetime (for rebalancing)
keepalive_timeout 60s; # Close idle connections after 60s
}
server {
location /api/ {
proxy_pass http://article_backend;
# Required for keepalive to work:
proxy_http_version 1.1; # HTTP/1.0 does not support keepalive
proxy_set_header Connection ""; # Remove "Connection: close" header
}
}
Backend keepalive impact (measured with wrk2 at 10,000 RPS):
Without keepalive (new connection per request):
TCP handshakes/s: 10,000
P50 added latency: 0.5ms (TCP handshake)
P99 added latency: 2.1ms (occasional SYN retransmit)
Backend CPU for accept(): +3%
With keepalive 64:
TCP handshakes/s: ~1 (10,000 RPS / keepalive_requests 10000; only when connections rotate)
P50 added latency: 0.02ms (reuse existing connection)
P99 added latency: 0.08ms
Backend CPU saved: 3%
Health Check Configuration
Health checks verify backend availability. Too frequent: wasted resources and noise. Too infrequent: slow failure detection sends traffic to dead backends.
# Nginx Plus with active health checks:
upstream article_backend {
    zone backend_zone 64k; # Shared memory zone, required for active health checks
    server 10.0.2.10:8080;
    server 10.0.2.11:8080;
    server 10.0.2.12:8080;
}
server {
    location /api/ {
        proxy_pass http://article_backend;
        # Active health checks (Nginx Plus only; the directive goes in the location, not the upstream):
        # Check every 3 seconds, mark down after 2 failures, up after 2 successes
        health_check interval=3s fails=2 passes=2 uri=/health/live;
    }
}
# For Nginx OSS: use passive health checks (detect failures from real traffic)
upstream article_backend_oss {
server 10.0.2.10:8080 max_fails=3 fail_timeout=10s;
server 10.0.2.11:8080 max_fails=3 fail_timeout=10s;
server 10.0.2.12:8080 max_fails=3 fail_timeout=10s;
}
Health check overhead analysis:
Active health checks (3 backends, 3s interval):
Requests/s for health checks: 3 backends / 3s = 1 req/s
Bandwidth: 1 * ~200 bytes = 200 bytes/s (negligible)
Failure detection time: 2 failures * 3s = 6s (worst case)
Passive health checks (from real traffic):
Additional requests: 0 (uses existing traffic)
Failure detection time: depends on traffic rate
At 10,000 RPS per backend: 3 failures in < 1ms
At 10 RPS per backend: 3 failures in ~300ms
Problem: low-traffic backends may not be detected as failed
Content platform choice: passive checks for OSS Nginx
Traffic rate: 3,300 RPS per backend (sufficient for fast detection)
max_fails=3, fail_timeout=10s
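Detection time: < 1ms when failures are immediate, such as connection refused (3 failures happen within 1ms at this rate); a backend that hangs is only counted as failed once proxy_connect_timeout or proxy_read_timeout expires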
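The detection times in this analysis come from two simple formulas: active checks take `fails` consecutive probe intervals, and passive checks take as long as it takes `max_fails` real requests to reach the backend. A minimal sketch of both follows, with illustrative names. It assumes each failed request fails immediately; a request that waits for a timeout adds that timeout to the passive figure.

```java
// Illustrative failure-detection estimates for active and passive health checks.
class FailureDetection {

    // Active checks: seconds until "fails" consecutive probes have failed.
    static double activeSeconds(int fails, double intervalSeconds) {
        return fails * intervalSeconds;
    }

    // Passive checks: milliseconds until maxFails requests have hit the backend,
    // assuming each one fails immediately (for example, connection refused).
    static double passiveMillis(int maxFails, double requestsPerSecondPerBackend) {
        return maxFails * 1000.0 / requestsPerSecondPerBackend;
    }

    public static void main(String[] args) {
        System.out.println("active, 2 fails at 3s: " + activeSeconds(2, 3.0) + " s");
        System.out.println("passive at 10,000 RPS: " + passiveMillis(3, 10_000) + " ms");
        System.out.println("passive at 3,300 RPS: " + passiveMillis(3, 3_300) + " ms");
        System.out.println("passive at 10 RPS: " + passiveMillis(3, 10) + " ms");
    }
}
```

The passive formula also shows the low-traffic problem directly: as the per-backend request rate approaches zero, the detection time grows without bound.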
The Full Content Platform Nginx Configuration
worker_processes auto;
worker_rlimit_nofile 65535;
events {
worker_connections 16384;
multi_accept on;
use epoll;
}
http {
# Logging: buffered writes, so most requests do not wait on a write to the log file
access_log /var/log/nginx/access.log combined buffer=64k flush=5s;
error_log /var/log/nginx/error.log warn;
# TCP optimizations
sendfile on;
tcp_nopush on; # Coalesce headers + first body chunk
tcp_nodelay on; # Disable Nagle's after tcp_nopush sends
# Keepalive to clients
keepalive_timeout 65;
keepalive_requests 1000;
# Gzip (for HTTP/1.1 clients without Brotli)
gzip on;
gzip_comp_level 4;
gzip_types application/json text/plain application/javascript text/css;
gzip_min_length 256;
upstream article_backend {
server 10.0.2.10:8080 max_fails=3 fail_timeout=10s;
server 10.0.2.11:8080 max_fails=3 fail_timeout=10s;
server 10.0.2.12:8080 max_fails=3 fail_timeout=10s;
keepalive 64;
keepalive_requests 10000;
keepalive_time 10m;
keepalive_timeout 60s;
}
server {
listen 443 ssl http2 reuseport;
server_name content-platform.example.com;
ssl_certificate /etc/nginx/certs/fullchain.pem;
ssl_certificate_key /etc/nginx/certs/privkey.pem;
ssl_protocols TLSv1.3;
ssl_session_cache shared:SSL:50m;
ssl_session_timeout 1h;
ssl_session_tickets on;
ssl_early_data on;
location /api/ {
proxy_pass http://article_backend;
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Request-ID $request_id;
proxy_set_header Early-Data $ssl_early_data;
proxy_connect_timeout 2s;
proxy_send_timeout 5s;
proxy_read_timeout 30s;
proxy_buffering on;
proxy_buffer_size 4k;
proxy_buffers 8 8k;
}
# Static assets: served directly by Nginx (no proxy overhead)
location /assets/ {
root /var/www/content-platform;
expires 1y;
add_header Cache-Control "public, immutable";
}
}
}
The next two sections cover the trade-offs in detail. Section 1 benchmarks Layer 4 vs Layer 7 routing costs, and Section 2 addresses deployment-time latency from draining, health checks, and JVM warm-up.