Protocol Overhead: HTTP/1.1 vs HTTP/2 vs HTTP/3 and gRPC in Numbers
The content platform serves an article list API that returns 50 article summaries per request. The frontend makes this call on every page load, along with 12 additional requests for recommendations, user state, analytics scripts, fonts, and images. Under HTTP/1.1, the browser opens at most 6 connections per origin, so 7 of the 13 requests queue behind the first 6, adding 200ms of waiting time that has nothing to do with server processing.
Switching to HTTP/2 eliminates the queue. Switching to gRPC for internal service-to-service calls shrinks the serialized payload by about 60% and serializes it roughly 3.7x faster. This chapter measures each protocol’s overhead on the same workload and shows where the time goes.
The Cost of a Connection
Before any application data flows, the client and server negotiate a connection. The time spent in this negotiation varies dramatically across protocols:
HTTP/1.1 + TLS 1.2:
TCP handshake: 1 RTT (SYN, SYN-ACK, ACK)
TLS handshake: 2 RTT (ClientHello, ServerHello+Cert, Finished)
Total before data: 3 RTT
HTTP/1.1 + TLS 1.3:
TCP handshake: 1 RTT
TLS handshake: 1 RTT (saves 1 RTT vs TLS 1.2 by sending key shares in the ClientHello)
Total before data: 2 RTT
HTTP/2 + TLS 1.3:
TCP handshake: 1 RTT
TLS handshake: 1 RTT (ALPN negotiates h2 during TLS)
Total before data: 2 RTT (same as HTTP/1.1+TLS1.3, but one connection serves all)
HTTP/3 + QUIC:
QUIC handshake: 1 RTT (crypto + transport in one flight)
Total before data: 1 RTT
Resumption: 0 RTT (0-RTT with cached session ticket)
For users on mobile networks with 80ms RTT, this translates to:
HTTP/1.1 + TLS 1.2: 240ms before first byte (per connection)
HTTP/1.1 + TLS 1.3: 160ms before first byte (per connection)
HTTP/2 + TLS 1.3: 160ms before first byte (one connection, reused)
HTTP/3 + QUIC: 80ms before first byte
HTTP/3 + 0-RTT: 0ms additional (data sent with handshake)
The content platform serves users globally. In Southeast Asia, typical RTT to our US-East servers is 180ms. At that RTT, a fresh connection costs between 180ms (HTTP/3, 1 RTT) and 540ms (HTTP/1.1 + TLS 1.2, 3 RTT) of pure connection overhead before any application data moves.
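The per-protocol figures above are each a round-trip count multiplied by the RTT. A minimal sketch of that arithmetic follows; the `HANDSHAKE_RTTS` table and the `setup_ms` helper are illustrative names introduced here, not part of any library.

```python
# Round trips required before the first byte of application data,
# taken from the handshake breakdown earlier in this section.
HANDSHAKE_RTTS = {
    "HTTP/1.1 + TLS 1.2": 3,
    "HTTP/1.1 + TLS 1.3": 2,
    "HTTP/2 + TLS 1.3": 2,
    "HTTP/3 + QUIC": 1,
    "HTTP/3 + 0-RTT": 0,
}


def setup_ms(protocol: str, rtt_ms: float) -> float:
    """Connection setup time in milliseconds for a given round-trip time."""
    return HANDSHAKE_RTTS[protocol] * rtt_ms


if __name__ == "__main__":
    for rtt in (80, 180):
        for name in HANDSHAKE_RTTS:
            print(f"{rtt}ms RTT  {name:<20} {setup_ms(name, rtt):.0f}ms")
```

Running it for 80ms and 180ms reproduces the mobile and Southeast Asia figures quoted in the text.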
Head-of-Line Blocking in HTTP/1.1
HTTP/1.1 is a strictly serial protocol on each connection. The client sends a request, waits for the complete response, then sends the next request. HTTP pipelining was standardized but never reliably deployed due to intermediary incompatibilities. In practice, browsers enforce 6 connections per origin (Chrome, Firefox) or 8 (older IE):
// SLOW: HTTP/1.1 with 6 concurrent connections, 13 resources needed
// Timeline for 13 requests on 6 connections (50ms server processing each):
//
// Conn 1: [---req1---][---req7---][---req13---]
// Conn 2: [---req2---][---req8---]
// Conn 3: [---req3---][---req9---]
// Conn 4: [---req4---][---req10--]
// Conn 5: [---req5---][---req11--]
// Conn 6: [---req6---][---req12--]
//
// Processing time: 3 rounds * 50ms = 150ms
// Wall-clock with 80ms RTT: 160ms TLS 1.3 setup (the 6 handshakes run in parallel) + 150ms = 310ms
// Aggregate handshake work across the 6 connections: 6 * 160ms = 960ms
// FAST: HTTP/2 single connection, all 13 requests concurrent
// Timeline for 13 requests on 1 multiplexed connection:
//
// Stream 1: [---req1---]
// Stream 3: [---req2---]
// Stream 5: [---req3---]
// ...
// Stream 25: [---req13--]
//
// Total time: 50ms (all concurrent) + 160ms (one TLS setup) = 210ms
The gap widens with more resources. A typical content platform page loads:
| Resource Type | Count | Avg Size |
|---|---|---|
| Article list API | 1 | 45KB |
| Recommendation API | 1 | 12KB |
| User state API | 1 | 2KB |
| Analytics scripts | 3 | 35KB each |
| Font files | 4 | 25KB each |
| Hero images | 3 | 80KB each |
| CSS bundles | 2 | 18KB each |
| Total | 15 | 540KB (sum) |
Under HTTP/1.1, the 15 resources require 3 rounds across 6 connections (6, 6, then 3). Under HTTP/2, all 15 are requested concurrently on one connection.
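The round count is a ceiling division of resources by available connections. The sketch below models only server processing time on the critical path and deliberately leaves out connection setup and transfer time. The helper names are hypothetical, and the 50ms per-request figure is the one used in the timelines above.

```python
import math


def rounds_needed(resources: int, connections: int) -> int:
    """Sequential rounds when each connection carries one request at a time."""
    return math.ceil(resources / connections)


def processing_ms(resources: int, connections: int, per_request_ms: float = 50) -> float:
    """Server-processing time on the critical path (setup and transfer ignored)."""
    return rounds_needed(resources, connections) * per_request_ms


# HTTP/1.1: 6 connections, one in-flight request each -> rounds of 6, 6, and 3.
http1_ms = processing_ms(15, 6)
# HTTP/2: one connection, modeled here as one concurrent slot per resource.
http2_ms = processing_ms(15, 15)
```

Under this simplified model the 15-resource page spends 150ms in processing rounds on HTTP/1.1 and 50ms on HTTP/2.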
HTTP/2 Multiplexing
HTTP/2 replaces the text-based request/response model with binary frames multiplexed over a single TCP connection. Each request/response pair operates on a numbered stream. Frames from different streams interleave on the wire:
// HTTP/2 frame structure
// +-----------------------------------------------+
// |               Length (24 bits)                |
// +---------------+-------------------------------+
// | Type (8 bits) |        Flags (8 bits)         |
// +-+-------------+-------------------------------+
// |R|         Stream Identifier (31 bits)         |
// +-+---------------------------------------------+
// |   Frame Payload (0-16384 bytes by default)    |
// +-----------------------------------------------+
// The payload limit can be raised up to 16,777,215 bytes via SETTINGS_MAX_FRAME_SIZE.
// Frame types relevant to performance:
//   DATA (0x0)          - response body chunks
//   HEADERS (0x1)       - compressed headers (HPACK)
//   PRIORITY (0x2)      - stream priority (deprecated in favor of RFC 9218)
//   RST_STREAM (0x3)    - cancel a single stream without killing connection
//   SETTINGS (0x4)      - negotiation (max concurrent streams, window size)
//   WINDOW_UPDATE (0x8) - flow control per-stream and connection-level
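To make the layout concrete, here is a minimal sketch that decodes the fixed 9-byte frame header. `parse_frame_header` and the sample bytes are illustrative and not taken from any HTTP/2 library; only a subset of frame types is named.

```python
# A subset of the frame types listed above; other types fall back to hex.
FRAME_TYPE_NAMES = {0x0: "DATA", 0x1: "HEADERS", 0x3: "RST_STREAM", 0x4: "SETTINGS"}


def parse_frame_header(header: bytes) -> dict:
    """Decode the fixed 9-byte header that precedes every HTTP/2 frame payload."""
    if len(header) != 9:
        raise ValueError("an HTTP/2 frame header is exactly 9 bytes")
    length = int.from_bytes(header[0:3], "big")  # 24-bit payload length
    frame_type = header[3]                        # 8-bit frame type
    flags = header[4]                             # 8-bit flags
    # The top bit of the last 4 bytes is reserved (R); mask it off.
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF
    return {
        "length": length,
        "type": FRAME_TYPE_NAMES.get(frame_type, hex(frame_type)),
        "flags": flags,
        "stream_id": stream_id,
    }


# Hypothetical sample: a HEADERS frame with a 16-byte payload on stream 1,
# with the END_HEADERS flag (0x4) set.
example = parse_frame_header(bytes([0x00, 0x00, 0x10, 0x01, 0x04, 0x00, 0x00, 0x00, 0x01]))
```

Because every frame carries its own stream identifier, a receiver can reassemble interleaved frames into per-stream messages without any other framing.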
The multiplexing eliminates application-layer HOL blocking but introduces a TCP-layer variant. If a single TCP packet is lost, all streams stall until retransmission completes. This is the motivation for HTTP/3.
HTTP/3 and QUIC
QUIC moves transport and TLS into a single UDP-based protocol. Each stream is ordered and delivered independently, so a lost packet carrying stream 5 data does not block data delivery on stream 7:
// HTTP/3 over QUIC: independent stream loss recovery
//
// TCP (HTTP/2): Packet loss on any stream blocks ALL streams
//   [Stream1-data][Stream2-data][LOST][Stream3-data][Stream4-data]
//                                ^^^^ blocked waiting for retransmit
//
// QUIC (HTTP/3): Packet loss on one stream blocks only THAT stream
// Stream 1: [data][data][data] -> delivered
// Stream 2: [data][LOST][data] -> stream 2 waits
// Stream 3: [data][data][data] -> delivered (independent)
// Stream 4: [data][data][data] -> delivered (independent)
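The difference can be sketched as two delivery rules applied to the same packet sequence. This is a toy model under stated assumptions: each packet carries data for exactly one stream (real QUIC packets may carry frames from several), and the function names are invented for illustration.

```python
def deliverable_tcp(packets, lost_index):
    """TCP exposes one ordered byte stream: nothing after the gap is delivered."""
    return packets[:lost_index]


def deliverable_quic(packets, lost_index):
    """QUIC orders data per stream: only later data on the lost stream waits."""
    lost_stream = packets[lost_index][0]
    delivered = []
    for index, (stream, chunk) in enumerate(packets):
        if index == lost_index:
            continue  # the lost packet itself
        if stream == lost_stream and index > lost_index:
            continue  # held back until the retransmission arrives
        delivered.append((stream, chunk))
    return delivered


# (stream_id, chunk_number) in arrival order; the packet at index 5 is lost.
arrivals = [(1, 0), (2, 0), (3, 0), (4, 0), (1, 1), (2, 1), (3, 1), (4, 1)]
tcp_view = deliverable_tcp(arrivals, 5)
quic_view = deliverable_quic(arrivals, 5)
```

With the loss at index 5, the TCP model hands the application only the first five chunks, while the QUIC model delivers everything except the missing chunk on stream 2.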
QUIC also supports connection migration. When a mobile user switches from WiFi to cellular, the connection ID persists. No new handshake required:
// Connection migration scenario (content platform mobile user):
// 1. User reading article on WiFi (connection ID: 0x1a2b3c)
// 2. User walks outside, phone switches to cellular
// 3. IP address changes from 192.168.1.50 to 100.64.0.7
// 4. QUIC connection continues with same ID (0x1a2b3c)
// 5. No re-handshake, no state loss, no request retry
//
// Under HTTP/2 + TCP:
// 1. TCP connection bound to (src_ip, src_port, dst_ip, dst_port)
// 2. IP change kills the connection
// 3. New TCP + TLS handshake: 2 RTT
// 4. Application must detect failure, reconnect, retry in-flight requests
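The two scenarios differ in what the server uses as the lookup key for connection state. The sketch below contrasts the two with plain dictionaries. The class names, the server address 203.0.113.10, and the port numbers are hypothetical, and real QUIC additionally validates the new path before using it, which is omitted here.

```python
class FourTupleTable:
    """TCP-style lookup: state is keyed by the address 4-tuple."""

    def __init__(self):
        self._sessions = {}

    def register(self, src_ip, src_port, dst_ip, dst_port, session):
        self._sessions[(src_ip, src_port, dst_ip, dst_port)] = session

    def lookup(self, src_ip, src_port, dst_ip, dst_port):
        return self._sessions.get((src_ip, src_port, dst_ip, dst_port))


class ConnectionIdTable:
    """QUIC-style lookup: state is keyed by the connection ID in each packet."""

    def __init__(self):
        self._sessions = {}

    def register(self, connection_id, session):
        self._sessions[connection_id] = session

    def lookup(self, connection_id):
        return self._sessions.get(connection_id)


tcp = FourTupleTable()
tcp.register("192.168.1.50", 51000, "203.0.113.10", 443, "reader-session")
quic = ConnectionIdTable()
quic.register(0x1A2B3C, "reader-session")

# The phone moves from WiFi to cellular: source address and port change.
tcp_after_move = tcp.lookup("100.64.0.7", 40000, "203.0.113.10", 443)
quic_after_move = quic.lookup(0x1A2B3C)
```

After the address change the 4-tuple lookup finds nothing, so the client must reconnect, while the connection ID lookup still resolves to the same session.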
gRPC for Internal Services
The content platform’s backend consists of 5 services: article-service, search-service, recommendation-service, analytics-service, and user-service. These communicate over the internal network with 0.5ms RTT. At this latency, protocol overhead as a percentage of total request time is significant.
gRPC combines HTTP/2 transport with Protocol Buffers serialization:
// article_service.proto
syntax = "proto3";
package content.platform;

service ArticleService {
  rpc GetArticleList(ArticleListRequest) returns (ArticleListResponse);
  rpc GetArticleBatch(BatchRequest) returns (stream ArticleSummary);
  rpc StreamViewEvents(stream ViewEvent) returns (ViewEventAck);
}

message ArticleListRequest {
  int32 page_size = 1;
  string cursor = 2;
  repeated string categories = 3;
}

message ArticleSummary {
  string id = 1;
  string title = 2;
  string excerpt = 3;
  int64 view_count = 4;
  int64 published_at_epoch = 5;
  repeated string categories = 6;
  string author = 7;
  string thumbnail_url = 8;
}

message ArticleListResponse {
  repeated ArticleSummary articles = 1;
  string next_cursor = 2;
  int32 total_count = 3;
}

// Placeholder definitions so the file compiles. The fields below are
// assumptions for illustration, not the platform's actual schema.
message BatchRequest {
  repeated string article_ids = 1;
}

message ViewEvent {
  string article_id = 1;
  int64 viewed_at_epoch = 2;
}

message ViewEventAck {
  int32 accepted_count = 1;
}
The same payload as JSON (REST) vs Protobuf (gRPC):
ArticleListResponse with 50 articles:
JSON: 48,230 bytes (pretty) / 37,450 bytes (minified)
Protobuf: 14,820 bytes
Serialization time (50 articles, JMH, warm JVM):
Jackson JSON serialize: 142 us
Protobuf serialize: 38 us (3.7x faster)
Jackson JSON deserialize: 198 us
Protobuf deserialize: 52 us (3.8x faster)
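Much of the size difference comes from Protobuf replacing field names with numeric tags and encoding integers as varints. The sketch below hand-encodes a single `view_count` field (field number 4 in `ArticleSummary`) and compares it with minified JSON. The value 12345 is an arbitrary example, and the helper functions are written from the wire format rather than taken from the protobuf library.

```python
import json


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set on all but the last."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int64_field(field_number: int, value: int) -> bytes:
    """Tag (field number and wire type 0 for varint) followed by the value."""
    tag = (field_number << 3) | 0
    return encode_varint(tag) + encode_varint(value)


protobuf_bytes = encode_int64_field(4, 12345)
json_bytes = json.dumps({"view_count": 12345}, separators=(",", ":")).encode()
```

The Protobuf encoding of this one field takes 3 bytes, while the minified JSON object takes 20, most of it the quoted field name.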
Benchmark: Article List API Across Protocols
Test setup:
- Server: Spring Boot 3.3, Netty, 4 vCPU, 8GB RAM
- Client: Locust with custom protocol adapters, 10 workers
- Workload: GET /api/articles?page_size=50, 1000 concurrent users
- Network: Simulated 20ms RTT (internal DC), 80ms RTT (user-facing)
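# locust_http2_benchmark.py
# Baseline scenario. Locust's HttpUser is built on python-requests, which speaks
# HTTP/1.1 only; the HTTP/2, HTTP/3, and gRPC runs rely on the custom protocol
# adapters mentioned in the test setup (not shown here).
from locust import HttpUser, task, between


class ArticleListUser(HttpUser):
    # Each simulated user pauses 100-500ms between requests.
    wait_time = between(0.1, 0.5)

    @task
    def get_articles(self):
        self.client.get(
            "/api/articles?page_size=50",
            headers={"Accept": "application/json"},
        )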
Results at 1000 concurrent users, 80ms RTT:
| Metric | HTTP/1.1+TLS1.2 | HTTP/2+TLS1.3 | HTTP/3+QUIC | gRPC |
|---|---|---|---|---|
| Connection setup | 240ms | 160ms (once) | 80ms (once) | 160ms (once) |
| P50 latency | 312ms | 94ms | 78ms | 62ms |
| P99 latency | 890ms | 210ms | 185ms | 148ms |
| Throughput (req/s) | 2,840 | 9,200 | 9,800 | 11,400 |
| Connections used | 6,000 | 1,000 | 1,000 | 1,000 |
| Bandwidth (MB/s) | 106 | 98 | 95 | 38 |
| Server memory | 2.4GB | 890MB | 920MB | 680MB |
Key observations:
- HTTP/1.1 P50 is 3.3x higher than HTTP/2 because of connection contention and HOL blocking
- HTTP/3 improves P99 by 12% over HTTP/2 due to independent stream loss recovery
- gRPC beats HTTP/3 on latency (Protobuf vs JSON) and bandwidth (2.5x smaller payload)
- HTTP/1.1 uses 6x as many connections and 2.7x as much server memory, most of it connection state
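The ratios in these observations can be recomputed directly from the results table. The snippet below is only a convenience check; the dictionary layout and helper names are made up for this example, and 2.4GB is treated as 2,400MB.

```python
# Values copied from the results table (1000 concurrent users, 80ms RTT).
RESULTS = {
    "http1": {"p50_ms": 312, "connections": 6000, "memory_mb": 2400},
    "http2": {"p50_ms": 94, "p99_ms": 210, "connections": 1000, "memory_mb": 890},
    "http3": {"p99_ms": 185, "bandwidth_mb_s": 95},
    "grpc": {"bandwidth_mb_s": 38},
}


def times_as_large(a: float, b: float) -> float:
    """How many times larger a is than b, rounded to one decimal place."""
    return round(a / b, 1)


def percent_lower(new: float, old: float) -> int:
    """Relative reduction from old to new, as a whole percentage."""
    return round((old - new) / old * 100)


p50_ratio = times_as_large(RESULTS["http1"]["p50_ms"], RESULTS["http2"]["p50_ms"])
p99_gain = percent_lower(RESULTS["http3"]["p99_ms"], RESULTS["http2"]["p99_ms"])
memory_ratio = times_as_large(RESULTS["http1"]["memory_mb"], RESULTS["http2"]["memory_mb"])
bandwidth_ratio = times_as_large(RESULTS["http3"]["bandwidth_mb_s"], RESULTS["grpc"]["bandwidth_mb_s"])
```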
Connection Setup Under Packet Loss
Protocol differences amplify under lossy conditions. Mobile networks commonly experience 1-3% packet loss:
1% packet loss, 80ms RTT, connection establishment:
HTTP/1.1 + TLS 1.2:
No loss: 240ms
1 lost packet in handshake: 240ms + RTO(200ms) = 440ms
Per-connection cost, 6 connections needed
HTTP/2 + TLS 1.3:
No loss: 160ms
1 lost packet in handshake: 160ms + RTO(200ms) = 360ms
Single connection, amortized across all requests
HTTP/3 + QUIC:
No loss: 80ms
1 lost packet in handshake: 80ms + QUIC_RTO(~100ms) = 180ms
QUIC detects the loss sooner than a TCP RTO fires (time threshold of 9/8 x RTT)
0-RTT resumption: 0ms + data immediately
TCP RTO minimum: 200ms (Linux default; a lost SYN waits for the 1s initial RTO instead)
QUIC loss detection: ~100ms (packet threshold + time threshold)
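The figures above follow a simple additive model: handshake round trips plus one retransmission delay per lost packet. The sketch below encodes that model using the 200ms and ~100ms delays quoted in the text. The function name is hypothetical, and the model ignores exponential backoff and the longer initial timeout that applies before any RTT sample exists.

```python
def handshake_with_loss_ms(
    handshake_rtts: int,
    rtt_ms: float,
    retransmit_delay_ms: float,
    lost_packets: int = 1,
) -> float:
    """Handshake time when each lost packet adds one fixed retransmission delay."""
    return handshake_rtts * rtt_ms + lost_packets * retransmit_delay_ms


TCP_RTO_MS = 200         # Linux minimum RTO, as quoted in the text
QUIC_LOSS_DELAY_MS = 100  # approximate QUIC loss detection delay from the text

http1_tls12 = handshake_with_loss_ms(3, 80, TCP_RTO_MS)
http2_tls13 = handshake_with_loss_ms(2, 80, TCP_RTO_MS)
http3_quic = handshake_with_loss_ms(1, 80, QUIC_LOSS_DELAY_MS)
```

Setting `lost_packets=0` recovers the loss-free numbers, which makes it easy to see that the retransmission delay, not the handshake itself, dominates the lossy case for HTTP/3.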
Under 2% packet loss, HTTP/2 suffers from TCP HOL blocking. A single lost TCP segment blocks all HTTP/2 streams until retransmission. HTTP/3 isolates loss to individual QUIC streams:
2% packet loss, streaming 50 article summaries:
HTTP/2: One lost TCP segment blocks all 50 articles
Completion time P99: 480ms (includes TCP retransmit stall)
HTTP/3: Lost QUIC packet blocks only affected stream(s)
Completion time P99: 210ms (unaffected streams deliver immediately)
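A rough estimate shows why a 2% loss rate affects so many responses. The sketch below assumes independent packet loss and about 1,400 payload bytes per packet; both are simplifying assumptions, not measurements from the benchmark. The payload size is the minified JSON figure quoted earlier.

```python
import math


def packets_for(payload_bytes: int, bytes_per_packet: int = 1400) -> int:
    """Number of packets needed to carry the payload (assumed payload per packet)."""
    return math.ceil(payload_bytes / bytes_per_packet)


def probability_of_any_loss(packets: int, loss_rate: float) -> float:
    """Chance that at least one packet is lost, assuming independent losses."""
    return 1 - (1 - loss_rate) ** packets


packet_count = packets_for(37_450)  # minified JSON article list
p_stall = probability_of_any_loss(packet_count, 0.02)
```

Under these assumptions, roughly four in ten article list responses lose at least one packet. On HTTP/2 each such loss stalls every stream on the connection, whereas on HTTP/3 it delays only the affected stream.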
Server Configuration for HTTP/2
Spring Boot with Netty supports HTTP/2 out of the box with TLS:
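// application.yml
// server:
//   http2:
//     enabled: true
//   ssl:
//     enabled: true
//     protocol: TLS
//     enabled-protocols: TLSv1.3
//   netty:
//     max-concurrent-streams: 250
//     initial-window-size: 1048576
//     max-header-list-size: 16384
import org.springframework.boot.web.embedded.netty.NettyReactiveWebServerFactory;
import org.springframework.boot.web.server.WebServerFactoryCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class Http2Config {
    @Bean
    public WebServerFactoryCustomizer<NettyReactiveWebServerFactory> http2Customizer() {
        return factory -> factory.addServerCustomizers(httpServer -> httpServer
            // Apply the HTTP/2 SETTINGS values in code as well, in case the
            // server.netty.* keys above are not bound by your Spring Boot version.
            .http2Settings(settings -> settings
                .maxConcurrentStreams(250)
                .initialWindowSize(1048576)
                .maxHeaderListSize(16384))
            // Limits for the HTTP/1.1 request decoder.
            .httpRequestDecoder(spec -> spec
                .maxHeaderSize(16384)
                .maxInitialLineLength(4096)));
    }
}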
Critical tuning parameters:
| Parameter | Default | Recommended | Rationale |
|---|---|---|---|
| max-concurrent-streams | 100 | 250 | Content pages load 15+ resources; allow headroom |
| initial-window-size | 65535 | 1048576 | 64KB window forces frequent WINDOW_UPDATE frames |
| max-header-list-size | 8192 | 16384 | Large cookies or auth tokens can exceed 8KB |
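The window size matters because a sender cannot have more unacknowledged bytes in flight than the window allows, which caps per-stream throughput at roughly one window per round trip. The sketch below works out that bound. It assumes the sender is limited only by flow control, and the function name is illustrative.

```python
def max_stream_throughput_bytes_per_s(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on throughput when at most one window is in flight per RTT."""
    return window_bytes * 1000 / rtt_ms


default_window = max_stream_throughput_bytes_per_s(65_535, 80)
tuned_window = max_stream_throughput_bytes_per_s(1_048_576, 80)
```

At 80ms RTT the default 65,535-byte window caps a stream at about 0.8 MB/s, so an 80KB hero image needs at least two round trips of flow-control credit. The 1MB window raises the bound to about 13 MB/s.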
Trade-offs
HTTP/2 is not universally better. For single large downloads (video streaming, file transfer), HTTP/1.1’s simplicity means less framing overhead. For the content platform’s mixed workload of many small API calls plus some large media, HTTP/2 wins decisively.
HTTP/3 adoption requires QUIC support in load balancers and CDNs. As of 2024, Cloudflare, Google Cloud, and AWS CloudFront support HTTP/3. Nginx added experimental QUIC in 1.25.0. For the content platform fronted by Cloudflare, HTTP/3 is automatic for browser traffic.
gRPC is optimal for service-to-service communication where both sides control the stack. It adds complexity: .proto file management, code generation build steps, harder debugging (binary protocol). For the content platform’s 5 internal services with 50+ RPC endpoints, the 3.7x serialization speedup and 2.5x bandwidth reduction justify the investment.
| Protocol | Best For | Avoid When |
|---|---|---|
| HTTP/1.1 | Legacy clients, simple proxies | High-concurrency, mobile users |
| HTTP/2 | Browser-facing APIs, mixed resource pages | Single large transfers, lossy networks (TCP HOL blocking) |
| HTTP/3 | Mobile users, lossy networks, global users | UDP-blocked networks, internal DC traffic (0.5ms RTT, no loss) |
| gRPC | Internal services, streaming, high-throughput | Public APIs, browser clients, when easy debugging matters |
The content platform uses all four: HTTP/3 at the CDN edge for browsers, HTTP/2 between CDN and origin, gRPC between internal services, and HTTP/1.1 only for health check endpoints consumed by legacy monitoring.