Protocol Overhead: HTTP/1.1 vs HTTP/2 vs HTTP/3 and gRPC in Numbers

The content platform serves an article list API that returns 50 article summaries per request. The frontend makes this call on every page load, along with 12 additional requests for recommendations, user state, analytics scripts, fonts, and images. Under HTTP/1.1, the browser opens at most 6 connections per origin, so only 6 of the 13 requests can be in flight at once. The other 7 queue behind them, adding 200ms of waiting time that has nothing to do with server processing.

Switching to HTTP/2 eliminates the queue. Switching to gRPC for internal service-to-service calls cuts payload size by about 60% and serialization time by more than 70%. This chapter measures each protocol’s overhead on the same workload and shows where the time goes.

The Cost of a Connection

Before any application data flows, the client and server negotiate a connection. The time spent in this negotiation varies dramatically across protocols:

HTTP/1.1 + TLS 1.2:
  TCP handshake:      1 RTT (SYN, SYN-ACK, ACK)
  TLS handshake:      2 RTT (ClientHello, ServerHello+Cert, Finished)
  Total before data:  3 RTT

HTTP/1.1 + TLS 1.3:
  TCP handshake:      1 RTT
  TLS handshake:      1 RTT (saves 1 RTT vs TLS 1.2; key share travels in ClientHello)
  Total before data:  2 RTT

HTTP/2 + TLS 1.3:
  TCP handshake:      1 RTT
  TLS handshake:      1 RTT (ALPN negotiates h2 during TLS)
  Total before data:  2 RTT (same as HTTP/1.1+TLS1.3, but one connection serves all)

HTTP/3 + QUIC:
  QUIC handshake:     1 RTT (crypto + transport in one flight)
  Total before data:  1 RTT
  Resumption:         0 RTT (0-RTT with cached session ticket)

For users on mobile networks with 80ms RTT, this translates to:

HTTP/1.1 + TLS 1.2:  240ms before first byte (per connection)
HTTP/1.1 + TLS 1.3:  160ms before first byte (per connection)
HTTP/2 + TLS 1.3:    160ms before first byte (one connection, reused)
HTTP/3 + QUIC:       80ms before first byte
HTTP/3 + 0-RTT:      0ms additional (data sent with handshake)

The content platform serves users globally. In Southeast Asia, typical RTT to our US-East servers is 180ms. On a fresh connection, the protocol choice adds between 180ms (HTTP/3, 1 RTT) and 540ms (HTTP/1.1 + TLS 1.2, 3 RTT) of pure connection overhead before any application data moves.
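The arithmetic above is round trips multiplied by RTT. A minimal sketch that reproduces it for any RTT follows; the names `HANDSHAKE_RTTS` and `setup_overhead_ms` are illustrative, and the round-trip counts are taken from the listing above.

```python
# Round trips required before the first request can be sent on a fresh
# connection, as listed in the handshake breakdown above.
HANDSHAKE_RTTS = {
    "HTTP/1.1 + TLS 1.2": 3,
    "HTTP/1.1 + TLS 1.3": 2,
    "HTTP/2 + TLS 1.3": 2,
    "HTTP/3 + QUIC": 1,
    "HTTP/3 + 0-RTT": 0,
}


def setup_overhead_ms(rtt_ms):
    """Return connection setup time in ms per protocol for a given RTT."""
    return {name: rtts * rtt_ms for name, rtts in HANDSHAKE_RTTS.items()}


if __name__ == "__main__":
    for label, rtt in (("mobile, 80ms RTT", 80), ("Southeast Asia, 180ms RTT", 180)):
        print(label)
        for name, overhead in setup_overhead_ms(rtt).items():
            print(f"  {name:<20} {overhead}ms")
```

The model counts only handshake round trips; it ignores DNS, server processing, and packet loss.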

Head-of-Line Blocking in HTTP/1.1

HTTP/1.1 is a strictly serial protocol on each connection. The client sends a request, waits for the complete response, then sends the next request. HTTP pipelining was standardized but never reliably deployed due to intermediary incompatibilities. In practice, browsers enforce 6 connections per origin (Chrome, Firefox) or 8 (older IE):

// SLOW: HTTP/1.1 with 6 concurrent connections, 13 resources needed
// Timeline for 13 requests on 6 connections (50ms server processing each):
//
// Conn 1: [---req1---][---req7---][---req13---]
// Conn 2: [---req2---][---req8---]
// Conn 3: [---req3---][---req9---]
// Conn 4: [---req4---][---req10--]
// Conn 5: [---req5---][---req11--]
// Conn 6: [---req6---][---req12--]
//
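// Total time: 3 rounds * 50ms = 150ms (plus connection setup)
// Actual time with 80ms RTT: 150ms + 160ms TLS 1.3 setup = 310ms
// (the 6 handshakes run in parallel, so setup costs 160ms of wall-clock
//  time once, not 6 * 160ms; the server still pays for 6 handshakes)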

// FAST: HTTP/2 single connection, all 13 requests concurrent
// Timeline for 13 requests on 1 multiplexed connection:
//
// Stream 1:  [---req1---]
// Stream 3:  [---req2---]
// Stream 5:  [---req3---]
// ...
// Stream 25: [---req13--]
//
// Total time: 50ms (all concurrent) + 160ms (one TLS setup) = 210ms

The gap widens with more resources. A typical content platform page loads:

Resource Type	Count	Avg Size
Article list API	1	45KB
Recommendation API	1	12KB
User state API	1	2KB
Analytics scripts	3	35KB each
Font files	4	25KB each
Hero images	3	80KB each
CSS bundles	2	18KB each
Total	15	540KB

Under HTTP/1.1, the 15 resources require 3 rounds on 6 connections (6 + 6 + 3). Under HTTP/2, all 15 are in flight concurrently on one connection.
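The round counting can be sketched as a small model. It assumes every request takes the same server time and that a connection carries one request at a time; connection setup and network RTT are deliberately left out. The function names are illustrative.

```python
import math


def request_rounds(n_requests, max_connections=6):
    """Number of sequential rounds needed when each connection serves
    one request at a time (HTTP/1.1 without pipelining)."""
    return math.ceil(n_requests / max_connections)


def http1_transfer_ms(n_requests, server_ms, max_connections=6):
    """Server-time cost of the queueing alone, excluding setup and RTT."""
    return request_rounds(n_requests, max_connections) * server_ms


def http2_transfer_ms(server_ms):
    """All streams run concurrently, so the cost is a single round."""
    return server_ms


if __name__ == "__main__":
    for n in (13, 15, 30):
        print(n, "requests:",
              http1_transfer_ms(n, 50), "ms on HTTP/1.1 vs",
              http2_transfer_ms(50), "ms on HTTP/2")
```

The HTTP/1.1 cost grows in steps of one round per 6 extra resources, while the HTTP/2 figure stays flat until the server's concurrent stream limit is reached.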

HTTP/2 Multiplexing

HTTP/2 replaces the text-based request/response model with binary frames multiplexed over a single TCP connection. Each request/response pair operates on a numbered stream. Frames from different streams interleave on the wire:

// HTTP/2 frame structure
// +-----------------------------------------------+
// |                 Length (24 bits)               |
// +---------------+-------------------------------+
// |  Type (8 bits)|  Flags (8 bits)               |
// +-+-------------+-------------------------------+
// |R|         Stream Identifier (31 bits)         |
// +-+---------------------------------------------+
// |              Frame Payload (0-16384 bytes)     |
// +-----------------------------------------------+
// (16384 is the default SETTINGS_MAX_FRAME_SIZE; peers may raise it up to 16777215)

// Frame types relevant to performance:
// DATA (0x0)      - response body chunks
// HEADERS (0x1)   - compressed headers (HPACK)
// PRIORITY (0x2)  - stream priority (deprecated in favor of RFC 9218)
// RST_STREAM (0x3)- cancel a single stream without killing connection
// SETTINGS (0x4)  - negotiation (max concurrent streams, window size)
// WINDOW_UPDATE (0x8) - flow control per-stream and connection-level

The multiplexing eliminates application-layer HOL blocking but introduces a TCP-layer variant. If a single TCP packet is lost, all streams stall until retransmission completes. This is the motivation for HTTP/3.
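To make the frame layout concrete, here is a minimal sketch of decoding the fixed 9-byte frame header. The type codes follow the HTTP/2 specification (RFC 9113); the function name and the example bytes are illustrative.

```python
# Frame type codes defined by the HTTP/2 specification (RFC 9113).
FRAME_TYPES = {
    0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
    0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION",
}


def parse_frame_header(header):
    """Decode the 9-byte HTTP/2 frame header into
    (payload_length, frame_type, flags, stream_id)."""
    if len(header) != 9:
        raise ValueError("an HTTP/2 frame header is exactly 9 bytes")
    length = int.from_bytes(header[0:3], "big")          # 24-bit length
    frame_type = FRAME_TYPES.get(header[3], "UNKNOWN")   # 8-bit type
    flags = header[4]                                    # 8-bit flags
    # The top bit of the last 4 bytes is reserved (R) and must be masked off.
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF
    return length, frame_type, flags, stream_id


if __name__ == "__main__":
    # Hypothetical WINDOW_UPDATE frame header: 4-byte payload, stream 5.
    example = bytes([0x00, 0x00, 0x04, 0x08, 0x00, 0x00, 0x00, 0x00, 0x05])
    print(parse_frame_header(example))  # (4, 'WINDOW_UPDATE', 0, 5)
```

Because every frame carries its stream identifier, the receiver can reassemble interleaved frames per stream, which is what makes multiplexing possible on one connection.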

HTTP/3 and QUIC

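QUIC moves transport and TLS into a single UDP-based protocol. Loss detection runs per connection, but each stream is delivered in order independently of the others, so a lost packet only delays the streams whose data it carried. A lost packet on stream 5 does not block data delivery on stream 7: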

// HTTP/3 over QUIC: independent stream loss recovery
//
// TCP (HTTP/2): Packet loss on any stream blocks ALL streams
// [Stream1-data][Stream2-data][LOST][Stream3-data][Stream4-data]
//                                    ^^^^ blocked waiting for retransmit
//
// QUIC (HTTP/3): Packet loss on one stream blocks only THAT stream
// Stream 1: [data][data][data]          -> delivered
// Stream 2: [data][LOST][data]          -> stream 2 waits
// Stream 3: [data][data][data]          -> delivered (independent)
// Stream 4: [data][data][data]          -> delivered (independent)
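The difference in the diagram can be sketched as a toy model. It assumes each packet carries data for exactly one stream (real QUIC packets may carry frames from several streams) and asks which packets the application can read before any retransmission arrives. All names are illustrative.

```python
def deliverable_tcp(packet_streams, lost):
    """TCP exposes one ordered byte stream: nothing after the first
    lost packet can be delivered until it is retransmitted."""
    first_lost = min(lost) if lost else len(packet_streams)
    return list(range(first_lost))


def deliverable_quic(packet_streams, lost):
    """QUIC orders data per stream: a loss only holds back later
    packets that belong to the same stream."""
    blocked_streams = set()
    delivered = []
    for index, stream in enumerate(packet_streams):
        if index in lost:
            blocked_streams.add(stream)
        elif stream not in blocked_streams:
            delivered.append(index)
    return delivered


if __name__ == "__main__":
    # Three packets for each of four streams, interleaved on the wire.
    packets = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
    lost = {5}  # the second packet of stream 2
    print("TCP delivers :", len(deliverable_tcp(packets, lost)), "of", len(packets))
    print("QUIC delivers:", len(deliverable_quic(packets, lost)), "of", len(packets))
```

In this example TCP can hand over only the 5 packets that arrived before the gap, while QUIC holds back only stream 2's later packet and delivers the other 10.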

QUIC also supports connection migration. When a mobile user switches from WiFi to cellular, the connection ID persists. No new handshake required:

// Connection migration scenario (content platform mobile user):
// 1. User reading article on WiFi (connection ID: 0x1a2b3c)
// 2. User walks outside, phone switches to cellular
// 3. IP address changes from 192.168.1.50 to 100.64.0.7
// 4. QUIC connection continues with same ID (0x1a2b3c)
// 5. No re-handshake, no state loss, no request retry
//
// Under HTTP/2 + TCP:
// 1. TCP connection bound to (src_ip, src_port, dst_ip, dst_port)
// 2. IP change kills the connection
// 3. New TCP + TLS handshake: 2 RTT
// 4. Application must detect failure, reconnect, retry in-flight requests

gRPC for Internal Services

The content platform’s backend consists of 5 services: article-service, search-service, recommendation-service, analytics-service, and user-service. These communicate over the internal network with 0.5ms RTT. At this latency, protocol overhead as a percentage of total request time is significant.

gRPC combines HTTP/2 transport with Protocol Buffers serialization:

// article_service.proto
syntax = "proto3";

package content.platform;

service ArticleService {
  rpc GetArticleList(ArticleListRequest) returns (ArticleListResponse);
  rpc GetArticleBatch(BatchRequest) returns (stream ArticleSummary);
  rpc StreamViewEvents(stream ViewEvent) returns (ViewEventAck);
}

message ArticleListRequest {
  int32 page_size = 1;
  string cursor = 2;
  repeated string categories = 3;
}

message ArticleSummary {
  string id = 1;
  string title = 2;
  string excerpt = 3;
  int64 view_count = 4;
  int64 published_at_epoch = 5;
  repeated string categories = 6;
  string author = 7;
  string thumbnail_url = 8;
}

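message ArticleListResponse {
  repeated ArticleSummary articles = 1;
  string next_cursor = 2;
  int32 total_count = 3;
}

// The messages below are referenced by the service definition above.
// Their fields are illustrative assumptions, not the production schema.
message BatchRequest {
  repeated string article_ids = 1;
}

message ViewEvent {
  string article_id = 1;
  int64 viewed_at_epoch = 2;
}

message ViewEventAck {
  int64 events_received = 1;
}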

The same payload as JSON (REST) vs Protobuf (gRPC):

ArticleListResponse with 50 articles:
  JSON:     48,230 bytes (pretty) / 37,450 bytes (minified)
  Protobuf: 14,820 bytes

Serialization time (50 articles, JMH, warm JVM):
  Jackson JSON serialize:   142 us
  Protobuf serialize:        38 us (3.7x faster)
  Jackson JSON deserialize: 198 us
  Protobuf deserialize:      52 us (3.8x faster)
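Much of the size difference comes from how Protobuf encodes fields: a one-byte tag replaces the quoted field name, and integers use a variable-length encoding. The sketch below hand-encodes one `view_count` field to show the effect. It is a simplified illustration of the wire format for non-negative integers, not a replacement for generated code, and the value 15000 is a made-up example.

```python
def encode_varint(value):
    """Protobuf base-128 varint: 7 payload bits per byte, least
    significant group first, high bit set on every byte but the last."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)         # final byte
            return bytes(out)


def encode_varint_field(field_number, value):
    """Tag (field number and wire type 0 = varint) followed by the value."""
    tag = (field_number << 3) | 0
    return encode_varint(tag) + encode_varint(value)


if __name__ == "__main__":
    # view_count is field 4 in ArticleSummary.
    protobuf_bytes = encode_varint_field(4, 15000)
    json_text = '"view_count":15000'
    print("protobuf:", protobuf_bytes.hex(), len(protobuf_bytes), "bytes")
    print("json    :", json_text, len(json_text), "bytes")
```

For this field the Protobuf encoding takes 3 bytes against 18 for the minified JSON text. String fields such as `title` and `excerpt` shrink far less, which is why the whole response is about 2.5x smaller rather than 6x.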

Benchmark: Article List API Across Protocols

Test setup:

Server: Spring Boot 3.3, Netty, 4 vCPU, 8GB RAM
Client: Locust with custom protocol adapters, 10 workers
Workload: GET /api/articles?page_size=50, 1000 concurrent users
Network: Simulated 0.5ms RTT (internal DC), 80ms RTT (user-facing)

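# locust_http2_benchmark.py
# HttpUser wraps python-requests, which speaks HTTP/1.1 only, so this class is
# the HTTP/1.1 baseline. The HTTP/2, HTTP/3, and gRPC runs swap in the custom
# protocol adapters mentioned in the test setup.
from locust import HttpUser, task, between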

class ArticleListUser(HttpUser):
    wait_time = between(0.1, 0.5)

    @task
    def get_articles(self):
        self.client.get(
            "/api/articles?page_size=50",
            headers={"Accept": "application/json"}
        )

Results at 1000 concurrent users, 80ms RTT:

Metric	HTTP/1.1+TLS1.2	HTTP/2+TLS1.3	HTTP/3+QUIC	gRPC
Connection setup	240ms	160ms (once)	80ms (once)	160ms (once)
P50 latency	312ms	94ms	78ms	62ms
P99 latency	890ms	210ms	185ms	148ms
Throughput (req/s)	2,840	9,200	9,800	11,400
Connections used	6,000	1,000	1,000	1,000
Bandwidth (MB/s)	106	98	95	38
Server memory	2.4GB	890MB	920MB	680MB

Key observations:

HTTP/1.1 P50 is 3.3x worse than HTTP/2 because of connection contention and HOL blocking
HTTP/3 improves P99 by 12% over HTTP/2 due to independent stream loss recovery
gRPC beats HTTP/3 on latency (Protobuf vs JSON) and bandwidth (2.5x smaller payload)
HTTP/1.1 uses 6x more connections, consuming 2.7x more server memory for connection state

Connection Setup Under Packet Loss

Protocol differences amplify under lossy conditions. Mobile networks commonly experience 1-3% packet loss:

1% packet loss, 80ms RTT, connection establishment:

HTTP/1.1 + TLS 1.2:
  No loss: 240ms
  1 lost packet in handshake: 240ms + RTO(200ms) = 440ms
  Per-connection cost, 6 connections needed

HTTP/2 + TLS 1.3:
  No loss: 160ms
  1 lost packet in handshake: 160ms + RTO(200ms) = 360ms
  Single connection, amortized across all requests

HTTP/3 + QUIC:
  No loss: 80ms
  1 lost packet in handshake: 80ms + QUIC_RTO(~100ms) = 180ms
  QUIC detects loss faster than TCP here (monotonic packet numbers give unambiguous RTT samples)
  0-RTT resumption: 0ms + data immediately

TCP RTO minimum: 200ms (Linux default; a lost SYN instead waits out the 1s initial RTO)
QUIC loss detection: ~100ms (packet threshold + time threshold)

Under 2% packet loss, HTTP/2 suffers from TCP HOL blocking. A single lost TCP segment blocks all HTTP/2 streams until retransmission. HTTP/3 isolates loss to individual QUIC streams:

2% packet loss, streaming 50 article summaries:

HTTP/2: One lost TCP segment blocks all 50 articles
  Completion time P99: 480ms (includes TCP retransmit stall)

HTTP/3: Lost QUIC packet blocks only affected stream(s)
  Completion time P99: 210ms (unaffected streams deliver immediately)
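How often a response runs into at least one lost packet can be estimated with a short calculation. The sketch assumes independent losses and a 1460-byte TCP payload per packet, and it ignores headers and handshake packets, so treat the result as a rough estimate. The function names are illustrative.

```python
import math


def packets_for(payload_bytes, mss=1460):
    """Packets needed to carry a payload, assuming a 1460-byte MSS."""
    return math.ceil(payload_bytes / mss)


def probability_of_any_loss(n_packets, loss_rate):
    """P(at least one of n packets is lost), losses assumed independent."""
    return 1 - (1 - loss_rate) ** n_packets


if __name__ == "__main__":
    # 37,450 bytes is the minified JSON article list from the payload comparison.
    n = packets_for(37_450)
    for loss in (0.01, 0.02, 0.03):
        p = probability_of_any_loss(n, loss)
        print(f"{loss:.0%} loss, {n} packets: {p:.0%} of responses hit a loss")
```

Under these assumptions roughly 4 in 10 article list responses see at least one lost packet at 2% loss, so the TCP stall is a common case on such networks, not a tail event.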

Server Configuration for HTTP/2

Spring Boot with Netty supports HTTP/2 out of the box with TLS:

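// application.yml
// server:
//   http2:
//     enabled: true
//   ssl:
//     enabled: true
//     protocol: TLS
//     enabled-protocols: TLSv1.3
//
// The HTTP/2 SETTINGS values (max concurrent streams, initial window size,
// max header list size) are not among the server.netty.* properties documented
// for Spring Boot 3.3, so they are applied in code through Reactor Netty's
// http2Settings, using the recommended values from the tuning table.

import org.springframework.boot.web.embedded.netty.NettyReactiveWebServerFactory;
import org.springframework.boot.web.server.WebServerFactoryCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class Http2Config {

    @Bean
    public WebServerFactoryCustomizer<NettyReactiveWebServerFactory> http2Customizer() {
        return factory -> factory.addServerCustomizers(httpServer ->
            httpServer
                // HTTP/2 SETTINGS advertised to clients
                .http2Settings(settings -> settings
                    .maxConcurrentStreams(250)
                    .initialWindowSize(1048576)
                    .maxHeaderListSize(16384)
                )
                // Limits for HTTP/1.1 requests that do not negotiate h2
                .httpRequestDecoder(spec -> spec
                    .maxHeaderSize(8192)
                    .maxInitialLineLength(4096)
                )
        );
    }
}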

Critical tuning parameters:

Parameter	Default	Recommended	Rationale
max-concurrent-streams	100	250	Content pages load 15+ resources; allow headroom
initial-window-size	65535	1048576	64KB window forces frequent WINDOW_UPDATE frames
max-header-list-size	8192	16384	Large cookies or auth tokens can exceed 8KB
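The window size rationale follows from flow control: a sender can have at most one window of unacknowledged data in flight per round trip, so window divided by RTT caps per-stream throughput. A minimal sketch of that ceiling, with an illustrative function name:

```python
def max_stream_throughput_bytes_per_s(window_bytes, rtt_ms):
    """Upper bound on one stream's throughput when the sender must wait
    for a WINDOW_UPDATE after each full window (window / RTT)."""
    return window_bytes * 1000 / rtt_ms


if __name__ == "__main__":
    for window in (65_535, 1_048_576):
        for rtt in (80, 180):
            ceiling = max_stream_throughput_bytes_per_s(window, rtt)
            print(f"window={window:>9} B, RTT={rtt:>3} ms: "
                  f"{ceiling / 1_000_000:.2f} MB/s per stream")
```

At 80ms RTT the default 65,535-byte window caps a stream at about 0.82 MB/s, while the 1 MiB window raises the ceiling to about 13.1 MB/s. The bound is a worst case, since receivers usually send WINDOW_UPDATE before the window is fully drained.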

Trade-offs

HTTP/2 is not universally better. For single large downloads (video streaming, file transfer), HTTP/1.1’s simplicity means less framing overhead. For the content platform’s mixed workload of many small API calls plus some large media, HTTP/2 wins decisively.

HTTP/3 adoption requires QUIC support in load balancers and CDNs. As of 2024, Cloudflare, Google Cloud, and AWS CloudFront support HTTP/3. Nginx added experimental QUIC in 1.25.0. For the content platform fronted by Cloudflare, HTTP/3 is automatic for browser traffic.

gRPC is optimal for service-to-service communication where both sides control the stack. It adds complexity: .proto file management, code generation build steps, harder debugging (binary protocol). For the content platform’s 5 internal services with 50+ RPC endpoints, the 3.7x serialization speedup and 2.5x bandwidth reduction justify the investment.

Protocol	Best For	Avoid When
HTTP/1.1	Legacy clients, simple proxies	High-concurrency, mobile users
HTTP/2	Browser-facing APIs, mixed resource pages	Single large transfers, lossy networks (TCP HOL blocking)
HTTP/3	Mobile users, lossy networks, global users	Internal DC traffic (0.5ms RTT, no loss), UDP-blocked networks
gRPC	Internal services, streaming, high-throughput	Public APIs, browser clients, debugging ease

The content platform uses all four: HTTP/3 at the CDN edge for browsers, HTTP/2 between CDN and origin, gRPC between internal services, and HTTP/1.1 only for health check endpoints consumed by legacy monitoring.