Connection Overhead and PgBouncer
Connection Overhead and PgBouncer
The main chapter quantified the high-level impact: 210 direct connections consume 6.3GB while 20 pooled connections consume 0.6GB. This section dissects where that memory goes, how PgBouncer multiplexes connections, and the subtle performance implications of pooling.
PostgreSQL Backend Process Memory Map
Each backend process has private memory regions. Examining a single backend via /proc:
# Find a backend PID
psql -c "SELECT pg_backend_pid();"
# 12847
# Memory map summary
cat /proc/12847/smaps_rollup
Rss: 29184 kB
Pss: 14592 kB (proportional share, excludes shared mappings)
Private_Clean: 2048 kB
Private_Dirty: 12544 kB (this is the per-connection cost)
Shared_Clean: 14592 kB (shared_buffers, mapped but not counted per-process)
The actual per-connection cost is Private_Dirty: 12.5 MB for an idle connection. Under load with active queries:
-- Run a query that allocates work_mem
SET work_mem = '64MB';
SELECT * FROM articles ORDER BY published_at DESC LIMIT 10000;
During the sort:
Private_Dirty: 78336 kB (12.5 MB base + 64 MB work_mem)
work_mem is allocated per sort/hash operation, not per connection. A single complex query with multiple sorts can allocate work_mem multiple times:
-- This query allocates work_mem 3 times: 2 sorts + 1 hash join
EXPLAIN (ANALYZE, BUFFERS)
SELECT a.title, c.name, count(v.id)
FROM articles a
JOIN categories c ON a.category_id = c.id
JOIN view_events v ON v.article_id = a.id
GROUP BY a.title, c.name -- Hash Aggregate: 1x work_mem
ORDER BY count(v.id) DESC -- Sort: 1x work_mem
LIMIT 100;
QUERY PLAN
--------------------------------------------------------------------------------------------
Limit (cost=18247.32..18247.57 rows=100 width=52)
(actual time=142.8..142.9 rows=100 loops=1)
-> Sort (cost=18247.32..18372.32 rows=50000 width=52)
(actual time=142.7..142.8 rows=100 loops=1)
Sort Key: (count(v.id)) DESC
Sort Method: top-N heapsort Memory: 48kB
-> HashAggregate (cost=15847.00..16347.00 rows=50000 width=52)
(actual time=128.4..137.2 rows=47832 loops=1)
Group Key: a.title, c.name
Batches: 1 Memory Usage: 8209kB
-> Hash Join (cost=4912.00..14347.00 rows=200000 width=44)
(actual time=32.1..98.7 rows=198472 loops=1)
...
Planning Time: 0.42 ms
Execution Time: 143.1 ms
Shared Buffers and Connection Count Interaction
shared_buffers is mapped into every backend’s address space but not duplicated:
SHOW shared_buffers; -- 4GB on the content platform
Per-connection private memory: 12-30 MB (depends on query complexity)
shared_buffers (shared): 4 GB (mapped once, shared by all)
Total system memory: 16 GB
Available for connections: 16 - 4 - 1 (OS) = 11 GB
Max safe connections: 11 GB / 30 MB = 366
But 366 is a theoretical maximum with all connections running sorts. Practical safe limit accounts for peak work_mem usage:
Connections * peak_work_mem < available_memory
50 * 64MB = 3.2 GB (safe, leaves 7.8 GB for OS cache)
200 * 64MB = 12.8 GB (exceeds available, will swap)
The content platform limits connections to 50 and sets work_mem = 32MB, ensuring worst case memory usage stays under 5.6 GB (50 * 32MB + 50 * 30MB base = 3.1 GB private + 4 GB shared = 7.1 GB).
PgBouncer Internal Architecture
PgBouncer is a single-threaded event loop (using libevent) that:
- Accepts client connections (TCP sockets)
- Authenticates clients against its own auth file
- Queues client requests until a server connection is available
- Assigns a server connection for the transaction duration
- Returns the server connection to the pool after COMMIT
Memory per client connection in PgBouncer: approximately 2KB (socket buffer + state). Compare to 12-30 MB per PostgreSQL backend.
1000 clients through PgBouncer:
PgBouncer memory: 2 MB (1000 * 2 KB)
PostgreSQL memory: 600 MB (20 backends * 30 MB)
Total: 602 MB
1000 clients direct to PostgreSQL:
PostgreSQL memory: 30 GB (1000 * 30 MB)
Result: System OOM or heavy swap
PgBouncer Configuration for the Content Platform
; /etc/pgbouncer/pgbouncer.ini
[databases]
; Read traffic through pooler
content_read = host=127.0.0.1 port=5432 dbname=content pool_size=20 pool_mode=transaction
[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = hba
auth_file = /etc/pgbouncer/userlist.txt
auth_hba_file = /etc/pgbouncer/pg_hba.conf
; Pool sizing
default_pool_size = 20
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
; Client limits
max_client_conn = 1000
max_db_connections = 25
; Timeouts
server_idle_timeout = 300
client_idle_timeout = 0
query_timeout = 30
client_login_timeout = 15
; Transaction mode settings
pool_mode = transaction
server_reset_query = DISCARD ALL
server_check_query = SELECT 1
server_check_delay = 30
; Performance
tcp_keepalive = 1
tcp_keepidle = 60
tcp_keepintvl = 10
tcp_keepcnt = 3
; Logging
log_connections = 0
log_disconnections = 0
log_pooler_errors = 1
stats_period = 60
Key settings explained:
pool_mode = transaction: Release server connection after each transactionserver_reset_query = DISCARD ALL: Reset session state between clients (plan cache, GUCs, temp tables)default_pool_size = 20: Maximum 20 server connections per database/user pairreserve_pool_size = 5: Extra connections activated when all 20 are busy for > 3 secondsmax_client_conn = 1000: Accept up to 1000 client connections (queued if no server connection available)
The DISCARD ALL Cost
When PgBouncer assigns a server connection to a new client in transaction mode, it runs server_reset_query to clean session state:
DISCARD ALL;
-- Equivalent to:
-- RESET ALL (GUC settings)
-- DEALLOCATE ALL (prepared statements)
-- CLOSE ALL (cursors)
-- UNLISTEN * (notifications)
-- SET SESSION AUTHORIZATION DEFAULT
-- RESET ROLE
This destroys the prepared statement cache. The next client must re-prepare all statements. Cost:
First query after DISCARD ALL: 5.2 ms (parse + plan + execute)
Subsequent queries (cached): 4.1 ms (execute only)
Overhead per pool rotation: 1.1 ms
With 2000 transactions/sec across 20 server connections, each connection handles 100 tx/sec. If each client does one transaction then another client gets the connection, that is 100 DISCARD ALL executions per second per connection, or 2000 total. Overhead: 2000 * 1.1ms = 2.2 seconds of CPU time per second across all connections.
Mitigation: use server_reset_query_always = 0 and ensure applications do not change session state:
; If applications never SET custom GUCs or create temp tables:
server_reset_query =
; Just reset the transaction state, skip DISCARD ALL
Benchmark with empty server_reset_query:
With DISCARD ALL: 1847 tx/sec (P99: 14.2ms)
Without DISCARD ALL: 2104 tx/sec (P99: 11.8ms)
Improvement: 14%
Prepared Statement Strategies with PgBouncer
Transaction-mode PgBouncer cannot use named prepared statements across transactions because the server connection changes. Two solutions:
Solution 1: Protocol-Level Prepared Statements (PgBouncer 1.21+)
PgBouncer 1.21 introduced prepared_statement_cache_size:
; PgBouncer tracks prepared statements and re-prepares on server connection change
prepared_statement_cache_size = 100
PgBouncer intercepts Parse messages, caches the SQL, and re-issues Parse on the server connection if the statement is not already prepared there.
Solution 2: Anonymous Prepared Statements
Use the unnamed prepared statement (empty string name), which is per-transaction:
// Works with PgBouncer transaction mode: unnamed prepared statement
// The JDBC driver uses unnamed statements when prepareThreshold > 0
// After N executions, it uses server-side prepared statements
config.addDataSourceProperty("prepareThreshold", "0"); // never use named prepared statements
Setting prepareThreshold=0 forces the JDBC driver to always use the extended query protocol without named prepared statements. This works with PgBouncer but loses plan caching.
The content platform uses Solution 1 (PgBouncer 1.21+) for read traffic and direct connections (no PgBouncer) for write traffic where plan caching matters.
Benchmarking PgBouncer Under Load
Load test with pgbench simulating the content platform read pattern:
# Direct to PostgreSQL (50 connections)
pgbench -c 50 -j 4 -T 60 -f read_article.sql -U content content
# Results:
# tps = 4,847 (excluding connections establishing)
# latency average = 10.3 ms
# latency stddev = 4.2 ms
# Through PgBouncer (50 client connections, 20 server pool)
pgbench -c 50 -j 4 -T 60 -f read_article.sql -h pgbouncer -p 6432 -U content content
# Results:
# tps = 4,612 (excluding connections establishing)
# latency average = 10.8 ms
# latency stddev = 3.8 ms
At 50 connections (within PostgreSQL’s comfortable range), PgBouncer adds 0.5ms average latency (the proxy hop) but reduces latency variance (stddev 3.8 vs 4.2ms) because fewer backends mean less context switching.
At 200 connections:
# Direct: 200 connections to PostgreSQL
pgbench -c 200 -j 8 -T 60 -f read_article.sql -U content content
# Results:
# tps = 3,891 (DEGRADED from 4,847 at 50 connections)
# latency average = 51.4 ms
# latency stddev = 28.7 ms
# Through PgBouncer: 200 clients, still 20 server connections
pgbench -c 200 -j 8 -T 60 -f read_article.sql -h pgbouncer -p 6432 -U content content
# Results:
# tps = 4,523 (stable)
# latency average = 44.2 ms
# latency stddev = 12.1 ms
At 200 direct connections, PostgreSQL’s throughput drops 20% due to memory pressure and context switching. Through PgBouncer, throughput remains stable (4,523 vs 4,612 at 50 clients) with slightly higher average latency (queueing) but much lower variance.
Monitoring PgBouncer
-- Connect to PgBouncer admin interface
psql -h localhost -p 6432 -U pgbouncer pgbouncer
-- Pool statistics
SHOW POOLS;
database | user | cl_active | cl_waiting | sv_active | sv_idle | sv_used | sv_tested | sv_login | maxwait | pool_mode
----------+---------+-----------+------------+-----------+---------+---------+-----------+----------+---------+-----------
content | content | 47 | 3 | 20 | 0 | 0 | 0 | 0 | 0.02 | transaction
Key metrics:
cl_waiting > 0: Clients are waiting for server connections (pool saturated)maxwait > 1.0: Clients waiting more than 1 second (increase pool_size or investigate slow transactions)sv_active = default_pool_size: All server connections in use
-- Detailed stats
SHOW STATS;
database | total_xact_count | total_query_count | total_received | total_sent | total_xact_time | total_query_time | total_wait_time | avg_xact_count | avg_query_count | avg_xact_time | avg_query_time | avg_wait_time
----------+------------------+-------------------+----------------+------------+-----------------+------------------+-----------------+----------------+-----------------+---------------+----------------+--------------
content | 2847291 | 4271948 | 1247891 KB | 8472918 KB | 11847291 ms | 9847291 ms | 284729 ms | 4712 | 7083 | 4.16 | 2.31 | 0.10
avg_wait_time = 0.10ms confirms minimal queueing. If this exceeds 5ms consistently, the pool is undersized.
Connection Warm-Up Effect
Fresh connections lack cached plans and catalog information. The first few queries on a new connection are slower:
Query on warm connection (plan cached): 3.8 ms
Query on cold connection (first execution): 5.4 ms
Second execution on same connection: 3.9 ms
With PgBouncer rotating connections between clients, plan caches are effectively shared (since the same server connection serves multiple clients). But with DISCARD ALL, plan caches are cleared on each rotation.
The content platform’s queries are repetitive (same 5-10 query patterns), so plan caching provides 30% speedup on repeat execution. The trade-off:
With DISCARD ALL (safe, clean state): 4,612 tx/sec
Without DISCARD ALL (plans cached): 5,284 tx/sec
Risk: session state leaks between clients
The content platform uses empty server_reset_query because its application code never modifies session state. Each request uses a clean transaction with explicit settings:
// Application always sets needed GUCs explicitly per transaction
try (Connection conn = readPool.getConnection()) {
conn.setReadOnly(true);
try (Statement stmt = conn.createStatement()) {
stmt.execute("SET LOCAL statement_timeout = '5s'");
}
// ... execute query ...
}
This pattern is safe without DISCARD ALL because SET LOCAL scopes to the transaction and setReadOnly is reset by connection pool checkout.