Skip to main content
fast by design

Autovacuum Tuning and Bloat Prevention

14 min read Chapter 63 of 90

Autovacuum Tuning and Bloat Prevention

Section 1 explained why dead tuples accumulate. This section focuses on preventing that accumulation from becoming a problem: tuning autovacuum to keep pace with dead tuple generation, configuring fillfactor for HOT updates, and monitoring bloat in production.

Autovacuum Trigger Mechanics

Autovacuum is controlled by a launcher process that wakes every autovacuum_naptime (default: 1 minute) and checks each table:

Should vacuum table T?
  dead_tuples(T) > autovacuum_vacuum_threshold + (autovacuum_vacuum_scale_factor * live_tuples(T))

Should analyze table T?
  modified_tuples(T) > autovacuum_analyze_threshold + (autovacuum_analyze_scale_factor * live_tuples(T))

Default values:

SHOW autovacuum_vacuum_threshold;       -- 50
SHOW autovacuum_vacuum_scale_factor;    -- 0.2
SHOW autovacuum_analyze_threshold;      -- 50
SHOW autovacuum_analyze_scale_factor;   -- 0.1
SHOW autovacuum_naptime;                -- 1min
SHOW autovacuum_max_workers;            -- 3

For the content platform’s tables:

Table                  | live_tuples | Vacuum Trigger (default)  | Dead Tuples at Trigger
articles               |  1,000,000  | 50 + 0.2 * 1M = 200,050  | 200,050 (too late!)
article_view_counts    |    100,000  | 50 + 0.2 * 100K = 20,050 | 20,050
view_events            | 10,000,000  | 50 + 0.2 * 10M = 2M      | 2,000,050 (way too late!)
categories             |         20  | 50 + 0.2 * 20 = 54       | 54 (fine, rarely updated)

The articles table tolerates 200,050 dead tuples before vacuum runs. At 10k articles/hour insertion rate and occasional updates, this is acceptable. But view_events with 10M rows waits until 2M dead tuples accumulate. That is catastrophic.

Per-Table Autovacuum Settings

Override global settings for tables with different update patterns:

-- High-update counter table: aggressive vacuum
ALTER TABLE article_view_counts SET (
    autovacuum_vacuum_scale_factor = 0.01,     -- 1% dead triggers vacuum
    autovacuum_vacuum_threshold = 100,          -- at least 100 dead
    autovacuum_analyze_scale_factor = 0.05,    -- reanalyze at 5% change
    autovacuum_analyze_threshold = 100,
    autovacuum_vacuum_cost_delay = 2,           -- 2ms throttle delay (default: 20ms)
    autovacuum_vacuum_cost_limit = 1000         -- higher I/O budget
);

-- Large event table: lower scale factor, still aggressive
ALTER TABLE view_events SET (
    autovacuum_vacuum_scale_factor = 0.005,    -- 0.5% = 50,000 dead tuples
    autovacuum_vacuum_threshold = 1000,
    autovacuum_analyze_scale_factor = 0.02,
    autovacuum_vacuum_cost_delay = 5,
    autovacuum_vacuum_cost_limit = 800
);

-- Articles table: moderate, mostly inserts
ALTER TABLE articles SET (
    autovacuum_vacuum_scale_factor = 0.05,     -- 5% = 50,000 dead tuples
    autovacuum_vacuum_threshold = 500,
    autovacuum_analyze_scale_factor = 0.05
);

New trigger calculations:

Table                  | Trigger After Dead Tuples | At 1000 updates/sec: Time to Trigger
article_view_counts    | 100 + 0.01 * 100K = 1100 | 1.1 seconds
view_events            | 1000 + 0.005 * 10M = 51K | 51 seconds
articles               | 500 + 0.05 * 1M = 50,500 | rarely hit (low update rate)

Cost-Based Vacuum Throttling

Vacuum uses a cost-based I/O throttling system to prevent it from overwhelming the disk:

SHOW vacuum_cost_page_hit;    -- 1 (cost for reading a page from shared_buffers)
SHOW vacuum_cost_page_miss;   -- 2 (cost for reading a page from OS cache)  
SHOW vacuum_cost_page_dirty;  -- 20 (cost for dirtying a page)
SHOW vacuum_cost_limit;       -- 200 (budget per cycle)
SHOW vacuum_cost_delay;       -- 0 (manual vacuum: no delay)
SHOW autovacuum_vacuum_cost_delay;  -- 2ms (autovacuum delay between cycles)
SHOW autovacuum_vacuum_cost_limit;  -- -1 (use vacuum_cost_limit: 200)

The throttling loop:

while (table has dead tuples):
    process pages until accumulated_cost >= cost_limit (200)
    sleep cost_delay (2ms for autovacuum)
    reset accumulated_cost

With default settings (cost_limit=200, cost_delay=2ms):

  • Each cycle processes approximately 200 pages from shared_buffers (200/1 = 200) or 100 pages from disk (200/2 = 100)
  • 500 cycles per second (1000ms / 2ms delay)
  • Throughput: approximately 100,000 pages/sec from cache, 50,000 pages/sec from disk

For the counter table (1,284 pages): vacuum completes in approximately 13ms. Fast enough to keep pace with 1000 updates/sec.

For the view_events table (125,000 pages): vacuum takes approximately 2.5 seconds with aggressive settings. At 1000 new dead tuples/sec, 2,500 dead tuples accumulate during vacuum. The table reaches steady state at approximately 53,500 dead tuples (trigger at 51K + accumulation during vacuum).

Increasing Autovacuum Workers

With 3 default workers and many tables needing vacuum:

-- If autovacuum can't keep up with multiple busy tables
ALTER SYSTEM SET autovacuum_max_workers = 5;
SELECT pg_reload_conf();

Each worker can vacuum one table at a time. With 3 workers, if article_view_counts, view_events, and articles all need vacuum simultaneously, they each get a worker. If a fourth table needs vacuum, it waits.

The cost_limit is shared across workers:

-- Default: 200 shared across all workers
-- Each worker gets: 200 / autovacuum_max_workers = 40 per worker (with 5 workers)
-- Override per table to ensure hot tables get sufficient budget:
ALTER TABLE article_view_counts SET (autovacuum_vacuum_cost_limit = 500);
-- This table's worker gets its own 500 budget, not shared

Fillfactor Optimization

Fillfactor determines the percentage of each page filled during INSERT:

-- Default fillfactor = 100 (fill pages completely)
-- Reduced fillfactor leaves space for HOT updates

-- Calculate optimal fillfactor:
-- Average tuple size after UPDATE should fit in remaining space

-- Check current tuple size
SELECT avg(pg_column_size(t.*)) AS avg_tuple_bytes
FROM article_view_counts t
LIMIT 10000;
 avg_tuple_bytes
-----------------
              72

Each page is 8192 bytes. Page overhead is 24 bytes. Usable space: 8168 bytes. At 72 bytes per tuple:

Tuples per full page (fillfactor=100): 8168 / 72 = 113 tuples
Tuples per page (fillfactor=70):       8168 * 0.70 / 72 = 79 tuples
Free space per page (fillfactor=70):   8168 * 0.30 = 2450 bytes
Updates before page full:              2450 / 72 = 34 HOT updates before needing new page

With vacuum running every 1.1 seconds and each page receiving approximately 1 update/sec at most (100,000 rows, 1000 updates/sec distributed), each page accumulates at most 1-2 dead tuples before vacuum runs. Fillfactor 70 provides far more headroom than needed. A more conservative fillfactor 85 would suffice:

-- Fillfactor 85: balance between HOT updates and space efficiency
ALTER TABLE article_view_counts SET (fillfactor = 85);
Fillfactor | Table Size | HOT Ratio | Headroom (updates before page full)
100        |    98 MB   |   55.6%   |    0 (no free space)
90         |   109 MB   |   87.2%   |    11 updates
85         |   115 MB   |   92.4%   |    17 updates
70         |   140 MB   |   94.9%   |    34 updates
50         |   196 MB   |   96.1%   |    56 updates

The sweet spot for the content platform is fillfactor 85: 92.4% HOT ratio with only 17% more storage than fillfactor 100. Below 70, the diminishing HOT improvement does not justify the storage increase.

Monitoring Autovacuum Effectiveness

Dashboard Query: Are We Keeping Up?

-- Primary monitoring query: run every 5 minutes, alert if thresholds exceeded
SELECT
    schemaname || '.' || relname AS table_name,
    n_live_tup,
    n_dead_tup,
    round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) AS dead_pct,
    last_autovacuum,
    last_autoanalyze,
    autovacuum_count,
    autoanalyze_count,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
    CASE
        WHEN n_dead_tup > 0.1 * n_live_tup THEN 'WARNING: >10% dead'
        WHEN n_dead_tup > 0.05 * n_live_tup THEN 'ELEVATED: >5% dead'
        ELSE 'OK'
    END AS status
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY n_dead_tup DESC;
      table_name       | n_live_tup | n_dead_tup | dead_pct | last_autovacuum     | status
-----------------------+------------+------------+----------+---------------------+--------
 public.view_events    |  10000000  |      48291 |     0.48 | 2026-05-29 09:42:01 | OK
 public.article_view_counts | 100000 |       1847 |     1.81 | 2026-05-29 09:42:12 | OK
 public.articles       |   1000000  |       2184 |     0.22 | 2026-05-29 09:41:47 | OK

Vacuum Duration Tracking

-- Track how long each autovacuum takes
-- Enable verbose autovacuum logging for tables > 1000 pages
ALTER SYSTEM SET log_autovacuum_min_duration = 100; -- log if vacuum takes > 100ms
SELECT pg_reload_conf();

Log output:

LOG:  automatic vacuum of table "content.public.article_view_counts":
      index scans: 1
      pages: 0 removed, 1284 remain, 0 skipped due to pins, 1142 skipped frozen
      tuples: 1847 removed, 100000 remain, 0 are dead but not yet removable
      removable cutoff: 184729847
      frozen: 0 pages from table (0 total) had 0 tuples frozen
      index scan needed: 42 pages from table (3.27% of total) had 1847 dead item identifiers removed
      I/O timings: read=0.00 write=2.14
      avg read rate: 0.000 MB/s, avg write rate: 8.472 MB/s
      buffer usage: 184 hits, 0 misses, 42 dirtied
      WAL usage: 84 records, 42 full page images, 348160 bytes
      system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.04 s

Key metrics:

  • elapsed: 0.04 s: Vacuum completed in 40ms (well within tolerance)
  • 1142 skipped frozen: Visibility map allowed skipping 89% of pages
  • 42 pages modified: Only 3.27% of pages contained dead tuples

Bloat Estimation with pgstattuple

-- Install extension
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Detailed bloat analysis (WARNING: full table scan, slow on large tables)
SELECT * FROM pgstattuple('article_view_counts');
 table_len  | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
------------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
  148897792 |      100000 |   7200000 |          4.83 |             1847 |        133384  |               0.09 |  140547408 |        94.39

free_percent = 94.39%: This looks alarming but is expected with fillfactor=85. The “free space” includes the intentionally-empty portion of each page reserved for HOT updates. True bloat is measured by:

-- True bloat: actual size vs minimum possible size
SELECT
    pg_size_pretty(pg_relation_size('article_view_counts')) AS actual_size,
    pg_size_pretty((100000 * 72 + 24 * (100000 / 113))::bigint) AS minimum_size,
    round(pg_relation_size('article_view_counts')::numeric /
          ((100000 * 72 + 24 * (100000 / 113))::numeric), 2) AS actual_vs_minimum
FROM pg_class WHERE relname = 'article_view_counts';
 actual_size | minimum_size | actual_vs_minimum
-------------+--------------+-------------------
 142 MB      |    7.4 MB    |             19.19

Wait: 19x the minimum? This includes page overhead, alignment padding, tuple headers (23 bytes each), and fillfactor space. The more meaningful comparison:

-- Compare to expected size with fillfactor=85
-- Expected: 100000 tuples * 72 bytes / (8168 * 0.85) = 1037 pages = 8.1 MB (data only)
-- Plus: page headers, item pointers, alignment = approximately 12 MB
-- Plus: fillfactor overhead (15%): approximately 14 MB
-- Actual: 142 MB... something is wrong.

-- The table was created before fillfactor change. VACUUM does not shrink files.
-- Need VACUUM FULL or pg_repack to reclaim space.

Emergency Bloat Remediation

When a table has grown far beyond its working size:

Option 1: VACUUM FULL (Locking)

-- WARNING: Exclusive lock on table for duration (minutes for large tables)
VACUUM FULL article_view_counts;
Before: 892 MB (9.1x bloated)
After:  115 MB (fillfactor 85 applied)
Lock duration: 4.2 seconds

VACUUM FULL rewrites the entire table, applying the current fillfactor. All indexes are rebuilt. The exclusive lock blocks all reads and writes.

Option 2: pg_repack (Online)

-- Install extension
CREATE EXTENSION pg_repack;

-- Repack without exclusive lock (takes share lock briefly at start and end)
-- Run from command line:
-- pg_repack --table article_view_counts --no-kill-backend content

pg_repack creates a shadow table, copies live data with new fillfactor, swaps the tables atomically. The original table remains readable during the copy.

Before: 892 MB
During repack: 892 MB + 115 MB (temporary copy)
After:  115 MB
Lock duration: < 100ms (swap only)
Total time: 12 seconds

Option 3: CREATE TABLE AS (Planned Maintenance)

-- Create new table with desired structure
BEGIN;
CREATE TABLE article_view_counts_new (LIKE article_view_counts INCLUDING ALL)
    WITH (fillfactor = 85,
          autovacuum_vacuum_scale_factor = 0.01,
          autovacuum_vacuum_threshold = 100);

INSERT INTO article_view_counts_new SELECT * FROM article_view_counts;

-- Swap tables
ALTER TABLE article_view_counts RENAME TO article_view_counts_old;
ALTER TABLE article_view_counts_new RENAME TO article_view_counts;
DROP TABLE article_view_counts_old;
COMMIT;

This approach requires a brief window where the table name does not exist (between RENAME operations). Use within a transaction to make it atomic.

Autovacuum Tuning Methodology

Step-by-step process for tuning a table:

Step 1: Characterize the Workload

-- Measure update rate over 1 hour
SELECT relname, n_tup_ins, n_tup_upd, n_tup_del, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'article_view_counts';

-- Wait 1 hour, measure again
-- Delta gives updates/hour

Step 2: Calculate Acceptable Dead Tuple Level

Target: dead tuples never exceed 5% of live tuples
Live tuples: 100,000
Max acceptable dead: 5,000

Update rate: 1000/sec
Time to accumulate 5000 dead: 5 seconds
Vacuum must complete within 5 seconds of triggering

Step 3: Set Scale Factor and Threshold

-- Trigger vacuum well before 5% dead
-- Target: trigger at 1% dead = 1000 dead tuples
-- scale_factor * live_tuples + threshold = 1000
-- 0.01 * 100000 + 0 = 1000
-- But add threshold for safety:
ALTER TABLE article_view_counts SET (
    autovacuum_vacuum_scale_factor = 0.005,  -- 500 from scale factor
    autovacuum_vacuum_threshold = 500         -- plus 500 fixed
);
-- Total trigger: 500 + 500 = 1000 dead tuples

Step 4: Tune Vacuum Speed

-- Ensure vacuum can process entire table within deadline
-- Table: 1284 pages
-- Default: 200 cost_limit / 2ms delay = 100 pages/cycle, 500 cycles/sec = 50,000 pages/sec
-- Time to vacuum 1284 pages: 1284/50000 = 26ms
-- Well under 5-second deadline

-- But if table grows to 100,000 pages:
-- Time: 100000/50000 = 2 seconds
-- Still acceptable. Keep defaults unless table grows larger.

Step 5: Configure Fillfactor for HOT

-- Check HOT ratio before tuning
SELECT n_tup_hot_upd::float / nullif(n_tup_upd, 0) AS hot_ratio
FROM pg_stat_user_tables
WHERE relname = 'article_view_counts';

-- If hot_ratio < 0.9: consider reducing fillfactor
-- Current tuple size: 72 bytes
-- Page free space needed for 1 in-place update: 72 bytes
-- Fillfactor 85 gives: 8168 * 0.15 = 1225 bytes free = 17 updates worth
-- More than enough for typical vacuum cycle
ALTER TABLE article_view_counts SET (fillfactor = 85);

Step 6: Validate After Tuning

-- Wait 1 hour, then check effectiveness
SELECT
    relname,
    n_dead_tup,
    n_tup_hot_upd::float / nullif(n_tup_upd, 0) AS hot_ratio,
    autovacuum_count,
    last_autovacuum,
    pg_size_pretty(pg_relation_size(c.oid)) AS size
FROM pg_stat_user_tables s
JOIN pg_class c ON c.relname = s.relname
WHERE s.relname = 'article_view_counts';

Expected after tuning:

 relname              | n_dead_tup | hot_ratio | autovacuum_count | size
----------------------+------------+-----------+------------------+--------
 article_view_counts  |       847  |      0.92 |             3247 | 115 MB

847 dead tuples (< 1% of live), 92% HOT ratio, vacuum running frequently (3247 times in the measurement period), stable table size.

The Content Platform Autovacuum Configuration

Complete configuration applied in production:

-- Global settings (postgresql.conf)
ALTER SYSTEM SET autovacuum_naptime = '30s';         -- check every 30s (default 1min)
ALTER SYSTEM SET autovacuum_max_workers = 4;         -- 4 parallel workers (default 3)
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = 2;   -- 2ms (default 2ms in PG16+)
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 400; -- higher budget shared
ALTER SYSTEM SET log_autovacuum_min_duration = 100;  -- log vacuums > 100ms
SELECT pg_reload_conf();

-- Per-table: high-update counter table
ALTER TABLE article_view_counts SET (
    fillfactor = 85,
    autovacuum_vacuum_scale_factor = 0.005,
    autovacuum_vacuum_threshold = 500,
    autovacuum_vacuum_cost_delay = 2,
    autovacuum_vacuum_cost_limit = 500,
    autovacuum_analyze_scale_factor = 0.02,
    autovacuum_analyze_threshold = 500
);

-- Per-table: large event table (append-mostly, occasional deletes)
ALTER TABLE view_events SET (
    autovacuum_vacuum_scale_factor = 0.005,
    autovacuum_vacuum_threshold = 5000,
    autovacuum_vacuum_cost_delay = 5,
    autovacuum_vacuum_cost_limit = 800
);

-- Per-table: articles (mostly inserts, rare updates)
-- Default settings are fine for this table

Monitoring Automation

Alerting query to run every 5 minutes:

-- Alert conditions:
-- 1. Any table with > 10% dead tuples
-- 2. Any table where autovacuum hasn't run in > 1 hour (and has > 1000 dead tuples)
-- 3. Any table approaching freeze threshold

SELECT
    'BLOAT_WARNING' AS alert_type,
    relname,
    n_dead_tup,
    round(100.0 * n_dead_tup / nullif(n_live_tup, 0), 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 0.1 * n_live_tup
  AND n_live_tup > 1000

UNION ALL

SELECT
    'VACUUM_STALLED' AS alert_type,
    relname,
    n_dead_tup,
    extract(epoch FROM now() - last_autovacuum) / 3600 AS hours_since_vacuum
FROM pg_stat_user_tables
WHERE last_autovacuum < now() - interval '1 hour'
  AND n_dead_tup > 1000

UNION ALL

SELECT
    'FREEZE_WARNING' AS alert_type,
    c.relname,
    age(c.relfrozenxid) AS xid_age,
    round(age(c.relfrozenxid)::numeric / 200000000 * 100, 1) AS freeze_pct
FROM pg_class c
WHERE c.relkind = 'r'
  AND age(c.relfrozenxid) > 150000000;  -- 75% of force-vacuum threshold

With this monitoring in place, the content platform catches bloat issues before they impact query performance. The combination of per-table autovacuum tuning, fillfactor for HOT updates, and proactive monitoring keeps all tables within 5% of their ideal size during steady-state operation.