Skip to main content
fast by design

Write Performance: Batch Inserts, COPY, WAL Pressure, and the Commit Rate That Saturates Disk

8 min read Chapter 55 of 90

Write Performance: Batch Inserts, COPY, WAL Pressure, and the Commit Rate That Saturates Disk

Every read optimization in Chapters 17 and 18 assumed the data was already in the database. Getting it there efficiently is a different problem. The content platform ingests 10,000 articles per hour from upstream feeds. Each article arrives with metadata, body text, and embedding vectors. The naive approach (one INSERT per article, one COMMIT per INSERT) saturates disk I/O at 2,000 articles/hour. The optimized pipeline handles 10x that load at 30% disk utilization.

This chapter measures the write path from application to disk, layer by layer.

The Write Path: From INSERT to Disk

A single INSERT statement triggers this sequence:

  1. Parse and plan the SQL
  2. Acquire a row-exclusive lock on the table
  3. Write the tuple to shared_buffers (heap page)
  4. Write a WAL record to wal_buffers
  5. On COMMIT: flush wal_buffers to disk (fsync)
  6. Return success to the client

Steps 1 through 4 are fast: in-memory operations measured in microseconds. Step 5 is the bottleneck. An fsync to NVMe takes 50-200 microseconds. To spinning disk: 2-10 milliseconds. Every COMMIT pays this cost.

Benchmark Setup

The test table mirrors the content platform’s article storage:

CREATE TABLE articles (
    id              BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    external_id     VARCHAR(64) NOT NULL UNIQUE,
    title           VARCHAR(512) NOT NULL,
    body            TEXT NOT NULL,
    category_id     INTEGER NOT NULL,
    source_id       INTEGER NOT NULL,
    published_at    TIMESTAMPTZ NOT NULL,
    ingested_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
    word_count      INTEGER NOT NULL,
    embedding       VECTOR(384)
);

Hardware: 4 vCPU, 16GB RAM, NVMe SSD (IOPS: 64k read, 32k write). PostgreSQL 16 with default settings initially.

Method 1: Single INSERT, Single COMMIT

// SLOW: 1 INSERT + 1 COMMIT per article
public int insertOneByOne(Connection conn, List<Article> articles) throws SQLException {
    conn.setAutoCommit(true); // implicit commit per statement
    String sql = """
        INSERT INTO articles (external_id, title, body, category_id, source_id,
                             published_at, word_count, embedding)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?)
        """;
    int count = 0;
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        for (Article a : articles) {
            ps.setString(1, a.externalId());
            ps.setString(2, a.title());
            ps.setString(3, a.body());
            ps.setInt(4, a.categoryId());
            ps.setInt(5, a.sourceId());
            ps.setObject(6, a.publishedAt());
            ps.setInt(7, a.wordCount());
            ps.setObject(8, a.embedding());
            ps.executeUpdate();
            count++;
        }
    }
    return count;
}

Benchmark with 10,000 articles:

Method: single INSERT, autocommit=true
Rows:   10,000
Time:   14.2 seconds
Rate:   704 rows/sec
WAL generated: 312 MB
fsync calls:   10,000
Disk util:     87%

Each of the 10,000 commits triggers an fsync. The disk spends most of its time flushing WAL records one at a time.

Method 2: Multi-Row INSERT with Batch COMMIT

// FAST: batch INSERT with periodic COMMIT
public int insertBatched(Connection conn, List<Article> articles, int batchSize)
        throws SQLException {
    conn.setAutoCommit(false);
    String sql = """
        INSERT INTO articles (external_id, title, body, category_id, source_id,
                             published_at, word_count, embedding)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?)
        """;
    int count = 0;
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        for (Article a : articles) {
            ps.setString(1, a.externalId());
            ps.setString(2, a.title());
            ps.setString(3, a.body());
            ps.setInt(4, a.categoryId());
            ps.setInt(5, a.sourceId());
            ps.setObject(6, a.publishedAt());
            ps.setInt(7, a.wordCount());
            ps.setObject(8, a.embedding());
            ps.addBatch();
            count++;
            if (count % batchSize == 0) {
                ps.executeBatch();
                conn.commit();
            }
        }
        ps.executeBatch();
        conn.commit();
    }
    return count;
}

Benchmark with batch_size = 1000:

Method: batched INSERT, commit every 1000 rows
Rows:   10,000
Time:   1.8 seconds
Rate:   5,556 rows/sec
WAL generated: 298 MB
fsync calls:   10
Disk util:     23%

The fsync count dropped from 10,000 to 10. WAL volume is similar (the data is the same size), but grouping commits amortizes the fsync cost across 1000 rows each.

Method 3: COPY Protocol

// FAST: PostgreSQL COPY protocol for maximum throughput
public long insertWithCopy(Connection conn, List<Article> articles) throws SQLException {
    PGConnection pgConn = conn.unwrap(PGConnection.class);
    String copySql = """
        COPY articles (external_id, title, body, category_id, source_id,
                      published_at, word_count, embedding)
        FROM STDIN WITH (FORMAT csv, NULL '')
        """;
    CopyManager copyManager = pgConn.getCopyAPI();
    try (var writer = new StringWriter(articles.size() * 2048)) {
        for (Article a : articles) {
            writer.write(escapeCsv(a.externalId()));
            writer.write(',');
            writer.write(escapeCsv(a.title()));
            writer.write(',');
            writer.write(escapeCsv(a.body()));
            writer.write(',');
            writer.write(String.valueOf(a.categoryId()));
            writer.write(',');
            writer.write(String.valueOf(a.sourceId()));
            writer.write(',');
            writer.write(a.publishedAt().toString());
            writer.write(',');
            writer.write(String.valueOf(a.wordCount()));
            writer.write(',');
            writer.write(formatVector(a.embedding()));
            writer.write('\n');
        }
        byte[] data = writer.toString().getBytes(StandardCharsets.UTF_8);
        return copyManager.copyIn(copySql, new ByteArrayInputStream(data));
    } catch (IOException e) {
        throw new SQLException("COPY failed", e);
    }
}

Benchmark:

Method: COPY protocol
Rows:   10,000
Time:   0.9 seconds
Rate:   11,111 rows/sec
WAL generated: 285 MB
fsync calls:   1
Disk util:     11%

COPY bypasses the SQL parser for each row, skips the per-statement planning overhead, and uses a streaming protocol that packs data more efficiently into WAL records. A single commit at the end means a single fsync.

EXPLAIN ANALYZE: Where Time Goes

For the single INSERT case, pg_stat_statements reveals:

SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
WHERE query LIKE '%INSERT INTO articles%'
ORDER BY total_exec_time DESC;
                    query                     | calls | total_exec_time | mean_exec_time | rows
----------------------------------------------+-------+-----------------+----------------+------
 INSERT INTO articles (...) VALUES ($1,...)   | 10000 |     12847.32 ms |       1.28 ms  | 10000

1.28ms per INSERT when autocommit is on. With batch commit (1000 rows per commit), the mean drops to 0.18ms per row because the fsync is amortized.

The Write Performance Hierarchy

Write Performance Comparison

The throughput difference between methods:

MethodRows/secfsync CallsDisk Util
Single INSERT + autocommit70410,00087%
Batch INSERT (100 rows/commit)3,84610041%
Batch INSERT (1000 rows/commit)5,5561023%
COPY (single transaction)11,111111%

Unlogged Tables for Staging

The content platform uses a two-phase ingestion: stage into an unlogged table, then move to the permanent table. Unlogged tables skip WAL entirely:

CREATE UNLOGGED TABLE articles_staging (LIKE articles INCLUDING ALL);
// FAST: COPY into unlogged staging, then INSERT...SELECT into permanent table
public void ingestWithStaging(Connection conn, List<Article> articles) throws SQLException {
    conn.setAutoCommit(false);

    // Phase 1: COPY into unlogged staging (no WAL)
    long staged = copyIntoStaging(conn, articles);

    // Phase 2: Deduplicate and move to permanent table
    try (Statement stmt = conn.createStatement()) {
        stmt.execute("""
            INSERT INTO articles (external_id, title, body, category_id, source_id,
                                 published_at, word_count, embedding)
            SELECT s.external_id, s.title, s.body, s.category_id, s.source_id,
                   s.published_at, s.word_count, s.embedding
            FROM articles_staging s
            WHERE NOT EXISTS (
                SELECT 1 FROM articles a WHERE a.external_id = s.external_id
            )
            ON CONFLICT (external_id) DO NOTHING
            """);
        stmt.execute("TRUNCATE articles_staging");
    }
    conn.commit();
}

Staging throughput:

Phase 1 (COPY into unlogged): 22,222 rows/sec (0 WAL)
Phase 2 (INSERT...SELECT):     8,333 rows/sec (WAL for permanent table)
Combined pipeline:            6,250 rows/sec effective

The staging approach adds deduplication logic at the cost of a two-phase process. For the content platform, this eliminates duplicate articles from overlapping feed windows.

WAL Pressure Under Load

Monitoring WAL generation rate during ingestion:

SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0') AS total_wal_bytes;

-- Check WAL generation rate over 10 seconds
SELECT pg_wal_lsn_diff(
    pg_current_wal_lsn(),
    (SELECT pg_current_wal_lsn() FROM pg_sleep(10))
) / 10.0 AS wal_bytes_per_sec;

During single-INSERT ingestion at 704 rows/sec, WAL generation runs at 22 MB/sec. During COPY ingestion at 11,111 rows/sec, WAL generation peaks at 31 MB/sec. The COPY method moves 15x more rows with only 40% more WAL because it packs records more efficiently.

When WAL generation exceeds the disk’s sequential write bandwidth, PostgreSQL stalls in WalWriterMain and the wal_writer_delay becomes the bottleneck. On NVMe with 2 GB/sec sequential write, this threshold is far above typical workloads. On spinning disk with 150 MB/sec, bulk loads can hit it.

The Content Platform Pipeline: 10k Articles/Hour

The final ingestion pipeline combines all optimizations:

public class ArticleIngestionPipeline {
    private static final int BATCH_SIZE = 500;
    private static final int STAGING_THRESHOLD = 5000;

    public IngestionResult ingest(DataSource ds, Stream<Article> articles) {
        var batch = new ArrayList<Article>(BATCH_SIZE);
        var result = new IngestionResult();

        articles.forEach(article -> {
            batch.add(article);
            if (batch.size() >= BATCH_SIZE) {
                if (batch.size() >= STAGING_THRESHOLD) {
                    result.add(ingestViaStaging(ds, batch));
                } else {
                    result.add(ingestViaCopy(ds, batch));
                }
                batch.clear();
            }
        });

        if (!batch.isEmpty()) {
            result.add(ingestViaCopy(ds, batch));
        }
        return result;
    }
}

Production metrics at 10k articles/hour steady state:

Ingestion rate:     10,000 articles/hour (2.78/sec)
Batch size:         500 articles
Commits per hour:   20
WAL generation:     4.2 GB/hour
Disk utilization:   8%
P99 batch latency:  340ms

The pipeline runs at 8% disk utilization, leaving 92% capacity for read queries. This is the result of three optimizations stacking: COPY protocol (16x over single INSERT), batch commit (reduces fsync count to 20/hour), and streaming ingestion (no memory pressure from buffering all 10k articles).

Trade-offs

Larger batch sizes increase throughput but also increase:

  • Memory usage (buffering rows before commit)
  • Data loss window (uncommitted rows lost on crash)
  • Lock hold time (all rows in a batch hold locks until commit)

The content platform accepts a 500-row data loss window (re-fetchable from upstream) in exchange for 8x throughput improvement over single-row commits.

Batch SizeThroughputData Loss WindowMemory per Batch
1704 rows/sec02 KB
1003,846 rows/sec100 rows200 KB
5005,200 rows/sec500 rows1 MB
10005,556 rows/sec1000 rows2 MB
COPY 10k11,111 rows/sec10,000 rows20 MB

The next sections drill into the mechanics: Section 1 benchmarks each write method in isolation, and Section 2 examines WAL internals and commit frequency tuning.