Skip to main content
ship before you scale

Database Query Analysis and Index Tuning

4 min read Chapter 32 of 42

Database Query Analysis and Index Tuning

The Feature

Every slow query in Marketflow is identified using PostgreSQL’s EXPLAIN ANALYZE. Indexes are added based on actual query patterns, not guesses. The developer can read an execution plan and determine whether the query needs an index, a rewrite, or both.

The Decision

PostgreSQL’s query planner is sophisticated. It chooses the best execution strategy (sequential scan, index scan, bitmap scan) based on table statistics. Adding an index only helps if the query planner decides to use it. EXPLAIN ANALYZE reveals the planner’s choice and the actual execution time for each step.

The Implementation

Using EXPLAIN ANALYZE

-- Run this in psql, Supabase SQL editor, or through SQLAlchemy
EXPLAIN ANALYZE
SELECT v.id, v.name, v.email, a.status
FROM vendors v
JOIN applications a ON a.vendor_id = v.id
WHERE a.market_id = '550e8400-e29b-41d4-a716-446655440000'
  AND a.status = 'pending'
ORDER BY a.created_at DESC;

Reading the output:

Sort  (cost=45.12..45.15 rows=12 width=120) (actual time=15.234..15.238 rows=12 loops=1)
  Sort Key: a.created_at DESC
  ->  Hash Join  (cost=12.50..44.87 rows=12 width=120) (actual time=8.456..15.201 rows=12 loops=1)
        Hash Cond: (v.id = a.vendor_id)
        ->  Seq Scan on vendors v  (cost=0.00..28.50 rows=850 width=80) (actual time=0.015..2.345 rows=850 loops=1)
        ->  Hash  (cost=12.38..12.38 rows=12 width=56) (actual time=5.678..5.680 rows=12 loops=1)
              ->  Seq Scan on applications a  (cost=0.00..12.38 rows=12 width=56) (actual time=0.012..5.623 rows=12 loops=1)
                    Filter: ((market_id = '550e...'::uuid) AND (status = 'pending'))
                    Rows Removed by Filter: 488

The key indicators:

  • Seq Scan on applications: Sequential scan means PostgreSQL reads every row in the table. With 500 rows, this is 5 milliseconds. With 50,000 rows, it becomes 500 milliseconds.
  • Rows Removed by Filter: 488: Out of 500 rows, 488 were read and discarded. An index on (market_id, status) would read only the 12 matching rows.
  • actual time=15.234: Total query time in milliseconds.

Adding the Right Index

# backend/alembic/versions/add_application_indexes.py
"""Add indexes for application queries."""

from alembic import op


def upgrade() -> None:
    # Composite index for the most common application query
    op.create_index(
        "ix_applications_market_status_created",
        "applications",
        ["market_id", "status", "created_at"],
    )

    # Index for vendor lookup
    op.create_index(
        "ix_applications_vendor_id",
        "applications",
        ["vendor_id"],
    )


def downgrade() -> None:
    op.drop_index("ix_applications_market_status_created")
    op.drop_index("ix_applications_vendor_id")

After the index, the query plan changes:

Sort  (cost=8.45..8.48 rows=12 width=120) (actual time=0.234..0.238 rows=12 loops=1)
  Sort Key: a.created_at DESC
  ->  Nested Loop  (cost=0.56..8.20 rows=12 width=120) (actual time=0.045..0.201 rows=12 loops=1)
        ->  Index Scan using ix_applications_market_status_created on applications a
              (cost=0.28..4.10 rows=12 width=56) (actual time=0.023..0.089 rows=12 loops=1)
              Index Cond: ((market_id = '550e...'::uuid) AND (status = 'pending'))
        ->  Index Scan using vendors_pkey on vendors v
              (cost=0.28..0.34 rows=1 width=80) (actual time=0.008..0.008 rows=1 loops=12)
              Index Cond: (id = a.vendor_id)

Query time dropped from 15 ms to 0.2 ms. The planner switched from a hash join with sequential scans to nested loop with index scans.

Partial Indexes

When queries always filter by a specific value, a partial index is smaller and faster:

# Index only pending applications (the ones organizers query most)
op.create_index(
    "ix_applications_pending",
    "applications",
    ["market_id", "created_at"],
    postgresql_where="status = 'pending'",
)

This index covers only rows where status = 'pending'. If 80% of applications are accepted or rejected, the partial index is five times smaller than a full index.

Common Marketflow Query Patterns

Query PatternOptimal Index
Pending applications for a market(market_id, status, created_at)
Vendor’s application history(vendor_id, created_at DESC)
Markets by city and state(city, state)
Active markets for listing page(status) WHERE status = 'active'
Vendor search by name(lower(name) text_pattern_ops)

The Trap

-- TRAP: Adding an index on every column
CREATE INDEX ix_applications_market_id ON applications(market_id);
CREATE INDEX ix_applications_status ON applications(status);
CREATE INDEX ix_applications_created_at ON applications(created_at);
CREATE INDEX ix_applications_vendor_id ON applications(vendor_id);
-- 4 separate indexes. PostgreSQL can use at most one per query step.
-- The query WHERE market_id = ? AND status = ? uses only one index.

-- SAFE: One composite index for the actual query pattern
CREATE INDEX ix_applications_market_status ON applications(market_id, status);
-- Covers both WHERE market_id = ? and WHERE market_id = ? AND status = ?

A composite index on (market_id, status) serves queries that filter by market_id alone and queries that filter by both market_id and status. Two single-column indexes cannot be combined as efficiently. The composite index uses approximately the same disk space as two single-column indexes but serves both query patterns.

The Cost

ItemCost
EXPLAIN ANALYZE$0 (built into PostgreSQL)
IndexesNegligible disk space
Write overhead per index~5-10% slower inserts

Indexes make reads faster and writes slightly slower. For Marketflow, reads outnumber writes by approximately 100:1 (one application submission generates hundreds of dashboard views). The trade-off overwhelmingly favors indexes.