Skip to main content
the lies your orm tells you

Auditing Schema Drift in Existing Systems

5 min read Chapter 15 of 30

Auditing Schema Drift in Existing Systems

You inherited a system that has been running ddl-auto=update for two years. The schema is drifted. You need to quantify the damage and build a migration path without breaking production.

The Lie

The entity model is the source of truth for your schema.

The Reality

After months of ddl-auto=update, neither the entity model nor the database is the source of truth. The entity model describes what Hibernate thinks the schema should be. The database contains what the schema actually is, including ghost columns, manually added indexes, constraints that Hibernate never knew about, and type mismatches that accumulated silently.

You need to compare both and reconcile them into a single, versioned migration history.

The Evidence

Here is a real-world example of drift after 18 months of ddl-auto=update:

-- What Hibernate thinks the table looks like (from entity mapping):
-- CREATE TABLE orders (
--     id BIGINT PRIMARY KEY,
--     customer_id BIGINT NOT NULL,
--     status VARCHAR(255),
--     total_amount DECIMAL(19,2),
--     created_at TIMESTAMP
-- )

-- What the actual database contains:
\d orders
--                              Table "public.orders"
--       Column      |          Type          | Nullable |   Default
-- ------------------+------------------------+----------+-------------
--  id               | bigint                 | not null | nextval(...)
--  customer_id      | bigint                 | not null |
--  status           | character varying(255) | yes      |
--  total_amount     | numeric(19,2)          | yes      |
--  created_at       | timestamp              | yes      |
--  old_status       | character varying(50)  | yes      |    -- ghost
--  customer_name    | character varying(255) | yes      |    -- ghost
--  discount_pct     | double precision       | yes      |    -- ghost
--  legacy_ref       | character varying(100) | yes      |    -- ghost
--
-- Indexes:
--  orders_pkey PRIMARY KEY (id)
--  idx_orders_customer_manual btree (customer_id)  -- manually added, not in entity
--
-- Foreign-keys:
--  fk_orders_customer FOREIGN KEY (customer_id) REFERENCES customers(id)
--    -- added manually by DBA, not in entity mapping

Four ghost columns. One manually added index. One manually added foreign key. Hibernate does not know about any of these.

The Fix

Step 1: Detect Drift Programmatically

Use Hibernate’s SchemaValidator or Spring’s ddl-auto=validate to find mismatches between entities and schema:

// BETTER: Programmatic drift detection
@Component
public class SchemaDriftAuditor implements CommandLineRunner {

    @Autowired
    private EntityManagerFactory emf;

    @Override
    public void run(String... args) {
        SessionFactory sf = emf.unwrap(SessionFactory.class);
        Metadata metadata = ((SessionFactoryImplementor) sf)
            .getJdbcServices()
            .getBootstrapJdbcConnectionAccess()
            // ... access metadata from boot model

        // Alternative: use SchemaManagementTool
        ServiceRegistry serviceRegistry = ((SessionFactoryImplementor) sf)
            .getServiceRegistry();

        SchemaManagementTool tool = serviceRegistry
            .getService(SchemaManagementTool.class);

        // This will throw if schema doesn't match entities
        // Catch and log instead of failing
        try {
            tool.getSchemaValidator(/* tool options */);
        } catch (SchemaManagementException e) {
            log.error("Schema drift detected: {}", e.getMessage());
        }
    }
}

A more practical approach uses direct SQL to find ghost columns:

// BETTER: Query information_schema to find columns not mapped in entities
@Component
public class GhostColumnDetector {

    @Autowired
    private JdbcTemplate jdbcTemplate;

    public List<String> findGhostColumns(String tableName,
                                          Set<String> entityColumns) {
        List<String> dbColumns = jdbcTemplate.queryForList(
            """
            SELECT column_name FROM information_schema.columns
            WHERE table_name = ? AND table_schema = 'public'
            """,
            String.class, tableName);

        return dbColumns.stream()
            .filter(col -> !entityColumns.contains(col))
            .toList();
    }
}

// Usage:
// Set<String> mapped = Set.of("id", "customer_id", "status",
//                             "total_amount", "created_at");
// List<String> ghosts = detector.findGhostColumns("orders", mapped);
// ghosts = ["old_status", "customer_name", "discount_pct", "legacy_ref"]

Step 2: Generate a Baseline Migration

Once you have audited the drift, generate a baseline migration that matches the current production schema exactly:

# Using Flyway
flyway baseline -baselineVersion=0 -baselineDescription="Existing schema"

# Or dump the current schema as your V1 migration
pg_dump --schema-only --no-owner mydb > V1__baseline.sql

Step 3: Reconcile in Phases

Do not try to fix all drift in one migration. Plan phases:

-- V2__remove_ghost_columns.sql
-- Phase 1: Remove clearly abandoned columns
-- Verify no application or report reads these columns first

ALTER TABLE orders DROP COLUMN IF EXISTS old_status;
ALTER TABLE orders DROP COLUMN IF EXISTS legacy_ref;

-- Phase 2: Formalize manually added structures
-- These were added by a DBA and are actually useful
-- Now they're tracked in migration history

-- (idx_orders_customer_manual already exists, just documenting it)
-- If it didn't exist: CREATE INDEX idx_orders_customer ON orders (customer_id);
-- V3__add_missing_constraints.sql
-- Phase 3: Add constraints that Hibernate never generated

ALTER TABLE orders ALTER COLUMN status SET NOT NULL;
ALTER TABLE orders ADD CONSTRAINT chk_orders_status
    CHECK (status IN ('PENDING', 'CONFIRMED', 'SHIPPED', 'DELIVERED', 'CANCELLED'));

ALTER TABLE orders ALTER COLUMN created_at SET NOT NULL;
ALTER TABLE orders ALTER COLUMN created_at SET DEFAULT NOW();

Step 4: Switch to validate

# After baseline and reconciliation migrations are applied
spring:
  jpa:
    hibernate:
      ddl-auto: validate
  flyway:
    enabled: true
    baseline-on-migrate: true

From this point forward, every schema change goes through a Flyway migration. Hibernate validates on startup that entities match the schema. If they diverge, the application fails to start, which is the correct behavior.

The Cost Model

Schema drift auditing takes 1-3 days for a medium-sized application (20-50 entities). The reconciliation migration plan takes another 2-5 days, depending on how many ghost columns carry data that downstream systems depend on.

The cost of not doing this: every month of continued ddl-auto=update adds more ghost columns, widens the gap between entity model and database, and makes the eventual reconciliation harder. After two years, you are looking at a week-long project. After four years, it becomes a multi-sprint effort with real risk of data loss.

The right time to switch was when you went to production. The second best time is now.