unbound mongodb at scale

Schema Validation and Runtime Guards Against Unbounded Growth

Chapter 24 of 72

The Symptom

Six months after deploying the outlier pattern, a new developer adds a feature that stores notification preferences in a nested array within user documents. The preferences array is unbounded. Nobody notices until a power user with 15,000 notification rules reports slow profile loads. The cycle repeats: new feature, new unbounded array, eventual performance degradation.

The Cause

Code reviews catch unbounded arrays when reviewers know to look for them. But the knowledge is tribal. There is no automated enforcement. The database accepts any document under 16 MB regardless of its internal structure.

The Benchmark

This section does not require a JMH benchmark. The fix is preventive, not reactive. Instead, compare how long each detection method takes to catch unbounded growth, and how much damage occurs before it does:

| Detection method | Time to detect | Impact before detection |
| --- | --- | --- |
| No validation | Weeks to months | Production degradation |
| Schema validation | Immediate (write rejected) | Zero |
| Application guard | Immediate (write redirected) | Zero |
| Monitoring alert | Minutes to hours | Minimal |

The Fix

Three layers of defense.

Layer 1: MongoDB JSON Schema Validation.

Apply a schema validator that limits array sizes at the database level:

// FAST: Schema validation prevents unbounded growth
ValidationOptions validationOptions = new ValidationOptions()
    .validator(new Document("$jsonSchema", new Document()
        .append("bsonType", "object")
        .append("properties", new Document()
            .append("activities", new Document()
                .append("bsonType", "array")
                .append("maxItems", 1000)
                .append("description", "Activities array capped at 1000. Use overflow collection for power users.")
            )
            .append("notificationPrefs", new Document()
                .append("bsonType", "array")
                .append("maxItems", 100)
                .append("description", "Notification preferences capped at 100.")
            )
        )
    ))
    .validationLevel(ValidationLevel.STRICT)
    .validationAction(ValidationAction.ERROR);

database.createCollection("users", new CreateCollectionOptions()
    .validationOptions(validationOptions));

When a $push would cause the activities array to exceed 1,000 elements, MongoDB rejects the write with a validation error. The application catches this and redirects to the overflow path.
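A minimal sketch of that catch-and-redirect flow. MongoDB reports a failed document validation with server error code 121 (DocumentValidationFailure); with the Java sync driver, an updateOne that fails this way throws MongoWriteException, and the code is available from getError().getCode(). To keep the sketch runnable without a database, WriteFailed is a hypothetical stand-in for the driver exception, and the two Runnable arguments are assumed to wrap the primary $push and the overflow write.

```java
final class ValidationFallback {
    // Server error code MongoDB uses for DocumentValidationFailure.
    static final int DOCUMENT_VALIDATION_FAILURE = 121;

    /** Hypothetical stand-in for the driver's write exception; carries only the error code. */
    static final class WriteFailed extends RuntimeException {
        final int code;

        WriteFailed(int code) {
            super("write failed with server error code " + code);
            this.code = code;
        }
    }

    /**
     * Runs the primary write. If it fails validation, runs the overflow write instead.
     * Any other failure is rethrown unchanged. Returns the path that was taken.
     */
    static String pushOrOverflow(Runnable primaryWrite, Runnable overflowWrite) {
        try {
            primaryWrite.run();
            return "primary";
        } catch (WriteFailed e) {
            if (e.code != DOCUMENT_VALIDATION_FAILURE) {
                throw e; // duplicate key, timeout, etc. are not overflow conditions
            }
            overflowWrite.run();
            return "overflow";
        }
    }
}
```

Checking the error code rather than the message keeps the fallback from swallowing unrelated write failures.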

For existing collections, apply the validator with runCommand:

database.runCommand(new Document("collMod", "users")
    .append("validator", new Document("$jsonSchema", new Document()
        .append("bsonType", "object")
        .append("properties", new Document()
            .append("activities", new Document()
                .append("bsonType", "array")
                .append("maxItems", 1000)
            )
        )
    ))
    .append("validationLevel", "moderate")   // Validate inserts, plus updates to documents that already satisfy the schema
    .append("validationAction", "error")
);

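Use validationLevel: "moderate" for existing collections. MongoDB then validates inserts and updates to documents that already satisfy the schema, but skips updates to documents that already violate it. Applying the validator with collMod never scans or rejects documents that are already stored; they are only checked when next written. One consequence: under "moderate", an array that is already over the cap can keep growing until it is migrated to the overflow collection.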
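The rule for which writes get validated at each level can be summarized as a small decision function. This is an illustrative model of the documented semantics, not driver code; the class and method names are hypothetical.

```java
final class ValidationLevels {
    enum Level { OFF, MODERATE, STRICT }

    /**
     * Returns true if MongoDB would check the write against the validator.
     *
     * @param isInsert           true for an insert, false for an update
     * @param storedDocIsValid   for updates: whether the stored document currently
     *                           satisfies the schema (ignored for inserts)
     */
    static boolean isChecked(Level level, boolean isInsert, boolean storedDocIsValid) {
        switch (level) {
            case OFF:
                return false;
            case STRICT:
                return true; // every insert and every update
            case MODERATE:
                // inserts always; updates only when the stored document is already valid
                return isInsert || storedDocIsValid;
            default:
                throw new AssertionError(level);
        }
    }
}
```

The last MODERATE case is the one to watch: an update to a document that already violates the schema is not checked at all.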

Layer 2: Application-Level Guard.

Wrap array push operations in a guard that checks the array length atomically and routes overflow to a separate collection:

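// FAST: Application guard with automatic overflow routing
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.FindOneAndUpdateOptions;
import com.mongodb.client.model.ReturnDocument;
import com.mongodb.client.model.UpdateOptions;
import com.mongodb.client.model.Updates;
import org.bson.Document;

import java.util.List;

public class BoundedArrayGuard {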
    private final int maxArraySize;
    private final MongoCollection<Document> primaryCollection;
    private final MongoCollection<Document> overflowCollection;

    public BoundedArrayGuard(
        MongoCollection<Document> primary,
        MongoCollection<Document> overflow,
        int maxArraySize
    ) {
        this.primaryCollection = primary;
        this.overflowCollection = overflow;
        this.maxArraySize = maxArraySize;
    }

    public void pushWithOverflow(String docId, String arrayField, Document element) {
        // Atomic attempt to push within bounds. $ifNull treats a missing array as empty,
        // because $size raises an error when its argument is not an array.
        Document result = primaryCollection.findOneAndUpdate(
            Filters.and(
                Filters.eq("_id", docId),
                Filters.expr(new Document("$lt", List.of(
                    new Document("$size",
                        new Document("$ifNull", List.of("$" + arrayField, List.of()))),
                    maxArraySize)))
            ),
            Updates.push(arrayField, element),
            new FindOneAndUpdateOptions().returnDocument(ReturnDocument.AFTER)
        );

        if (result != null) {
            return;
        }

        // Overflow: route to overflow collection.
        // Note: result is also null when no document has this _id.
        overflowCollection.updateOne(
            Filters.eq("parentId", docId),
            Updates.combine(
                Updates.push(arrayField, element),
                Updates.setOnInsert("parentId", docId)
            ),
            new UpdateOptions().upsert(true)
        );

        primaryCollection.updateOne(
            Filters.eq("_id", docId),
            Updates.set("hasOverflow_" + arrayField, true)
        );
    }
}

Layer 3: Monitoring and Alerting.

Query for documents approaching size limits:

// Monitoring: find documents exceeding size thresholds
public List<Document> findOversizedDocuments(MongoCollection<Document> collection,
                                              int sizeThresholdBytes) {
    return collection.aggregate(List.of(
        Aggregates.project(Projections.fields(
            Projections.include("_id"),
            Projections.computed("docSize", new Document("$bsonSize", "$$ROOT"))
        )),
        Aggregates.match(Filters.gt("docSize", sizeThresholdBytes)),
        Aggregates.sort(Sorts.descending("docSize")),
        Aggregates.limit(50)
    )).into(new ArrayList<>());
}

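Run this periodically (for example, every hour) and alert when documents exceed 1 MB. This catches unbounded growth in any field, not just the ones you anticipated. The $bsonSize operator requires MongoDB 4.4 or later, and the pipeline computes the size of every document, so it scans the whole collection; schedule it off-peak or against a secondary if the collection is large.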
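The alerting step that consumes those results might look like the sketch below. It assumes the job has already extracted each document's _id and docSize into a map; the class name, the message format, and the idea of reporting the size as a share of the 16 MB limit are illustrative choices, not part of any MongoDB API.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

final class SizeAlerts {
    static final long MAX_BSON_BYTES = 16L * 1024 * 1024; // MongoDB document size limit
    static final long ALERT_BYTES = 1L * 1024 * 1024;     // 1 MB alert threshold from the text

    /** Builds one alert line per document over the threshold, largest first. */
    static List<String> alertsFor(Map<String, Long> docSizeById) {
        return docSizeById.entrySet().stream()
            .filter(e -> e.getValue() > ALERT_BYTES)
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .map(e -> String.format(Locale.ROOT, "%s: %d bytes (%.1f%% of 16 MB limit)",
                e.getKey(), e.getValue(), 100.0 * e.getValue() / MAX_BSON_BYTES))
            .collect(Collectors.toList());
    }
}
```

Reporting the share of the hard limit alongside the raw size makes it easier to tell a document that merely crossed the threshold from one that is close to failing writes.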

The Proof

| Scenario | Without guards | With guards |
| --- | --- | --- |
| New unbounded array deployed | Detected after 3 weeks | Write rejected immediately |
| Power user hits 16 MB limit | Application error in production | Overflow route, no error |
| Gradual document growth | Undetected for months | Alert at 1 MB threshold |
| Developer adds new array field | No enforcement | Write rejected if the schema sets additionalProperties: false; otherwise the size alert catches the growth |
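One way to make sure every array property in the validator actually declares maxItems is a build-time check over the schema itself. The sketch below is a hypothetical lint, not a MongoDB feature: it takes the $jsonSchema body as a Map (org.bson.Document implements Map<String, Object>, so the same Document passed to the validator can be passed here) and inspects top-level properties only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class SchemaLint {
    /** Returns the names of top-level properties declared as arrays without a maxItems bound. */
    static List<String> unboundedArrays(Map<String, Object> jsonSchema) {
        List<String> offenders = new ArrayList<>();
        Object props = jsonSchema.get("properties");
        if (!(props instanceof Map)) {
            return offenders; // no properties declared, nothing to check
        }
        for (Map.Entry<?, ?> entry : ((Map<?, ?>) props).entrySet()) {
            if (!(entry.getValue() instanceof Map)) {
                continue;
            }
            Map<?, ?> spec = (Map<?, ?>) entry.getValue();
            boolean isArray = "array".equals(spec.get("bsonType"));
            if (isArray && !spec.containsKey("maxItems")) {
                offenders.add(String.valueOf(entry.getKey()));
            }
        }
        return offenders;
    }
}
```

Failing the build when this list is non-empty turns the tribal review rule into an automated one. Nested object properties would need a recursive walk, which this sketch omits.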

The Trade-off

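Schema validation adds overhead to every validated write, because MongoDB must check the document against the schema before accepting it. Expect roughly 5-15% on write operations; for a write-heavy workload at 10,000 ops/sec, budget approximately 1 ms of additional latency per write, and confirm both figures against your own schema and hardware. Note that validationLevel: "moderate" does not validate only the fields being modified: it validates the whole document and merely skips updates to documents that already violate the schema, so it does not lower the cost for valid documents. To reduce overhead, keep the validator small and limit it to the fields that can grow.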

The $expr check in the application guard ($lt with $size) makes MongoDB compute the array length on every write attempt. BSON stores an array's byte length but not its element count, so $size walks the array: the cost is O(n) in the number of elements, bounded by maxArraySize as long as the cap holds. The $expr clause itself cannot use an index, but the _id equality in the same filter can, so only one document is examined. For the common case where the array is under the threshold, this check is the only added overhead.