Schema Validation and Runtime Guards Against Unbounded Growth
The Symptom
Six months after deploying the outlier pattern, a new developer adds a feature that stores notification preferences in a nested array within user documents. The preferences array is unbounded. Nobody notices until a power user with 15,000 notification rules reports slow profile loads. The cycle repeats: new feature, new unbounded array, eventual performance degradation.
The Cause
Code reviews catch unbounded arrays when reviewers know to look for them. But the knowledge is tribal. There is no automated enforcement. The database accepts any document under 16 MB regardless of its internal structure.
The Benchmark
This section does not require a JMH benchmark. The fix is preventive, not reactive. Instead, compare how long each detection method takes to catch unbounded growth, and how much damage occurs before it does:
| Detection method | Time to detect | Impact before detection |
|---|---|---|
| No validation | Weeks to months | Production degradation |
| Schema validation | Immediate (write rejected) | Zero |
| Application guard | Immediate (write redirected) | Zero |
| Monitoring alert | Minutes to hours | Minimal |
The Fix
Three layers of defense.
Layer 1: MongoDB JSON Schema Validation.
Apply a schema validator that limits array sizes at the database level:
// FAST: Schema validation prevents unbounded growth
ValidationOptions validationOptions = new ValidationOptions()
    .validator(new Document("$jsonSchema", new Document()
        .append("bsonType", "object")
        .append("properties", new Document()
            .append("activities", new Document()
                .append("bsonType", "array")
                .append("maxItems", 1000)
                .append("description", "Activities array capped at 1000. Use overflow collection for power users.")
            )
            .append("notificationPrefs", new Document()
                .append("bsonType", "array")
                .append("maxItems", 100)
                .append("description", "Notification preferences capped at 100.")
            )
        )
    ))
    .validationLevel(ValidationLevel.STRICT)
    .validationAction(ValidationAction.ERROR);
database.createCollection("users", new CreateCollectionOptions()
    .validationOptions(validationOptions));
When a $push would cause the activities array to exceed 1,000 elements, MongoDB rejects the write with a validation error. The application catches this and redirects to the overflow path.
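The catch-and-redirect step might look like the following minimal sketch. It is written against plain Java functional interfaces so that it runs without the driver on the classpath; `ValidationFallback`, `writeWithFallback`, and the predicate parameter are hypothetical names, not part of any MongoDB API. With the Java driver, the predicate would typically test for a `MongoWriteException` whose `getError().getCode()` is 121 (DocumentValidationFailure).

```java
import java.util.function.Predicate;

// Hypothetical helper: runs the primary write and falls back to the overflow
// path only when the failure is classified as a schema validation error.
final class ValidationFallback {

    // Server error code for DocumentValidationFailure.
    static final int DOCUMENT_VALIDATION_FAILURE = 121;

    /**
     * @return true if the write was redirected to the overflow path,
     *         false if the primary write succeeded
     */
    static boolean writeWithFallback(Runnable primaryWrite,
                                     Predicate<RuntimeException> isValidationFailure,
                                     Runnable overflowWrite) {
        try {
            primaryWrite.run();
            return false;
        } catch (RuntimeException e) {
            if (!isValidationFailure.test(e)) {
                throw e; // Unrelated failure: do not mask it.
            }
            overflowWrite.run();
            return true;
        }
    }
}
```

A validation error can come from any rule in the schema, not only `maxItems`, so a production version would probably confirm that the array is actually full before redirecting.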
For existing collections, apply the validator with runCommand:
database.runCommand(new Document("collMod", "users")
    .append("validator", new Document("$jsonSchema", new Document()
        .append("bsonType", "object")
        .append("properties", new Document()
            .append("activities", new Document()
                .append("bsonType", "array")
                .append("maxItems", 1000)
            )
        )
    ))
    // Skip validation for updates to existing documents that already violate the schema
    .append("validationLevel", "moderate")
    .append("validationAction", "error")
);
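Use validationLevel: "moderate" for existing collections. MongoDB then validates inserts and updates to documents that already satisfy the schema, but it does not check updates to existing documents that already violate it. Legacy oversized documents keep accepting writes while you migrate them.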
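The difference between the validation levels can be summarized as a small decision function. This is a plain-Java model of the documented server behavior for illustration only; `ValidationLevelModel` and its enums are hypothetical names and nothing here calls MongoDB.

```java
// Illustrative model of when MongoDB runs the validator for a write.
final class ValidationLevelModel {

    enum Level { OFF, MODERATE, STRICT }

    enum Op { INSERT, UPDATE }

    /**
     * @param existingDocIsValid for updates, whether the stored document
     *                           satisfied the schema before this write;
     *                           ignored for inserts
     * @return true if the validator is applied to this write
     */
    static boolean isValidated(Level level, Op op, boolean existingDocIsValid) {
        switch (level) {
            case OFF:
                return false;
            case STRICT:
                return true; // Every insert and update is checked.
            case MODERATE:
                // Inserts are always checked; updates only when the stored
                // document was already valid.
                return op == Op.INSERT || existingDocIsValid;
            default:
                throw new IllegalArgumentException("Unknown level: " + level);
        }
    }
}
```

Under this model, a power user whose activities array already exceeds 1,000 elements can still be updated under "moderate", while the same update would be rejected under "strict".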
Layer 2: Application-Level Guard.
Wrap array push operations in a guard that checks the array length atomically and routes the element to an overflow collection when the array is full:
// FAST: Application guard with automatic overflow routing
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.FindOneAndUpdateOptions;
import com.mongodb.client.model.ReturnDocument;
import com.mongodb.client.model.UpdateOptions;
import com.mongodb.client.model.Updates;
import org.bson.Document;

import java.util.List;

public class BoundedArrayGuard {
    private final int maxArraySize;
    private final MongoCollection<Document> primaryCollection;
    private final MongoCollection<Document> overflowCollection;

    public BoundedArrayGuard(
        MongoCollection<Document> primary,
        MongoCollection<Document> overflow,
        int maxArraySize
    ) {
        this.primaryCollection = primary;
        this.overflowCollection = overflow;
        this.maxArraySize = maxArraySize;
    }

    public void pushWithOverflow(String docId, String arrayField, Document element) {
        // $size fails on a missing field, so treat a missing array as empty
        Document currentSize = new Document("$size",
            new Document("$ifNull", List.of("$" + arrayField, List.of())));

        // Atomic attempt to push within bounds
        Document result = primaryCollection.findOneAndUpdate(
            Filters.and(
                Filters.eq("_id", docId),
                Filters.expr(new Document("$lt", List.of(currentSize, maxArraySize)))
            ),
            Updates.push(arrayField, element),
            new FindOneAndUpdateOptions().returnDocument(ReturnDocument.AFTER)
        );
        if (result != null) {
            return;
        }

        // No match: the array is full (or no document with this _id exists).
        // Overflow: route to overflow collection
        overflowCollection.updateOne(
            Filters.eq("parentId", docId),
            Updates.combine(
                Updates.push(arrayField, element),
                Updates.setOnInsert("parentId", docId)
            ),
            new UpdateOptions().upsert(true)
        );
        // Flag the primary document so readers know to consult the overflow collection
        primaryCollection.updateOne(
            Filters.eq("_id", docId),
            Updates.set("hasOverflow_" + arrayField, true)
        );
    }
}
Layer 3: Monitoring and Alerting.
Query for documents approaching size limits:
// Monitoring: find documents exceeding size thresholds
public List<Document> findOversizedDocuments(MongoCollection<Document> collection,
                                             int sizeThresholdBytes) {
    return collection.aggregate(List.of(
        Aggregates.project(Projections.fields(
            Projections.include("_id"),
            Projections.computed("docSize", new Document("$bsonSize", "$$ROOT"))
        )),
        Aggregates.match(Filters.gt("docSize", sizeThresholdBytes)),
        Aggregates.sort(Sorts.descending("docSize")),
        Aggregates.limit(50)
    )).into(new ArrayList<>());
}
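Run this periodically (for example, every hour) and alert when documents exceed 1 MB. This catches unbounded growth in any field, not just the ones you anticipated. The $bsonSize operator requires MongoDB 4.4 or later, and the pipeline scans the whole collection, so schedule it off-peak or run it against a secondary.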
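The periodic check can be wired up with a standard `ScheduledExecutorService`. The sketch below is one possible shape, not a prescribed implementation: `OversizedDocumentMonitor`, `checkOnce`, and `scheduleHourly` are hypothetical names, and the query and alert are passed in as functions so the class has no driver dependency.

```java
import java.util.List;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical scheduler wrapper around an oversized-document query.
final class OversizedDocumentMonitor {

    /** Runs the query once and alerts if it returns anything. */
    static <T> boolean checkOnce(Supplier<List<T>> query, Consumer<List<T>> alert) {
        List<T> oversized = query.get();
        if (oversized.isEmpty()) {
            return false;
        }
        alert.accept(oversized);
        return true;
    }

    /** Schedules the check immediately and then once per hour. */
    static <T> ScheduledFuture<?> scheduleHourly(ScheduledExecutorService scheduler,
                                                 Supplier<List<T>> query,
                                                 Consumer<List<T>> alert) {
        return scheduler.scheduleAtFixedRate(() -> {
            try {
                checkOnce(query, alert);
            } catch (RuntimeException e) {
                // An uncaught exception would suppress all later runs,
                // so log the failure and keep the schedule alive.
                System.err.println("Oversized-document check failed: " + e);
            }
        }, 0, 1, TimeUnit.HOURS);
    }
}
```

In use, the query argument would be something like `() -> findOversizedDocuments(users, 1_048_576)`, where `users` is an assumed collection handle, and the alert would forward the list to whatever paging or logging system the team already has.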
The Proof
| Scenario | Without guards | With guards |
|---|---|---|
| New unbounded array deployed | Detected after 3 weeks | Write rejected immediately |
| Power user hits 16 MB limit | Application error in production | Overflow route, no error |
| Gradual document growth | Undetected for months | Alert at 1 MB threshold |
| Developer adds new array field | No enforcement | Rejected only if the schema sets additionalProperties: false; otherwise caught by the size alert |
The Trade-off
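Schema validation adds overhead to write operations because MongoDB must validate each document against the schema before accepting the write. Treat 5-15% as a starting estimate and measure it on your own workload, because the cost depends on schema complexity and document size. Note that validationLevel: "moderate" does not validate only the fields being modified: MongoDB still checks the whole resulting document. It only skips the check for updates to existing documents that already violate the schema, so it is a migration aid rather than a performance setting.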
The $expr check in the application guard ($lt with $size) requires MongoDB to compute the array length on every write attempt. BSON arrays record their byte length, not their element count, so $size walks the array and costs O(n) in the number of elements. With the array capped at 1,000 elements and the _id filter narrowing the match to a single document, that cost is small, but it is paid on every findOneAndUpdate. For the common case where the array is under the threshold, this is the only overhead.
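One way to avoid the per-write length computation is to keep an element counter next to the array and guard on the counter instead. The sketch below builds the filter and update as plain Java maps that mirror the shell documents, so it runs without the driver; `CounterGuard` and the `activityCount` field are assumptions introduced for illustration, not part of the design above. With the driver, the equivalents would be `Filters.lt` for the filter and `Updates.combine(Updates.push(...), Updates.inc(...))` for the update.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical alternative guard: compare a stored counter instead of
// computing the array length on every write.
final class CounterGuard {

    /** Filter: match the document only while the counter is below the cap. */
    static Map<String, Object> filter(String docId, String counterField, int maxArraySize) {
        Map<String, Object> filter = new LinkedHashMap<>();
        filter.put("_id", docId);
        // $lt does not match a missing field, so the counter must be
        // initialized (for example to 0) when the document is created.
        filter.put(counterField, Map.of("$lt", maxArraySize));
        return filter;
    }

    /** Update: push the element and bump the counter in the same atomic write. */
    static Map<String, Object> update(String arrayField, String counterField, Object element) {
        Map<String, Object> update = new LinkedHashMap<>();
        update.put("$push", Map.of(arrayField, element));
        update.put("$inc", Map.of(counterField, 1));
        return update;
    }
}
```

The trade-off is that the counter stays accurate only if every writer goes through this path, including any code that removes elements, so it suits collections where array writes are already centralized.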