Unbounded Growth: The Outlier Pattern
The social activity platform stores user activity feeds. Each user document contains an array of recent activities: posts, likes, comments, shares. The schema looks clean:
{
  _id: "user-00042",
  name: "Alice",
  activities: [
    { type: "post", ts: ISODate("..."), content: "..." },
    { type: "like", ts: ISODate("..."), targetId: "post-123" },
    // ... more activities
  ]
}
For 99% of users, this array contains 50-500 entries and the document is 20-200 KB. For 1% of users (power users, bots, automated systems), the array grows to 50,000 entries. Those documents approach or hit the 16 MB BSON document size limit.
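To see why the outliers hit the wall, here is a rough worked estimate in Java. It assumes an average of about 400 bytes per activity, which is what both of the text's figures imply (20 KB over 50 entries, 200 KB over 500 entries). Real entries vary, and the class and method names are illustrative only.

```java
final class ActivityDocSize {
    // MongoDB's BSON document limit: 16 MiB = 16,777,216 bytes.
    static final long BSON_LIMIT_BYTES = 16L * 1024 * 1024;

    // Assumption: ~400 bytes per activity entry, derived from the text's
    // figures (20 KB / 50 entries and 200 KB / 500 entries both give ~400 B).
    static final long BYTES_PER_ACTIVITY = 400;

    // Rough size of the activities array; ignores _id, name, and other
    // fixed fields in the user document.
    static long estimatedBytes(long activityCount) {
        return activityCount * BYTES_PER_ACTIVITY;
    }

    // Largest activity count whose estimated size still fits under the limit.
    static long maxActivitiesBeforeLimit() {
        return BSON_LIMIT_BYTES / BYTES_PER_ACTIVITY;
    }

    public static void main(String[] args) {
        for (long n : new long[] {500, 1_000, 50_000}) {
            long bytes = estimatedBytes(n);
            System.out.printf("%,d activities -> ~%,d bytes (%s the limit)%n",
                    n, bytes, bytes > BSON_LIMIT_BYTES ? "over" : "under");
        }
        System.out.printf("limit reached near %,d activities%n", maxActivitiesBeforeLimit());
    }
}
```

At that average, the estimate crosses 16 MiB a little under 42,000 activities, so a 50,000-entry array cannot fit in a single document.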
The Problem with Unbounded Arrays
Three consequences of unbounded array growth:
1. The 16 MB wall. MongoDB enforces a hard 16 MB limit on BSON document size. When a $push would make the document exceed 16 MB, the update fails with error code 10334 (BSONObjectTooLarge). The application receives a write error, and unless it handles that error, the user's new activity is lost.
2. Write amplification. As covered in CH7, every $push to a large array may force the storage engine to rewrite (or, under the legacy MMAPv1 engine, relocate) the entire document. Appending one small entry to a 10 MB document can therefore cost on the order of 10 MB of I/O.
3. Read amplification. Every query that touches the user document loads the entire document into the WiredTiger cache. Even a query that projects only Alice's name pulls her whole 10 MB document, activity array included, into cache and evicts other data. The cache ends up holding cold activity history instead of the hot working set it exists to serve.
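Both amplification effects can be expressed as a simple ratio: bytes the storage engine touches divided by bytes the application actually needed. This sketch treats the whole 10 MB document as rewritten on each append and read on each lookup, and assumes a ~400-byte activity and a ~20-byte name value. It is an order-of-magnitude illustration, not a measurement of WiredTiger's real I/O.

```java
final class Amplification {
    // Ratio of bytes touched by the storage engine to bytes the caller needed.
    static double ratio(long bytesTouched, long bytesNeeded) {
        if (bytesNeeded <= 0) {
            throw new IllegalArgumentException("bytesNeeded must be positive");
        }
        return (double) bytesTouched / bytesNeeded;
    }

    public static void main(String[] args) {
        long docBytes = 10_000_000L; // the 10 MB document from the text
        long appendBytes = 400L;     // assumption: one ~400-byte activity entry
        long nameBytes = 20L;        // assumption: a short "name" value

        System.out.printf("write amplification: %.0fx%n", ratio(docBytes, appendBytes));
        System.out.printf("read amplification:  %.0fx%n", ratio(docBytes, nameBytes));
    }
}
```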
The Detection
Run this aggregation to find the users with the largest activity arrays ($bsonSize requires MongoDB 4.4 or later):
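db.users.aggregate([
  { $project: {
      name: 1,
      // $ifNull guards users with no activities field; $size errors on non-arrays
      activityCount: { $size: { $ifNull: ["$activities", []] } },
      // $bsonSize returns the exact BSON size of the document in bytes
      sizeBytes: { $bsonSize: "$$ROOT" }
  }},
  { $match: { activityCount: { $gt: 1000 } } },
  { $sort: { activityCount: -1 } },
  { $limit: 100 }
])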
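Every document this returns holds more than 1,000 activities. If any come back, you have an unbounded growth problem, and the $bsonSize value shows how close each document already is to the 16 MB limit.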
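To make the pipeline's logic concrete, the sketch below reproduces its $match, $sort, and $limit stages over an in-memory list and adds a fraction-of-limit column. The UserStats record, the sample values, and the method names are hypothetical; against a real cluster the aggregation does this work server-side.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class OutlierScan {
    // Hypothetical stand-in for one row of the aggregation's output.
    record UserStats(String id, int activityCount, long sizeBytes) {}

    static final long BSON_LIMIT_BYTES = 16L * 1024 * 1024;

    // Mirrors $match (count > threshold), $sort (count descending), $limit.
    static List<UserStats> outliers(List<UserStats> users, int threshold, int limit) {
        return users.stream()
                .filter(u -> u.activityCount() > threshold)
                .sorted(Comparator.comparingInt(UserStats::activityCount).reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }

    // Share of the 16 MiB BSON limit a document already uses.
    static double fractionOfLimit(UserStats u) {
        return (double) u.sizeBytes() / BSON_LIMIT_BYTES;
    }

    public static void main(String[] args) {
        List<UserStats> sample = List.of( // made-up values for illustration only
                new UserStats("user-a", 120, 48_000),
                new UserStats("user-b", 48_000, 15_500_000),
                new UserStats("user-c", 2_500, 1_000_000));
        for (UserStats u : outliers(sample, 1_000, 100)) {
            System.out.printf("%s: %,d activities, %.0f%% of limit%n",
                    u.id(), u.activityCount(), 100 * fractionOfLimit(u));
        }
    }
}
```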
The Outlier Pattern
The outlier pattern keeps the common case fast and handles the edge case separately. The primary document holds at most a fixed number of activities (1,000 here). Once it is full, new activities go into overflow documents in a separate collection, each holding up to 1,000 more.
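The routing rule can be modeled without a database: activity number n (counting from zero) lands on page n / 1,000, where page 0 is the primary document and pages 1, 2, and so on are overflow documents. This in-memory sketch assumes the 1,000-entry threshold from the text; the class, map, and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class OutlierRouting {
    static final int THRESHOLD = 1_000; // per-document cap used in the text

    // Zero-based activity index -> page number.
    // Page 0 is the primary user document; pages 1, 2, ... are overflow documents.
    static int pageFor(int activityIndex) {
        return activityIndex / THRESHOLD;
    }

    // In-memory model: page number -> activities stored on that page.
    final Map<Integer, List<String>> pages = new TreeMap<>();
    int activityCount = 0;

    void add(String activity) {
        int page = pageFor(activityCount);
        pages.computeIfAbsent(page, p -> new ArrayList<>()).add(activity);
        activityCount++;
    }

    public static void main(String[] args) {
        OutlierRouting user = new OutlierRouting();
        for (int i = 0; i < 2_500; i++) {
            user.add("activity-" + i);
        }
        user.pages.forEach((page, items) ->
                System.out.println("page " + page + ": " + items.size() + " activities"));
    }
}
```

Adding 2,500 activities fills pages 0 and 1 with 1,000 entries each and leaves 500 on page 2, which is the split the overflow path's integer division produces.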
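// FAST: Outlier pattern implementation
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.FindOneAndUpdateOptions;
import com.mongodb.client.model.ReturnDocument;
import com.mongodb.client.model.UpdateOptions;
import com.mongodb.client.model.Updates;
import com.mongodb.client.result.UpdateResult;
import org.bson.Document;

// Class members. Assumes `users` and `overflowActivities` are
// MongoCollection<Document> fields and activityToDocument(Activity) exists.
private static final int ACTIVITY_THRESHOLD = 1000;

public void addActivity(String userId, Activity activity) {
    Document activityDoc = activityToDocument(activity);

    // Common path: push to the primary document while it holds fewer than
    // 1,000 activities. The count check is part of the update filter, so
    // concurrent writers cannot push the array past the threshold.
    // $not + $gte also matches users with no activityCount field yet.
    UpdateResult result = users.updateOne(
        Filters.and(
            Filters.eq("_id", userId),
            Filters.not(Filters.gte("activityCount", ACTIVITY_THRESHOLD))
        ),
        Updates.combine(
            Updates.push("activities", activityDoc),
            Updates.inc("activityCount", 1)
        )
    );
    if (result.getMatchedCount() > 0) {
        return;
    }

    // Overflow path: atomically increment the counter and flag the primary
    // document, reading back the new count to pick the overflow page.
    Document userDoc = users.findOneAndUpdate(
        Filters.eq("_id", userId),
        Updates.combine(
            Updates.inc("activityCount", 1),
            Updates.set("hasOverflow", true)
        ),
        new FindOneAndUpdateOptions().returnDocument(ReturnDocument.AFTER)
    );
    if (userDoc == null) {
        throw new IllegalArgumentException("No user document for " + userId);
    }
    int activityIndex = userDoc.getInteger("activityCount") - 1; // zero-based
    int page = activityIndex / ACTIVITY_THRESHOLD; // 1, 2, ...; page 0 is the primary

    // Not atomic with the counter update above; wrap both in a transaction
    // if a crash between them must not leave the count ahead of the data.
    overflowActivities.updateOne(
        Filters.and(
            Filters.eq("userId", userId),
            Filters.eq("page", page)
        ),
        Updates.combine(
            Updates.push("activities", activityDoc),
            Updates.inc("count", 1),
            Updates.setOnInsert("userId", userId),
            Updates.setOnInsert("page", page)
        ),
        new UpdateOptions().upsert(true)
    );

    // Overflow pages are numbered from 1, so the highest page number is also
    // the number of overflow pages; $max keeps it monotonic under concurrency.
    users.updateOne(
        Filters.eq("_id", userId),
        Updates.max("overflowPages", page)
    );
}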
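Most reads touch only the primary document, which is the point of the pattern. When the full history is needed, the application stitches the primary array and the overflow pages back together in page order. The sketch below assumes overflow pages are numbered from 1 and derives how many exist from activityCount alone; the map stands in for overflowActivities documents keyed by their page field, and all names are hypothetical. Against MongoDB itself, this would be a find on overflowActivities filtered by userId and sorted by page.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ActivityHistory {
    static final int THRESHOLD = 1_000; // per-document cap used in the text

    // Overflow pages implied by the primary document's activityCount.
    // Pages are numbered 1..N; page 0 is the primary document itself.
    static int overflowPageCount(int activityCount) {
        return activityCount <= THRESHOLD ? 0 : (activityCount - 1) / THRESHOLD;
    }

    // Primary array first, then each overflow page in ascending page order.
    // overflowByPage is a hypothetical stand-in for the overflow collection.
    static List<String> fullHistory(List<String> primary, int activityCount,
                                    Map<Integer, List<String>> overflowByPage) {
        List<String> all = new ArrayList<>(primary);
        for (int page = 1; page <= overflowPageCount(activityCount); page++) {
            all.addAll(overflowByPage.getOrDefault(page, List.of()));
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> primary = new ArrayList<>();
        for (int i = 0; i < THRESHOLD; i++) {
            primary.add("a" + i);
        }
        Map<Integer, List<String>> overflow = new HashMap<>();
        for (int i = THRESHOLD; i < 2_300; i++) {
            overflow.computeIfAbsent(i / THRESHOLD, p -> new ArrayList<>()).add("a" + i);
        }
        List<String> all = fullHistory(primary, 2_300, overflow);
        System.out.println(all.size() + " activities, last = " + all.get(all.size() - 1));
    }
}
```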