The Tipping Point: The Metrics That Tell You the Cheap Stack Is Buckling and the Decision Framework for Scaling Up
Everything in this book is designed to delay this chapter. The single Hetzner VPS, the Supabase free tier, the Docker Compose deployment, the Redis cache on the same machine: all of it works beautifully at a certain scale. The question is not if the stack will buckle, but when. And when it does, the developer needs to know which component is failing, why, and what the minimum viable upgrade is.
This chapter is a diagnostic manual. Each section covers one infrastructure component, the metrics that signal it is reaching its limit, the first upgrade step (which is always the cheapest), and the next step after that.
The Feature
The developer has a checklist of metrics for each component. Each metric has a yellow threshold (investigate) and a red threshold (act now). When a threshold is crossed, the developer knows the specific upgrade path and its cost.
The Decision
Scaling is not binary. It is not “small stack” or “Kubernetes cluster.” Between those extremes are five or six incremental upgrades, each solving one bottleneck without changing the architecture. The correct approach is:
1. Identify the bottleneck (which component is at its limit)
2. Apply the cheapest fix (optimize code, add index, increase cache TTL)
3. If the cheapest fix is not enough, apply the next upgrade (bigger VPS, managed database, separate Redis)
4. Repeat
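The loop above can be sketched as an ordered "ladder" of fixes per component, from which the cheapest untried one is always picked next. This is a minimal illustration, not a prescribed implementation: the `Fix` class, the `VPS_LADDER` list, and `next_fix` are hypothetical names, and the prices are the approximate monthly differences quoted in this chapter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fix:
    name: str
    monthly_cost: float  # added monthly cost, in the plan's own currency


# Hypothetical ladder for the VPS; entries and prices come from this chapter.
VPS_LADDER = [
    Fix("Optimize queries, add caching", 0.0),
    Fix("Upgrade CX22 to CX32", 3.0),
    Fix("Upgrade CX32 to CX42", 7.0),
]


def next_fix(ladder: list, applied: set) -> Optional[Fix]:
    """Return the cheapest fix that has not been applied yet, or None."""
    remaining = [fix for fix in ladder if fix.name not in applied]
    return min(remaining, key=lambda fix: fix.monthly_cost, default=None)
```

Calling `next_fix(VPS_LADDER, set())` returns the free optimization step first; a paid upgrade is only suggested once every cheaper step has been recorded as applied.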
Premature upgrades waste money. One organizer paying $29/month does not justify a $200/month infrastructure bill. Infrastructure cost should stay a small fraction of the revenue it supports; the projections later in this chapter keep it under 5%.
The Implementation
The Metrics Dashboard
| Component | Yellow Threshold | Red Threshold | First Fix | Cost |
|---|---|---|---|---|
| CPU | >70% sustained (15 min) | >90% sustained (5 min) | Optimize queries, add caching | $0 |
| Memory | >80% | >90% | Reduce cache size, optimize ORM | $0 |
| Disk | >70% | >85% | Clean logs, archive old data | $0 |
| Response time P95 | >500ms | >2s | Add indexes, fix N+1 queries | $0 |
| Error rate | >2% | >5% | Debug and fix errors | $0 |
| Database connections | >80% of pool | Pool exhausted | Increase pool size, add PgBouncer | $0 |
| VPS capacity | All of the above at limits | All optimizations exhausted | Upgrade CX22 to CX32 | +€3/month |
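The numeric rows of the table above can be turned into a small classifier. The sketch below is one possible encoding: the metric keys (`cpu_percent`, `p95_ms`, and so on) and the function names are assumptions made for illustration, and the code ignores the "sustained" duration attached to the CPU thresholds, so it only judges a single reading.

```python
# Thresholds copied from the table above: (yellow, red), in the metric's unit.
THRESHOLDS = {
    "cpu_percent": (70.0, 90.0),
    "memory_percent": (80.0, 90.0),
    "disk_percent": (70.0, 85.0),
    "p95_ms": (500.0, 2000.0),
    "error_rate_percent": (2.0, 5.0),
}


def classify(metric: str, value: float) -> str:
    """Map one reading to 'green', 'yellow' (investigate) or 'red' (act now)."""
    yellow, red = THRESHOLDS[metric]
    if value > red:
        return "red"
    if value > yellow:
        return "yellow"
    return "green"


def summarize(readings: dict) -> dict:
    """Classify every reading in a {metric_name: value} mapping."""
    return {name: classify(name, value) for name, value in readings.items()}
```

The comparisons are strict, matching the ">" signs in the table: a disk at exactly 70% is still green.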
Component-by-Component Scaling Path
1. Database (Supabase Free → Pro → Dedicated)
Free tier limits:
- 500 MB database size
- 2 GB bandwidth
- Shared compute (variable performance)
- 50 concurrent connections
Scaling triggers:
- Database size approaching 400 MB
- Query performance degrading during peak hours
- Connection count approaching 45
First upgrade: Supabase Pro ($25/month)
- 8 GB database size
- Unlimited bandwidth
- Dedicated compute
- 100 concurrent connections
- Daily backups
Next upgrade: Supabase Pro with compute add-on ($50-75/month)
- Dedicated CPU and RAM
- Consistent query performance
- 200+ concurrent connections
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession

FREE_TIER_LIMIT_BYTES = 500 * 1024 * 1024  # 500 MB free tier


# Monitor database size
async def check_database_size(db: AsyncSession) -> dict:
    result = await db.execute(text("""
        SELECT pg_size_pretty(pg_database_size(current_database())) AS size,
               pg_database_size(current_database()) AS size_bytes
    """))
    row = result.one()
    return {
        "size_human": row.size,
        "size_bytes": row.size_bytes,
        "limit_bytes": FREE_TIER_LIMIT_BYTES,
        "usage_percent": (row.size_bytes / FREE_TIER_LIMIT_BYTES) * 100,
    }


# Monitor connection count
async def check_connection_count(db: AsyncSession) -> dict:
    # pg_stat_activity has one row per server process, so the count
    # includes background processes and this query's own session.
    result = await db.execute(text("""
        SELECT count(*) AS active_connections,
               (SELECT setting::int FROM pg_settings
                WHERE name = 'max_connections') AS max_connections
        FROM pg_stat_activity
    """))
    row = result.one()
    return {
        "active": row.active_connections,
        "max": row.max_connections,
        "usage_percent": (row.active_connections / row.max_connections) * 100,
    }
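The two monitoring functions return raw numbers; the scaling triggers listed earlier (400 MB and 45 connections) still have to be applied to them. The following sketch shows one way to do that. The function name and the wording of the returned reasons are assumptions, and "approaching" is interpreted here as "at or above".

```python
TRIGGER_SIZE_BYTES = 400 * 1024 * 1024  # "approaching 400 MB" on a 500 MB tier
TRIGGER_CONNECTIONS = 45                # "approaching 45" of 50 connections


def supabase_upgrade_reasons(size_bytes: int, active_connections: int) -> list:
    """Return the free-tier triggers that the given readings have crossed."""
    reasons = []
    if size_bytes >= TRIGGER_SIZE_BYTES:
        reasons.append("database size at or above 400 MB")
    if active_connections >= TRIGGER_CONNECTIONS:
        reasons.append("connection count at or above 45")
    return reasons
```

In practice the arguments would come from the `size_bytes` and `active` keys returned by `check_database_size` and `check_connection_count`. An empty list means neither trigger has fired. Peak-hour query degradation, the third trigger, needs latency data and is not covered here.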
2. VPS (CX22 → CX32 → CX42)
- CX22 (current): 2 vCPU, 4 GB RAM, 40 GB NVMe, €4.51/month
- CX32 (next): 4 vCPU, 8 GB RAM, 80 GB NVMe, €7.49/month
- CX42 (after): 8 vCPU, 16 GB RAM, 160 GB NVMe, €14.49/month
Scaling triggers:
- CPU consistently above 70% during business hours
- Memory usage above 80% with Redis and PostgreSQL competing
- Disk usage above 70%
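"Consistently above 70%" implies looking at a window of samples rather than a single spike. A rolling-window check is one simple way to express that. The class below is an illustrative sketch with hypothetical names; it assumes readings arrive at a fixed interval (for example one per minute, so a window of 15 samples covers 15 minutes).

```python
from collections import deque


class SustainedThreshold:
    """Reports True once every sample in a full window exceeds the threshold."""

    def __init__(self, threshold: float, window_samples: int) -> None:
        self.threshold = threshold
        self.samples = deque(maxlen=window_samples)

    def add(self, value: float) -> bool:
        """Record one sample and report whether the breach is sustained."""
        self.samples.append(value)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(v > self.threshold for v in self.samples)
```

A single reading at or below the threshold resets the verdict until the window is again filled with breaching samples, which keeps short bursts from triggering an upgrade.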
Upgrade process:
1. Snapshot current VPS in Hetzner dashboard
2. Create new VPS from snapshot with larger plan
3. Update Cloudflare DNS to point to new VPS IP
4. Verify everything works
5. Delete old VPS
The upgrade from CX22 to CX32 doubles CPU and RAM for an additional €2.98/month (about €3). Treat it as a way to buy time to find and fix the actual bottleneck (usually a slow query or a missing cache), not as a substitute for that fix.
3. Redis (Colocated → Separate → Managed)
Current: Redis in Docker on the same VPS
- Shares RAM with the application and database proxy
- 128 MB max memory allocation
- No persistence (cache only)
Scaling triggers:
- Cache eviction rate is high (Redis is evicting keys frequently)
- Application and Redis compete for memory
- Cache hit rate drops below 80%
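The hit rate in the last trigger can be derived from the `keyspace_hits` and `keyspace_misses` counters that Redis reports in the stats section of `INFO`. The helper below is a sketch: it assumes those counters have already been fetched into a plain dict, and the function names are hypothetical. The 80% target is the one stated above.

```python
from typing import Optional


def cache_hit_rate(stats: dict) -> Optional[float]:
    """Hit rate in percent, or None when no lookups have been recorded."""
    hits = int(stats.get("keyspace_hits", 0))
    misses = int(stats.get("keyspace_misses", 0))
    total = hits + misses
    if total == 0:
        return None
    return 100.0 * hits / total


def hit_rate_below_target(stats: dict, target_percent: float = 80.0) -> bool:
    """True when a measurable hit rate has dropped below the target."""
    rate = cache_hit_rate(stats)
    return rate is not None and rate < target_percent
```

These counters are cumulative since the server started, so comparing two snapshots taken some minutes apart gives a more current picture than a single reading.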
First upgrade: Increase Redis memory allocation (free)
- Change maxmemory from 128 MB to 256 MB
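- Reduce PostgreSQL shared_buffers if needed (only relevant if a PostgreSQL instance also runs on this VPS; the Supabase-hosted database is not affected)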
Next upgrade: Separate Redis instance
- Hetzner CX11 for Redis only: €3.79/month
- 2 GB RAM dedicated to Redis
- Low latency over internal network
Final upgrade: Managed Redis (Upstash or Redis Cloud)
- Upstash free tier: 10,000 commands/day
- Upstash Pro: $10/month for 100,000 commands/day
4. File Storage (R2 Free → R2 Paid)
Current: Cloudflare R2 free tier
- 10 GB storage
- 10M Class B operations (reads) per month
- 1M Class A operations (writes) per month
Scaling triggers:
- Storage approaching 8 GB
- Operations approaching 80% of free tier limits
First upgrade: R2 paid tier (pay per use)
- $0.015/GB/month for storage above 10 GB
- At 50 GB of documents: $0.60/month (40 GB billable × $0.015)
The R2 upgrade is essentially automatic and nearly free. There is no planning needed.
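The storage arithmetic is simple enough to check in a few lines. This sketch uses only the two figures quoted above (10 GB free, $0.015 per GB per month beyond that); the function and constant names are illustrative, and operation charges are deliberately left out.

```python
FREE_STORAGE_GB = 10.0
PRICE_PER_GB_MONTH = 0.015  # USD, for storage above the free allowance


def r2_storage_cost(stored_gb: float) -> float:
    """Monthly storage cost in USD; only storage above 10 GB is billed."""
    billable_gb = max(0.0, stored_gb - FREE_STORAGE_GB)
    return round(billable_gb * PRICE_PER_GB_MONTH, 2)
```

For 50 GB the function returns 0.6, matching the $0.60/month figure, and anything at or below 10 GB returns 0.0.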
Cost Projection at Growth Milestones
| Milestone | MRR | Infrastructure Cost | Margin |
|---|---|---|---|
| 0-10 customers | $0-$290 | ~€9 ($10) | 96% |
| 10-50 customers | $290-$1,450 | ~€15 ($17) | 98% |
| 50-100 customers | $1,450-$2,900 | ~€40 ($45) | 98% |
| 100-500 customers | $2,900-$14,500 | ~€100 ($112) | 99% |
The infrastructure cost column includes:
- VPS (€4.51 - €14.49)
- Supabase ($0 - $75, the upper figure including the compute add-on)
- Domain ($10/year)
- Cloudflare ($0)
- Resend ($0 - $20)
- Sentry ($0)
- Grafana Cloud ($0)
At the upper end of every milestone range, infrastructure is less than 5% of revenue; the Margin column is computed at that upper end. The stack scales economically because each upgrade is incremental and triggered by actual usage, not projected usage.
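The Margin column can be reproduced from the other two columns. The sketch below assumes the margin is (MRR minus infrastructure cost) divided by MRR, evaluated at the top of each MRR range and rounded down; the rounding rule is inferred from the table's figures rather than stated in it, and the function name is hypothetical.

```python
import math


def margin_percent(mrr: float, infra_cost: float) -> int:
    """Share of revenue left after infrastructure, in whole percent (rounded down)."""
    if mrr <= 0:
        raise ValueError("margin is undefined without revenue")
    return math.floor((mrr - infra_cost) / mrr * 100)


# Upper-bound MRR and USD infrastructure cost for each milestone in the table.
MILESTONES = [(290, 10), (1450, 17), (2900, 45), (14500, 112)]
MARGINS = [margin_percent(mrr, cost) for mrr, cost in MILESTONES]
```

With these inputs `MARGINS` evaluates to `[96, 98, 98, 99]`, the same values as the table. At the bottom of the first range (no paying customers) the margin is undefined, which is why the function rejects a non-positive MRR.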
The Trap
# TRAP: Scaling preemptively based on projected growth
# "We might get 10,000 users next month, so let's move to
# a managed Kubernetes cluster with auto-scaling now."
# Cost: $200-500/month for infrastructure that serves 50 users
# SAFE: Scale reactively based on measured metrics
# Check the metrics dashboard weekly
# When a yellow threshold is crossed, investigate
# When a red threshold is crossed, apply the minimum upgrade
# The CX22 handles 50 customers without any upgrades
The most expensive infrastructure decision is scaling before you need to. At $29 per customer, 50 customers generate $1,450/month; a $200-500/month Kubernetes cluster would consume roughly 14-34% of that revenue, compared with about 1% for the stack in the cost projection above. The metrics-driven approach in this chapter ensures every dollar spent on infrastructure is justified by actual load, not projected load.
The Cost
This chapter costs nothing to implement. It is a decision framework. The cost comes when the thresholds are crossed:
| Upgrade | Trigger | Monthly Cost |
|---|---|---|
| CX22 → CX32 | CPU/RAM at limits | +€3 |
| Supabase Free → Pro | 400+ MB database | +$25 |
| Separate Redis VPS | Memory contention | +€3.79 |
| Resend Free → Pro | 100+ emails/day | +$20 |
The maximum monthly infrastructure cost at 500 customers is approximately $112. The revenue at 500 customers ($14,500/month) provides a 99% margin. The cheap stack is not a temporary compromise. It is a permanent architecture that scales further than most developers expect.