Scaling Hertz: Achieving 6,000% Throughput Improvement with Eventual Consistency
These articles are AI-generated summaries. Please check the original sources for full details.
The Architecture Behind a 6,000% Throughput Improvement at Hertz
Mathew Dostal and the engineering team at Hertz replaced a 40-year-old COBOL-based core with a distributed rate engine that manages a fleet of 700,000 vehicles. Before the overhaul, the legacy system had worst-case latencies of 3 minutes and a throughput ceiling of 300 requests per second (RPS). By shifting to an eventual consistency model, the team scaled the system to more than 10,000 locations worldwide.
Why This Matters
Hertz ran 1,800 IT systems and 30 rental processing systems, and a single product change required 18 separate system updates. After a failed $32 million engagement with Accenture that produced non-extensible code, the company committed over $400 million to a multi-year technology transformation to combat a 70% loss of corporate ground transportation market share to Uber and Lyft. The case study argues that for read-heavy systems such as rate shopping, relaxing ACID guarantees in favor of eventual consistency is what prevents database saturation and cascading failures during peak traffic events such as holiday weekends.
Key Insights
- A 6,000% throughput improvement was achieved by moving from synchronous database queries to a proactive Redis cache layer.
- The team utilized Cloudant’s Change Data Capture (CDC) stream to push updates to Redis, eliminating cache misses on rule data.
- The write path was decoupled using AWS Kinesis, handling over 2,500 pricing writes per second compared to the previous REST-bound bottleneck.
- The team identified an asymmetry in update frequency: corporate discount codes change annually, while rate codes change thousands of times per second across 700,000 vehicles.
- P95 latency was reduced to under 30ms by grouping pre-filtered rules by location and corporate discount code into hashed key structures.
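The proactive cache described in the insights above can be sketched as follows. This is a minimal illustration, not the Hertz implementation: a plain dictionary stands in for Redis, a Python list stands in for the Cloudant change feed, and `RuleChange`, `ProactiveRuleCache`, and the `location~discount_code` grouping key are all hypothetical names chosen for the example. The point it shows is that every change is applied to the cache as it arrives, so the read path is a single key lookup and never queries the database.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class RuleChange:
    """Hypothetical change event, standing in for one entry of a CDC feed."""
    location: str
    discount_code: str
    rule_id: str
    body: Optional[dict]  # None means the rule was deleted upstream


class ProactiveRuleCache:
    """In-memory stand-in for the cache layer (a dict, not a Redis server)."""

    def __init__(self) -> None:
        # group key -> {rule_id: rule body}
        self._store: Dict[str, Dict[str, dict]] = {}

    @staticmethod
    def group_key(location: str, discount_code: str) -> str:
        # Assumed grouping: rules are pre-filtered by location and discount code.
        return f"{location}~{discount_code}"

    def apply(self, change: RuleChange) -> None:
        """Apply one change so the cache is updated before any read asks for it."""
        key = self.group_key(change.location, change.discount_code)
        group = self._store.setdefault(key, {})
        if change.body is None:
            group.pop(change.rule_id, None)
            if not group:
                del self._store[key]
        else:
            group[change.rule_id] = change.body

    def consume(self, changes: Iterable[RuleChange]) -> int:
        """Drain a batch of change events and return how many were applied."""
        applied = 0
        for change in changes:
            self.apply(change)
            applied += 1
        return applied

    def lookup(self, location: str, discount_code: str) -> Dict[str, dict]:
        """Read path: one key lookup, no database round trip."""
        return dict(self._store.get(self.group_key(location, discount_code), {}))
```

Because readers only see what the feed has already delivered, a lookup may briefly return a rule that has just changed upstream. That short window of staleness is the eventual consistency trade-off the article describes.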
Working Examples
The Redis compound key structure below groups all applicable rules, benefits, and eligibility criteria by location and corporate discount code:
LOC~RLOC~Date:RC~CarType~DiscCode
(e.g., LAX~LAT~2020-01:RC001~CCAR~D)
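A small helper can build and parse keys in this layout. The sketch below is a guess at how such a helper might look, not code from the source. The type `RateKey` and its field names are hypothetical, and the meaning assigned to each segment (for example, that `RLOC` is a return location) is an assumption inferred from the abbreviations. Only the separator layout comes from the example key above.

```python
from typing import NamedTuple


class RateKey(NamedTuple):
    """Hypothetical typed view of the LOC~RLOC~Date:RC~CarType~DiscCode layout."""
    loc: str        # assumed: pickup location code
    rloc: str       # assumed: return location code
    date: str       # kept as an opaque string, e.g. "2020-01"
    rate_code: str
    car_type: str
    disc_code: str

    def encode(self) -> str:
        """Join the fields using the separators shown in the example key."""
        head = "~".join((self.loc, self.rloc, self.date))
        tail = "~".join((self.rate_code, self.car_type, self.disc_code))
        return f"{head}:{tail}"

    @classmethod
    def decode(cls, raw: str) -> "RateKey":
        """Split a key back into its six fields, rejecting malformed input."""
        head, sep, tail = raw.partition(":")
        head_parts = head.split("~")
        tail_parts = tail.split("~")
        if not sep or len(head_parts) != 3 or len(tail_parts) != 3:
            raise ValueError(f"malformed rate key: {raw!r}")
        return cls(*head_parts, *tail_parts)
```

For instance, `RateKey("LAX", "LAT", "2020-01", "RC001", "CCAR", "D").encode()` reproduces the example key, and `RateKey.decode` reverses it. Keeping the layout in one place means writers and readers cannot drift apart on separator order.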
Practical Applications
- Use Case: Rate shopping systems that use eventual consistency to serve exploratory rate lookups without blocking on ACID transactions. Pitfall: Using strong consistency for non-financial read paths, leading to database saturation during traffic spikes.
- Use Case: Geo-routed updates via Kinesis and Cloudant replication to ensure regional clusters (US, EU) receive localized data with sub-second latency. Pitfall: Relying on a single global database instance, causing high latency for international users.
- Use Case: Implementing connection pooling and strategic denormalization to stabilize tail latency (p95/p99) under extreme load. Pitfall: Optimizing for average latency while ignoring tail distribution, resulting in cascading failures at peak capacity.
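The last pitfall above, optimizing for the average while ignoring the tail, is easy to demonstrate with a few lines of arithmetic. The sketch below uses a nearest-rank percentile and a made-up latency sample; the numbers are illustrative only and are not measurements from Hertz.

```python
import math
from statistics import mean
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of
    the data at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Illustrative sample: 94 fast requests and 6 very slow ones (milliseconds).
latencies_ms = [10.0] * 94 + [2000.0] * 6

summary = {
    "mean": mean(latencies_ms),
    "p50": percentile(latencies_ms, 50),
    "p95": percentile(latencies_ms, 95),
    "p99": percentile(latencies_ms, 99),
}
```

In this sample the median is 10 ms and the mean is about 129 ms, yet p95 and p99 are both 2,000 ms. A dashboard that tracks only the mean would hide the fact that roughly one request in seventeen is slow enough to tie up connections and trigger retries.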