Design an E-Commerce Website
Covers microservice decomposition for e-commerce, shopping cart with optimistic locking, order processing saga pattern, payment idempotency with idempotency keys, and inventory management with distributed locks.
E-commerce platforms are among the most common system design interview questions because they combine catalog search, transactional consistency, distributed coordination, and real-time inventory tracking in a single problem. This chapter walks through a production-grade design that handles millions of daily users, flash sales, and payment failures while keeping P99 checkout latency under 500ms.
Requirements
Functional Requirements
- Browse Products: Users view product listings with images, descriptions, prices, and reviews.
- Search: Full-text search with filters (category, price range, brand, rating) and autocomplete.
- Shopping Cart: Add, remove, and update quantities. Persist across sessions for logged-in users.
- Checkout & Payment: Collect shipping info, apply coupons, process payments via multiple gateways.
- Order Tracking: Real-time order status updates from placement through delivery.
- Reviews & Ratings: Users submit reviews with star ratings; aggregate scores display on product pages.
- Inventory Visibility: Show real-time stock availability on product pages.
Non-Functional Requirements
| Metric | Target |
|---|---|
| Product catalog size | 100M+ products |
| Daily active users | 10M DAU |
| Search latency (P99) | < 200ms |
| Checkout latency (P99) | < 500ms |
| Order/Payment consistency | Strong (ACID) |
| Catalog consistency | Eventual (seconds) |
| Availability | 99.99% uptime |
| Peak traffic | 10x during flash sales |
Capacity Estimation
Assuming 10M DAU with an average of 20 page views per user:
- Read traffic: 10M × 20 = 200M page views/day ≈ 2,300 requests/second (avg), ~23K RPS at peak.
- Search queries: ~30% of page views involve search → 60M queries/day ≈ 700 QPS avg.
- Cart operations: ~5% of users add items → 500K cart writes/day.
- Orders: ~2% conversion → 200K orders/day ≈ 2.3 orders/second avg.
- Storage: 100M products × 5KB avg metadata = 500GB product data. Images stored in object storage (S3).
- Search index: 100M documents × 2KB indexed fields = 200GB Elasticsearch index.
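The arithmetic above can be checked with a few lines of code. This is a minimal sketch: the class name `CapacityEstimate` and its helper are illustrative, and the inputs are the assumptions already listed (10M DAU, 20 page views per user, 30% search share, 2% conversion, 10x peak factor).

```java
// Back-of-envelope traffic math; inputs mirror the assumptions listed above.
final class CapacityEstimate {
    static final long SECONDS_PER_DAY = 86_400;

    // Average per-second rate for a daily event count.
    static double perSecond(long eventsPerDay) {
        return (double) eventsPerDay / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        long dau = 10_000_000L;
        long pageViews = dau * 20;             // 200M page views/day
        long searches = pageViews * 30 / 100;  // 60M search queries/day
        long orders = dau * 2 / 100;           // 200K orders/day

        System.out.printf("avg read RPS:   %.0f%n", perSecond(pageViews));
        System.out.printf("peak read RPS:  %.0f%n", perSecond(pageViews) * 10);
        System.out.printf("avg search QPS: %.0f%n", perSecond(searches));
        System.out.printf("avg orders/s:   %.1f%n", perSecond(orders));
    }
}
```

Dividing by 86,400 seconds gives slightly different figures than the rounded ones in the list, which is expected: the estimate only needs to be right to the order of magnitude.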
High-Level Design
The platform decomposes into seven microservices, each owning its data store and scaling independently.
                         ┌──────────────┐
                         │   CDN / LB   │
                         └──────┬───────┘
                                │
                         ┌──────▼───────┐
                         │ API Gateway  │
                         └──────┬───────┘
                                │
       ┌────────────┬───────────┼────────────┬────────────┐
       │            │           │            │            │
┌──────▼──────┐ ┌───▼───┐ ┌─────▼─────┐ ┌────▼────┐ ┌─────▼─────┐
│   Product   │ │ Cart  │ │   Order   │ │ Payment │ │ Inventory │
│   Catalog   │ │Service│ │  Service  │ │ Service │ │  Service  │
└──────┬──────┘ └───┬───┘ └─────┬─────┘ └────┬────┘ └─────┬─────┘
       │            │           │            │            │
┌──────▼──────┐ ┌───▼───┐ ┌─────▼─────┐ ┌────▼────┐ ┌─────▼─────┐
│Elasticsearch│ │ Redis │ │PostgreSQL │ │ Payment │ │PostgreSQL │
│+ PostgreSQL │ │       │ │           │ │   DB    │ │           │
└─────────────┘ └───────┘ └───────────┘ └─────────┘ └───────────┘
                    │                        │
              ┌─────▼──────┐          ┌──────▼──────┐
              │    User    │          │ Notification│
              │  Service   │          │   Service   │
              └────────────┘          └─────────────┘
- API Gateway: Routes requests, handles authentication, rate limiting, and request validation.
- Product Catalog Service: Manages product CRUD, search indexing, and category hierarchy.
- Cart Service: Stores shopping carts in Redis with session affinity.
- Order Service: Orchestrates the checkout saga and tracks order lifecycle.
- Payment Service: Integrates with external gateways; enforces idempotency.
- Inventory Service: Tracks stock levels; manages reservations with distributed locks.
- Notification Service: Sends order confirmations, shipping updates via email/SMS/push.
Deep Dive
Product Catalog
The catalog stores 100M products and serves sub-200ms search queries across dozens of filter dimensions.
Search with Elasticsearch: Each product document includes title, description, brand, category path, price, attributes, and rating. Elasticsearch handles full-text search with BM25 scoring, faceted filtering (aggregations), and autocomplete via edge n-gram tokenizers.
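To make the autocomplete mechanism concrete, here is a small sketch of what an edge n-gram tokenizer emits for one token at index time. It is plain Java for illustration, not Elasticsearch code; the class `EdgeNGrams` is hypothetical, and the `minGram`/`maxGram` parameters are named after the tokenizer's `min_gram` and `max_gram` settings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class EdgeNGrams {
    // Emits the prefixes of a token whose length lies between minGram and maxGram.
    static List<String> of(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        String lower = token.toLowerCase(Locale.ROOT);
        int longest = Math.min(maxGram, lower.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(lower.substring(0, len));
        }
        return grams;
    }
}
```

For the token "Phone" with bounds 2 and 4 this yields "ph", "pho", and "phon". Because those prefixes are stored in the index, a user typing "pho" matches with an ordinary term lookup instead of a slower wildcard query.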
Category Hierarchy with Materialized Path: Categories form a tree (Electronics → Phones → Smartphones). A materialized path encoding stores the full ancestry as a string:
| Category | Path |
|---|---|
| Electronics | /1/ |
| Phones | /1/5/ |
| Smartphones | /1/5/12/ |
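Querying all products under “Electronics” becomes a prefix match: WHERE category_path LIKE '/1/%'. This avoids recursive queries, and because the pattern is anchored at the start of the string, a B-tree index can serve it as a range scan (in PostgreSQL this requires the C collation or an index built with text_pattern_ops).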
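The same path logic can be sketched in memory. The helper names below (`CategoryPaths`, `childPath`, `isWithin`, `depth`) are illustrative and assume the path format shown in the table, with a leading and trailing slash.

```java
final class CategoryPaths {
    // Builds a child's path by appending its id to the parent's path.
    static String childPath(String parentPath, long childId) {
        return parentPath + childId + "/";
    }

    // A category lies inside a subtree when its path starts with the subtree root's path.
    // This is the in-memory equivalent of: category_path LIKE '/1/%'
    static boolean isWithin(String categoryPath, String subtreeRootPath) {
        return categoryPath.startsWith(subtreeRootPath);
    }

    // Depth is the number of ids in the path: "/1/5/12/" has depth 3.
    static int depth(String path) {
        return (int) path.chars().filter(c -> c == '/').count() - 1;
    }
}
```

The trailing slash matters: without it, the prefix "/1" would also match an unrelated category whose path is "/12/".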
Product Variants: A single product (e.g., “Nike Air Max”) has variants such as size and color, and can also be sold as part of a bundle. The variant model uses a sealed interface:
public record Product(
    String id,
    String title,
    String description,
    Money price,
    String categoryPath,
    List<ProductVariant> variants
) {}

public sealed interface ProductVariant
    permits SizeVariant, ColorVariant, BundleVariant {}

public record SizeVariant(String sku, String size, int stockCount, Money priceAdjustment)
    implements ProductVariant {}

public record ColorVariant(String sku, String color, String imageUrl, int stockCount)
    implements ProductVariant {}

public record BundleVariant(String sku, List<String> includedSkus, Money bundlePrice)
    implements ProductVariant {}
Pattern matching on the sealed interface enables exhaustive handling when rendering variant selectors, calculating prices, or checking inventory:
static Money calculatePrice(Product product, ProductVariant variant) {
    return switch (variant) {
        case SizeVariant s -> product.price().add(s.priceAdjustment());
        case ColorVariant c -> product.price();
        case BundleVariant b -> b.bundlePrice();
    };
}
Shopping Cart
Cart data lives in Redis for logged-in users and in browser localStorage for guests. This dual approach keeps guest checkout fast while persisting carts for authenticated sessions.
Merge Strategy on Login: When a guest logs in, the client sends the local cart payload. The server merges it with any existing Redis cart using “last-write-wins” per SKU — guest quantities override server quantities since the guest actively selected them.
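A minimal sketch of that merge rule, assuming both carts can be reduced to a SKU-to-quantity map (the class `CartMerge` is hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CartMerge {
    // Merges two SKU -> quantity maps. When both carts contain a SKU,
    // the guest quantity replaces the server quantity.
    static Map<String, Integer> merge(Map<String, Integer> serverCart,
                                      Map<String, Integer> guestCart) {
        Map<String, Integer> merged = new LinkedHashMap<>(serverCart);
        merged.putAll(guestCart); // guest entries override server entries per SKU
        return merged;
    }
}
```

Summing the two quantities is a reasonable alternative; overriding is simpler to explain to the user, because the merged cart shows exactly what they last selected.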
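Optimistic Locking with Version Field: Two browser tabs open on the same account can race on cart updates. Each cart entry carries a version field and the cart as a whole carries cartVersion; every successful write increments them. On the server, the write is guarded with Redis WATCH, so an update computed from a stale read is rejected and retried: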
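import java.util.ArrayList;
import java.util.List;
import org.springframework.data.redis.core.RedisOperations;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.core.SessionCallback;

public record CartItem(String sku, int quantity, long version) {}

public record Cart(String userId, List<CartItem> items, long cartVersion) {}

public class CartService {
    private final RedisTemplate<String, Cart> redis;

    public CartService(RedisTemplate<String, Cart> redis) {
        this.redis = redis;
    }

    public Cart updateItem(String userId, String sku, int newQuantity) {
        String key = "cart:" + userId;
        // Retry loop for optimistic locking via Redis WATCH
        while (true) {
            // SessionCallback runs WATCH, MULTI, and EXEC on one connection;
            // separate template calls are not guaranteed to share a connection.
            Cart saved = redis.execute(new SessionCallback<Cart>() {
                @Override
                @SuppressWarnings("unchecked")
                public <K, V> Cart execute(RedisOperations<K, V> operations) {
                    RedisOperations<String, Cart> ops = (RedisOperations<String, Cart>) operations;
                    ops.watch(key);
                    Cart current = ops.opsForValue().get(key);
                    if (current == null) {
                        current = new Cart(userId, new ArrayList<>(), 0);
                    }
                    Cart updated = applyUpdate(current, sku, newQuantity);
                    ops.multi();
                    ops.opsForValue().set(key, updated);
                    List<Object> results = ops.exec();
                    // An aborted transaction yields null or an empty list, depending on the driver
                    return (results != null && !results.isEmpty()) ? updated : null;
                }
            });
            if (saved != null) {
                return saved; // Success: no concurrent modification
            }
            // Conflict detected: retry with fresh state
        }
    }

    private Cart applyUpdate(Cart cart, String sku, int quantity) {
        List<CartItem> updatedItems = cart.items().stream()
            .map(item -> item.sku().equals(sku)
                ? new CartItem(sku, quantity, item.version() + 1)
                : item)
            .toList();
        boolean found = updatedItems.stream().anyMatch(i -> i.sku().equals(sku));
        if (!found && quantity > 0) {
            updatedItems = new ArrayList<>(updatedItems);
            updatedItems.add(new CartItem(sku, quantity, 1));
        }
        return new Cart(cart.userId(), updatedItems, cart.cartVersion() + 1);
    }
}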
Redis WATCH provides optimistic locking: if another client modifies the cart between WATCH and EXEC, the transaction aborts, and the loop retries with fresh data. WATCH, MULTI, and EXEC must run on the same connection for this guarantee to hold. This approach avoids blocking other clients while ensuring consistency.
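The version-matching rule itself can be shown without Redis. The sketch below is an in-memory stand-in, not the service's storage layer: `VersionedCartStore` and its string payload are hypothetical, and `ConcurrentHashMap.replace(key, oldValue, newValue)` plays the role of the atomic compare-and-swap.

```java
import java.util.concurrent.ConcurrentHashMap;

final class VersionedCartStore {
    // The payload stands in for the serialized cart contents.
    record VersionedCart(String payload, long version) {}

    private final ConcurrentHashMap<String, VersionedCart> carts = new ConcurrentHashMap<>();

    VersionedCart read(String userId) {
        return carts.computeIfAbsent(userId, id -> new VersionedCart("", 0));
    }

    // Succeeds only if the caller's version matches the stored version.
    boolean write(String userId, String newPayload, long expectedVersion) {
        VersionedCart current = read(userId);
        if (current.version() != expectedVersion) {
            return false; // caller worked from a stale read
        }
        // Atomic compare-and-swap: fails if another writer replaced the entry meanwhile
        return carts.replace(userId, current, new VersionedCart(newPayload, expectedVersion + 1));
    }
}
```

A client whose write returns false re-reads the cart, reapplies its change, and submits again with the new version, which is the same retry shape as the WATCH loop.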
Order Processing — Saga Pattern
A checkout touches multiple services: inventory reservation, payment processing, order confirmation, and notification. A distributed transaction across all four is impractical, so we use the saga pattern to maintain consistency through a sequence of local transactions with compensating actions on failure.
Choreography vs. Orchestration:
| Approach | Pros | Cons |
|---|---|---|
| Choreography | Services react to events independently; no single point of failure | Hard to track saga state; debugging distributed flows is complex |
| Orchestration | Central coordinator manages flow; clear visibility | Orchestrator is a single point of failure; tighter coupling |
For an e-commerce checkout, orchestration is preferred because the order flow is linear with well-defined compensation steps, and visibility into order state is critical for customer support.
Saga Steps:
Reserve Inventory → Process Payment → Confirm Order → Send Notification
    ↓ (fail)            ↓ (fail)          ↓ (fail)
Release Inventory   Refund Payment    Cancel Order
Sealed Interface for Saga Steps:
public sealed interface SagaStep permits
    ReserveInventory, ProcessPayment, ConfirmOrder, SendNotification {}

public record ReserveInventory(String orderId, List<OrderLine> lines) implements SagaStep {}
public record ProcessPayment(String orderId, Money amount, String idempotencyKey)
    implements SagaStep {}
public record ConfirmOrder(String orderId) implements SagaStep {}
public record SendNotification(String orderId, String userId) implements SagaStep {}

public sealed interface SagaResult permits Success, Failure {}
public record Success(SagaStep completedStep) implements SagaResult {}
public record Failure(SagaStep failedStep, String reason) implements SagaResult {}
Order State Machine:
CREATED → INVENTORY_RESERVED → PAYMENT_PROCESSED → CONFIRMED → SHIPPED → DELIVERED
   │              │                   │
   └──CANCELLED◄──┘──PAYMENT_FAILED◄──┘
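One way to enforce these states in code is a transition table on an enum. The sketch below is a single reading of the diagram, not a definitive rule set: in particular it assumes a failed charge moves an order from INVENTORY_RESERVED to PAYMENT_FAILED, and that PAYMENT_FAILED always ends in CANCELLED. The method name `canTransitionTo` is illustrative.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

enum OrderState {
    CREATED, INVENTORY_RESERVED, PAYMENT_PROCESSED, CONFIRMED, SHIPPED, DELIVERED,
    PAYMENT_FAILED, CANCELLED;

    // Allowed next states for each state (assumed edges, see the note above).
    private static final Map<OrderState, Set<OrderState>> ALLOWED = new EnumMap<>(OrderState.class);

    static {
        ALLOWED.put(CREATED, EnumSet.of(INVENTORY_RESERVED, CANCELLED));
        ALLOWED.put(INVENTORY_RESERVED, EnumSet.of(PAYMENT_PROCESSED, PAYMENT_FAILED, CANCELLED));
        ALLOWED.put(PAYMENT_PROCESSED, EnumSet.of(CONFIRMED));
        ALLOWED.put(PAYMENT_FAILED, EnumSet.of(CANCELLED));
        ALLOWED.put(CONFIRMED, EnumSet.of(SHIPPED));
        ALLOWED.put(SHIPPED, EnumSet.of(DELIVERED));
        // DELIVERED and CANCELLED are terminal: no outgoing transitions
    }

    boolean canTransitionTo(OrderState next) {
        return ALLOWED.getOrDefault(this, EnumSet.noneOf(OrderState.class)).contains(next);
    }
}
```

Checking `canTransitionTo` before every status write rejects out-of-order events, such as a late shipping update arriving for a cancelled order.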
The orchestrator executes each step sequentially. On failure, it walks backward through completed steps, executing compensating transactions:
public class OrderSagaOrchestrator {
    // inventoryService, paymentService, orderRepository, and notificationService
    // are injected collaborators (declarations omitted for brevity).

    public SagaResult executeSaga(Order order) {
        // Per-invocation stack: a shared field would leak compensations between orders
        Deque<Runnable> compensations = new ArrayDeque<>();
        try {
            // Step 1: Reserve Inventory
            inventoryService.reserve(order.lines());
            compensations.push(() -> inventoryService.release(order.lines()));
            // Step 2: Process Payment
            paymentService.charge(order.id(), order.total(), order.idempotencyKey());
            compensations.push(() -> paymentService.refund(order.id()));
            // Step 3: Confirm Order
            orderRepository.confirm(order.id());
            // cancel(...) is assumed here; it mirrors the "Cancel Order" compensation
            compensations.push(() -> orderRepository.cancel(order.id()));
            // Step 4: Notify
            notificationService.sendConfirmation(order.userId(), order.id());
            return new Success(new ConfirmOrder(order.id()));
        } catch (SagaStepException e) {
            // Execute compensations in reverse order
            while (!compensations.isEmpty()) {
                compensations.pop().run();
            }
            return new Failure(e.failedStep(), e.getMessage());
        }
    }
}
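The compensation mechanics can be isolated in a small, self-contained sketch. `MiniSaga` and `Step` are hypothetical names, and the steps are plain `Runnable`s rather than service calls, so this shows only the ordering logic under those assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

final class MiniSaga {
    // One saga step: a forward action plus the compensation that undoes it.
    record Step(String name, Runnable action, Runnable compensation) {}

    // Runs steps in order. On failure, compensates completed steps newest-first
    // and returns the name of the failed step; returns empty when all steps succeed.
    static Optional<String> run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>(); // local: one stack per saga run
        for (Step step : steps) {
            try {
                step.action().run();
                completed.push(step);
            } catch (RuntimeException e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run();
                }
                return Optional.of(step.name());
            }
        }
        return Optional.empty();
    }
}
```

The failed step's own compensation is never run, since its action did not complete. In production the compensations also need to be idempotent and retried, because a compensation can itself fail.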
Payment Integration
Payment failures and network retries can cause double charges — the most damaging bug an e-commerce platform can have. Idempotency keys prevent this.
Idempotency Keys: The client generates a UUID before initiating checkout. Every payment request includes this key. The payment service checks whether a transaction with this key already exists before charging:
public record PaymentRequest(
    String orderId,
    String idempotencyKey,
    Money amount,
    PaymentMethod method
) {}

public sealed interface PaymentMethod permits CreditCard, PayPal, BankTransfer {}
public record CreditCard(String tokenizedCard) implements PaymentMethod {}
public record PayPal(String paypalToken) implements PaymentMethod {}
public record BankTransfer(String accountRef) implements PaymentMethod {}

public class IdempotentPaymentProcessor {
    private final PaymentRepository repo;
    private final PaymentGateway gateway;

    public IdempotentPaymentProcessor(PaymentRepository repo, PaymentGateway gateway) {
        this.repo = repo;
        this.gateway = gateway;
    }

    public PaymentResult process(PaymentRequest request) {
        // Check for existing transaction with this idempotency key
        var existing = repo.findByIdempotencyKey(request.idempotencyKey());
        if (existing.isPresent()) {
            return existing.get().toResult(); // Return cached result: no double charge
        }
        // Process payment through gateway.
        // Caveat: two concurrent requests can both pass the lookup above before
        // either one saves, so the key should be claimed in the DB before charging.
        PaymentResult result = switch (request.method()) {
            case CreditCard cc -> gateway.chargeCard(cc.tokenizedCard(), request.amount());
            case PayPal pp -> gateway.chargePayPal(pp.paypalToken(), request.amount());
            case BankTransfer bt -> gateway.initiateBankTransfer(bt.accountRef(), request.amount());
        };
        // Persist the result keyed by idempotency key
        repo.save(new PaymentRecord(
            request.orderId(),
            request.idempotencyKey(),
            result.status(),
            request.amount()
        ));
        return result;
    }
}
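The idempotency key column carries a UNIQUE index, so if two requests arrive simultaneously with the same key, only one INSERT succeeds. That constraint prevents a double charge only if the row is inserted before the gateway is called. As listed, process charges first and saves afterward, so two concurrent requests could both pass the findByIdempotencyKey lookup and both charge. To close that window, insert a pending record for the key first, call the gateway only from the request that won the insert, and have the losing request wait for or read the winner's stored result.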
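The claim-the-key-first idea can be sketched in a single process. This is an illustration under stated assumptions, not the payment service's implementation: `ClaimFirstProcessor` is hypothetical, the gateway is replaced by a counter, and `ConcurrentHashMap.computeIfAbsent` stands in for the unique-index insert by guaranteeing the charge function runs at most once per key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class ClaimFirstProcessor {
    private final Map<String, String> resultsByKey = new ConcurrentHashMap<>();
    final AtomicInteger gatewayCalls = new AtomicInteger(); // stand-in for a real gateway

    // The mapping function runs at most once per key, even under concurrent calls;
    // later callers with the same key receive the stored result.
    String process(String idempotencyKey, long amountCents) {
        return resultsByKey.computeIfAbsent(idempotencyKey, key -> charge(amountCents));
    }

    private String charge(long amountCents) {
        gatewayCalls.incrementAndGet();
        return "CHARGED:" + amountCents;
    }
}
```

Across multiple service instances the map is replaced by the database: the pending-row insert under the unique index decides which request is allowed to call the gateway.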
Payment Gateway Abstraction: The sealed interface PaymentMethod combined with pattern matching dispatches to the correct gateway without if-else chains. Adding a new payment method (e.g., crypto) requires adding a new record and listing it in the permits clause of the sealed interface; the compiler then flags every switch that does not handle it.
Inventory Management
Overselling during flash sales destroys customer trust. The inventory service uses distributed locks and reservation TTLs to prevent this.
Distributed Lock with Redis SETNX: Before decrementing stock, the service acquires a per-SKU lock:
public class InventoryService {
    private final RedisTemplate<String, String> redis;
    private final InventoryRepository repo;
    private static final Duration LOCK_TTL = Duration.ofSeconds(5);
    private static final Duration RESERVATION_TTL = Duration.ofMinutes(15);
    // Lua script: delete the lock only if it still holds our value (atomic inside Redis)
    private static final RedisScript<Long> RELEASE_LOCK = new DefaultRedisScript<>(
        "if redis.call('get', KEYS[1]) == ARGV[1] then "
            + "return redis.call('del', KEYS[1]) else return 0 end",
        Long.class);

    public InventoryService(RedisTemplate<String, String> redis, InventoryRepository repo) {
        this.redis = redis;
        this.repo = repo;
    }

    public boolean reserveStock(String sku, int quantity, String orderId) {
        String lockKey = "lock:inventory:" + sku;
        String lockValue = UUID.randomUUID().toString();
        // Acquire distributed lock with TTL to prevent deadlock
        Boolean acquired = redis.opsForValue()
            .setIfAbsent(lockKey, lockValue, LOCK_TTL);
        if (!Boolean.TRUE.equals(acquired)) {
            throw new LockAcquisitionException("Cannot acquire lock for SKU: " + sku);
        }
        try {
            int available = repo.getAvailableStock(sku);
            if (available < quantity) {
                return false; // Insufficient stock
            }
            // Decrement available, increment reserved
            repo.reserveStock(sku, quantity, orderId, RESERVATION_TTL);
            return true;
        } finally {
            // Release lock only if we still own it (atomic compare-and-delete).
            // A separate GET then DELETE could remove a lock another caller acquired in between.
            redis.execute(RELEASE_LOCK, List.of(lockKey), lockValue);
        }
    }

    public void confirmReservation(String orderId) {
        repo.convertReservationToSold(orderId);
    }

    public void releaseExpiredReservations() {
        // Scheduled task: release reservations older than RESERVATION_TTL
        repo.releaseExpiredReservations(Instant.now().minus(RESERVATION_TTL));
    }
}
Key design decisions:
- Lock TTL (5s): Prevents deadlocks if the service crashes while holding the lock.
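- Compare-and-delete: The finally block checks lockValue before deleting, preventing one thread from releasing another’s lock. The comparison and the delete need to run atomically (for example in one Lua script); with a separate GET and DEL, the lock can expire and be re-acquired by another caller in between.
- Reservation TTL (15 min): If a user abandons checkout, reserved stock returns to the available pool. A background job runs every minute to release expired reservations.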
- Pessimistic vs. Optimistic: Per-SKU locks are pessimistic — they block concurrent reservations for the same SKU. For most products, contention is low and locks release in milliseconds. For flash-sale items with extreme contention, a queue-based approach (see Bottlenecks) replaces direct locking.
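The reservation lifecycle (reserve, confirm, expire) can be sketched for a single SKU in memory. `ReservationLedger` is a hypothetical stand-in for the repository layer; the current time is passed in explicitly so the expiry rule is easy to follow and test.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

final class ReservationLedger {
    record Reservation(int quantity, Instant expiresAt) {}

    private int available;
    private final Duration ttl;
    private final Map<String, Reservation> byOrder = new HashMap<>();

    ReservationLedger(int initialStock, Duration ttl) {
        this.available = initialStock;
        this.ttl = ttl;
    }

    // Moves stock from "available" to "reserved" if enough is left.
    synchronized boolean reserve(String orderId, int quantity, Instant now) {
        if (byOrder.containsKey(orderId) || quantity > available) {
            return false;
        }
        available -= quantity;
        byOrder.put(orderId, new Reservation(quantity, now.plus(ttl)));
        return true;
    }

    // Payment succeeded: the reservation becomes a sale, so the stock stays deducted.
    synchronized void confirm(String orderId) {
        byOrder.remove(orderId);
    }

    // Background job: return stock from reservations whose TTL has elapsed.
    synchronized void releaseExpired(Instant now) {
        Iterator<Reservation> it = byOrder.values().iterator();
        while (it.hasNext()) {
            Reservation r = it.next();
            if (!r.expiresAt().isAfter(now)) {
                available += r.quantity();
                it.remove();
            }
        }
    }

    synchronized int available() {
        return available;
    }
}
```

Confirmed orders are removed from the reservation map before the expiry job can see them, which is why a paid order never has its stock handed back.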
Bottlenecks & Scaling
| Bottleneck | Solution |
|---|---|
| Flash sale thundering herd | Queue-based request throttling: funnel all purchase requests for a hot SKU through a single-partition Kafka topic. A consumer dequeues and processes sequentially, eliminating lock contention. Users receive a “position in queue” response. |
| Search relevance | A/B test ranking algorithms by routing 5% of traffic to experimental Elasticsearch scoring profiles. Track click-through rate and conversion per variant. |
| Database hotspots on popular products | Cache product pages in Redis with 60s TTL. Use read replicas for product detail queries. Invalidate or overwrite the cached entry whenever a product is updated. |
| Payment timeouts | Async payment confirmation: the checkout API returns “payment pending” immediately. The payment service processes asynchronously and notifies via webhook. The order service listens for the webhook to advance the saga. |
| Cart data loss | Redis persistence (AOF) with 1-second fsync. For critical carts (items in checkout), replicate to a secondary Redis instance. |
| Inventory inaccuracy | Periodic reconciliation job compares inventory DB with warehouse management system. Alert on discrepancies exceeding 1%. |
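The flash-sale row relies on one idea: if a single consumer processes all requests for a hot SKU in arrival order, stock can be decremented without a lock. The sketch below shows that idea in one process, with a `LinkedBlockingQueue` standing in for the single-partition Kafka topic; `HotSkuQueue` and `PurchaseRequest` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class HotSkuQueue {
    record PurchaseRequest(String orderId, int quantity) {}

    private final BlockingQueue<PurchaseRequest> queue = new LinkedBlockingQueue<>();
    private int stock; // touched only by the single consumer

    HotSkuQueue(int stock) {
        this.stock = stock;
    }

    // Producers (API threads) enqueue and return immediately.
    void submit(PurchaseRequest request) {
        queue.add(request);
    }

    // The single consumer drains requests in arrival order and returns accepted order ids.
    // No lock is needed as long as exactly one thread calls this method.
    List<String> drain() {
        List<String> accepted = new ArrayList<>();
        PurchaseRequest next;
        while ((next = queue.poll()) != null) {
            if (next.quantity() <= stock) {
                stock -= next.quantity();
                accepted.add(next.orderId());
            }
        }
        return accepted;
    }

    int remaining() {
        return stock;
    }
}
```

With stock of 3 and requests for 2, 2, and 1 units, the first and third are accepted and the second is rejected. The trade-off is throughput per SKU: one consumer caps the processing rate, which is acceptable because the hot SKU's stock is the scarce resource anyway.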
Interviewer Tips
- Start with requirements clarification: Ask about scale (users, products), consistency requirements (strong for payments? eventual for catalog?), and which features are in scope. This signals structured thinking.
- Draw the service boundaries first: Sketch the microservices and their data stores before diving into any single component. Interviewers want to see you reason about decomposition.
- Saga pattern is expected: If you propose a two-phase commit across microservices, expect pushback. Explain why sagas with compensating transactions work better for long-running checkout flows.
- Idempotency is a must-mention: Double-charging is the worst e-commerce bug. Proactively discussing idempotency keys shows production experience.
- Flash sale follow-up is common: Interviewers frequently ask “What happens during a flash sale?” Have the queue-based throttling answer ready.
- Quantify your estimates: Stating “2,300 RPS average with 23K peak” shows you can translate DAU into infrastructure requirements. Round numbers are fine — precision is not the point; the reasoning is.