Data Systems from the Ground Up
Storage, Movement, and Networks for Engineers Who Need to Know Why
This book targets senior developers who use databases and message queues daily but treat them as black boxes. You know what a SELECT statement is. You know what a queue is. You ship production systems. This book opens the black boxes one layer at a time.
Every chapter uses the same domain: a real-time logistics platform. Package tracking events stream in from delivery drivers. Warehouse inventory changes with every scan. Route assignment algorithms read and write under concurrent load. Audit history accumulates and must be queried months later. The domain is concrete enough to stress every storage and network pattern in the book, and simple enough that no chapter wastes time explaining the business logic.
Three positions run through every chapter:
Understanding the storage engine beneath your abstraction is not optional for senior engineers. ORMs, managed databases, and cloud queues hide complexity that surfaces under load, during failures, and at schema migration time. The engineer who understands write-ahead logging (WAL), compaction, and replication lag debugs production incidents in minutes. The one who does not spends hours.
Start simple, stay honest. Every concept is introduced through the simplest possible implementation first: a flat file, a linear scan, a single-node log. Complexity is added only when the simple version breaks in a measurable way. This is not pedagogy for beginners. It is the fastest path to genuine understanding for experienced engineers who skipped the foundations.
Theory exists to explain observable behavior. Raft is not covered to prepare the reader for a distributed systems PhD. It is covered because understanding leader election explains why your database goes read-only for 10 seconds during a failover. Theory without an observable consequence is cut.
Code examples use Java 21 sparingly, only where code clarifies a data or network concept. PostgreSQL is the primary relational database. Redis handles caching, Kafka handles event streaming, and RabbitMQ handles message queuing. Debezium handles change data capture. gRPC and Protocol Buffers handle binary protocols. RocksDB demonstrates LSM-tree internals.
Every chapter follows the same five-part structure: the black box (what the abstraction hides), the mechanism (how it actually works), the observable consequence (what you see in production), the code or config (the minimal artifact that makes it visible), and the decision rule (when to choose this approach over the alternative).
This book was generated using AI assistance.