Mechanical-Principal-Engineer

Data Systems from the Ground Up

Storage, Movement, and Networks for Engineers Who Need to Know Why.

This book targets senior developers who use databases and message queues daily but treat them as black boxes. You know what a SELECT statement is. You know what a queue is. You ship production systems. This book opens the black boxes one layer at a time.

Every chapter uses the same domain: a real-time logistics platform. Package tracking events stream in from delivery drivers. Warehouse inventory changes with every scan. Route assignment algorithms read and write under concurrent load. Audit history accumulates and must be queried months later. The domain is concrete enough to stress every storage and network pattern in the book, and simple enough that no chapter wastes time explaining the business logic.

Three positions run through every chapter:

Understanding the storage engine beneath your abstraction is not optional for senior engineers. ORMs, managed databases, and cloud queues hide complexity that surfaces under load, during failures, and at schema migration time. The engineer who understands write-ahead logging (WAL), compaction, and replication lag debugs production incidents in minutes. The one who does not spends hours.

Start simple, stay honest. Every concept is introduced through the simplest possible implementation first: a flat file, a linear scan, a single-node log. Complexity is added only when the simple version breaks in a measurable way. This is not pedagogy for beginners. It is the fastest path to genuine understanding for experienced engineers who skipped the foundations.

Theory exists to explain observable behavior. Raft is not covered to prepare the reader for a distributed systems PhD. It is covered because understanding leader election explains why your database goes read-only for 10 seconds during a failover. Theory without an observable consequence is cut.

Code examples use Java 21 sparingly, only where Java clarifies a data or network concept. PostgreSQL is the primary relational database. Redis handles caching. Kafka handles event streaming. RabbitMQ handles message queuing. Debezium handles change data capture. gRPC and Protocol Buffers handle binary protocols. RocksDB demonstrates LSM-tree internals.

Every chapter follows the same structure: the black box (what the abstraction hides), the mechanism (how it actually works), the observable consequence (what you see in production), the code or config (the minimal artifact that makes it visible), and the decision rule (when to choose this approach over the alternative).

This book was generated using AI assistance.

12 Chapters
3h 1m total
36,178 words

About This Book

Voice: Mechanical-Principal-Engineer
Tone: Precise, mechanical, willing to name the wrong tool for the job. Write as a principal engineer who has debugged a PostgreSQL vacuum storm at 2am, traced a Kafka consumer lag spike to a single slow deserializer, and explained to a product manager why the database cannot just roll back a distributed transaction. Every abstraction is opened to show the mechanism inside.
Categories: Storage, Databases, Distributed Systems, Networking, Data Engineering, Reliability

Table of Contents