Engineering-Historian

Postmortem

The Engineering Failures That Changed How We Build Software.

This book is for any engineer who has ever followed a rule without knowing where it came from. It never explains what a variable is. It never defines concurrency from scratch. It assumes the reader builds software professionally. What it explains is where the rules they follow came from, and what it cost to learn them.

Every chapter investigates one failure. The structure is fixed: the system as its engineers understood it, the chain of events, the technical mechanism, what the official review missed, what changed as a result, and the rule that the failure produced.

Four positions run through every chapter:

Every engineering rule has a failure behind it. "Never cast between numeric types without explicit validation" is not a style preference. It is the lesson extracted from a rocket that exploded 37 seconds after launch. Its inertial reference software converted a 64-bit floating-point value to a 16-bit signed integer without a range check. The resulting Ada exception went unhandled and shut the guidance system down. Rules without origin stories are forgotten under deadline pressure. Rules with a face, a sequence of events, and a consequence are remembered.
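What "explicit validation" means in practice can be sketched in a few lines. This is a minimal illustration in Python, not code from any real flight system. It assumes a 16-bit signed target like the one in the conversion described above. The function name `to_int16_checked` and the constants are hypothetical. The point is that the range check is written down and fails loudly, instead of being assumed.

```python
import math

# Hypothetical bounds for a 16-bit signed integer target.
INT16_MIN = -(2**15)      # -32768
INT16_MAX = 2**15 - 1     #  32767


def to_int16_checked(value: float) -> int:
    """Narrow a float to a 16-bit signed integer, refusing values that do not fit.

    A sketch of explicit validation: the caller must decide what to do
    with out-of-range input, rather than inheriting whatever the
    conversion happens to do.
    """
    if not math.isfinite(value):
        # NaN and infinity have no integer representation at all.
        raise ValueError(f"cannot narrow non-finite value {value!r} to int16")
    truncated = int(value)  # truncates toward zero
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in int16")
    return truncated


if __name__ == "__main__":
    print(to_int16_checked(1234.9))   # in range: truncated to 1234
    try:
        to_int16_checked(40000.0)     # out of range for 16 bits
    except OverflowError as exc:
        print("rejected:", exc)
```

Raising is only one policy. A real system might saturate to the nearest bound or switch to a degraded mode. What matters is that the policy is chosen deliberately and handled by the caller.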

The engineers were not incompetent. This is the most important position in the book and it is stated in chapter 1 and never contradicted. Every failure investigated here was built by skilled, experienced engineers making reasonable decisions given what they knew and believed at the time. The purpose of each investigation is not to identify the person who made the mistake. It is to identify the system condition that made the mistake invisible until it was too late. Hindsight is not analysis.

Failures are not random. The same categories of failure appear across decades, industries, and technology stacks: untested assumptions about the environment, race conditions in systems where the authors believed concurrency was not a concern, implicit contracts between components that were never written down, and cost-cutting decisions that removed the redundancy that would have contained the damage. The patterns are the point.

The industry learned something from each of these. Not always the right thing. Not always quickly. But every chapter ends with a traceable line between the failure and a practice, a standard, a language feature, or a tool that exists because of it. The reader finishes each chapter understanding not just what went wrong but what changed permanently as a result.

This book was generated using AI assistance.

14 Chapters
2h 46m total
33,134 words
About This Book

Voice: Engineering-Historian
Tone: Clinical, narrative, forensic. Write as an engineering historian who is also a working engineer: someone who reads incident reports the way a pathologist reads an autopsy, with clinical precision, genuine curiosity, and no interest in blame. The prose is narrative but the analysis is technical. When the failure mode requires understanding a scheduler, a floating-point representation, or a network partition, the book explains it at the depth required to understand the failure, then moves on.
Categories: Software Engineering, Incident Analysis, Systems Failure, Safety Engineering, Engineering History

Table of Contents