Chapter 2 of 38

Therac-25: The Race Condition That Killed Patients and the Safety Case That Made It Invisible

The System as Its Engineers Understood It

The Therac-25 is a dual-mode linear accelerator manufactured by Atomic Energy of Canada Limited (AECL). It treats cancer patients with precisely targeted radiation. In electron mode, it fires a beam of electrons directly at the tumor site. In photon mode, it fires the same electron beam at a much higher energy into a tungsten target, which converts the electrons into X-rays. The X-ray beam passes through a flattening filter that spreads the radiation evenly across the treatment field.

The distinction matters because the electron beam in photon mode operates at 25 MeV. If that beam reaches a patient without the tungsten target and flattening filter in the beam path, it delivers radiation at roughly 100 times the intended dose. The patient receives in a fraction of a second what should be spread across an entire treatment course.

The Therac-25’s predecessor, the Therac-20, operated on similar principles. The critical difference is not in the physics. It is in the safety architecture. The Therac-20 uses hardware interlocks: physical switches and sensors that independently verify the position of the tungsten target, the flattening filter, and the beam energy selector before allowing the beam to fire. These interlocks operate independently of the software. If the software commands the beam to fire with the wrong configuration, the hardware prevents it.
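What "independent" means here can be sketched in a few lines. This is a simplified model, not AECL's design: it treats the hazardous case as a photon-energy beam with the target or filter out of the beam path, and the names `SensedState`, `hardware_interlock_permits`, and `beam_may_fire` are hypothetical. Real interlocks are switches and circuits rather than Python. The point is only that the second check reads sensors, not software variables.

```python
from dataclasses import dataclass

PHOTON_ENERGY_MEV = 25.0  # photon-mode beam energy, as given in the text


@dataclass(frozen=True)
class SensedState:
    """Readings from physical sensors, independent of software variables."""
    target_in_path: bool   # tungsten target position switch
    filter_in_path: bool   # flattening filter position switch
    energy_mev: float      # beam energy selector reading


def hardware_interlock_permits(sensed: SensedState) -> bool:
    """Refuse a photon-energy beam unless target and filter are in place."""
    if sensed.energy_mev >= PHOTON_ENERGY_MEV:
        return sensed.target_in_path and sensed.filter_in_path
    return True


def beam_may_fire(software_says_ok: bool, sensed: SensedState) -> bool:
    """Therac-20 style: both layers must agree before the beam fires."""
    return software_says_ok and hardware_interlock_permits(sensed)


# Software wrongly approves a 25 MeV beam with nothing in the beam path.
bad = SensedState(target_in_path=False, filter_in_path=False, energy_mev=25.0)
print(beam_may_fire(True, bad))   # False: the hardware layer blocks it
```

Deleting `hardware_interlock_permits` from `beam_may_fire` leaves `software_says_ok` as the only gate, which is the change the next paragraph describes.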

The Therac-25 removes these independent hardware interlocks. The engineers’ reasoning is documented. The Therac-20 software had controlled the machine reliably through thousands of treatments. The software performed the same safety checks that the hardware interlocks performed. Having both was redundant, and redundancy added cost and complexity. The safety case for the Therac-25 assumed that software, having proven reliable on the Therac-20, could be trusted as the sole safety layer.

This assumption contains a specific technical error. The Therac-20 software was never the sole safety layer. It operated in parallel with hardware interlocks that would independently catch any software failure. The software’s track record on the Therac-20 did not demonstrate that the software was safe. It demonstrated that the hardware interlocks worked. The software was never tested under the condition that it would face on the Therac-25: being the only thing between a 25 MeV electron beam and a patient.
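The inference error can be illustrated with a toy model. The treatment counts and fault pattern below are invented for illustration, and the model assumes an idealized interlock that blocks every software fault. Neither is taken from AECL records.

```python
def observed_overdoses(software_faults: list[bool], interlock_present: bool) -> int:
    """Count software faults that reach the patient.

    Assumes an idealized interlock that blocks every software fault.
    """
    if interlock_present:
        return 0
    return sum(software_faults)


# Hypothetical history: 1,000 treatments, 3 latent software faults.
faulty_software = [False] * 997 + [True] * 3
perfect_software = [False] * 1000

# With interlocks, both histories look identical from the outside.
print(observed_overdoses(faulty_software, interlock_present=True))    # 0
print(observed_overdoses(perfect_software, interlock_present=True))   # 0

# Remove the interlocks and the difference appears.
print(observed_overdoses(faulty_software, interlock_present=False))   # 3
```

A clean field record gathered with the interlock in place cannot distinguish the two software histories, so it says nothing about which one was shipped.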

The software is written in PDP-11 assembly language by a single programmer, who also wrote the Therac-20 software. There is no version control system. There is no formal specification. There are no unit tests. There is no independent code review. These absences were the norm for embedded medical device software of the era. The programmer was experienced and competent. The code worked on the Therac-20. It was adapted for the Therac-25 with modifications to support the new hardware configuration.

The operator interface is a VT100 terminal. The operator types the treatment parameters: mode (electron or photon), energy level, and dose. The software validates the parameters, configures the beam, and fires when the operator presses the set button. The interaction is entirely text-based. Error messages appear on screen as the word “MALFUNCTION” followed by a number. The operator manual does not explain what the numbers mean. Operators learn through experience that most malfunctions are transient and can be resolved by pressing the P key to proceed.

This is the system as its engineers understood it. The software is reliable. The hardware interlocks are redundant. The operator is trained. The machine treats cancer patients.

The Chain

Figure: Therac-25 failure chain, showing the timeline from operator input to overdose delivery with the race condition window highlighted.

The diagram shows the critical race window between operator input correction and beam configuration. The time between the operator changing the mode on the VT100 terminal and the turntable completing its rotation is the window in which the software state and the hardware state diverge. The system provides no mechanism to detect this divergence because the hardware interlocks that would have detected it were removed.

The following reconstruction synthesizes the documented events at the Kennestone Regional Oncology Center in Marietta, Georgia, and the East Texas Cancer Center in Tyler, Texas, where the most thoroughly documented accidents occurred.

Event 1. The operator sits at the VT100 terminal and types the treatment parameters. She enters “X” for photon mode, the mode used for most treatments.

Event 2. The operator notices an error. The patient requires electron mode, not photon mode. She uses the cursor keys to move back to the mode field and changes it from “X” to “E.”

Event 3. The software’s setup routine begins configuring the machine for electron mode. The turntable, which positions the tungsten target and flattening filter into or out of the beam path, begins rotating to the electron position (target and filter out of the beam path).

Event 4. The beam energy, however, was already set to 25 MeV for the photon mode that was originally entered. The software’s setup routine checks the mode and energy, but the check and the turntable positioning run as separate concurrent tasks. The timing of the operator’s edit opens a window in which the mode variable has been updated to electron but the energy has not yet been adjusted downward.

Event 5. The operator presses the set button. The software reads the mode as electron. The turntable is in the electron position (no target, no filter). But the beam energy remains at 25 MeV, the photon-mode energy.

Event 6. The machine fires a 25 MeV electron beam directly into the patient with no tungsten target to convert it to X-rays and no flattening filter to spread it. The patient receives a massive overdose concentrated in a small area.

Event 7. The patient reports a sensation of intense heat or electric shock. The machine displays “MALFUNCTION 54” on the operator’s terminal. The operator, trained by experience that malfunctions are transient, presses P to proceed.

Event 8. In the Tyler accidents, the operator presses P and the machine fires again. A second overdose is delivered.
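Events 3 through 6 can be sketched as a small deterministic simulation. This is a toy model of the sequence as described above, not a reconstruction of the PDP-11 code: the class `Machine`, the function `press_set_after`, the tick counts, and the 10 MeV electron energy are all hypothetical, and the turntable motion is collapsed into a single step so that the lagging energy update is the only thing left in flight.

```python
from dataclasses import dataclass

PHOTON_ENERGY_MEV = 25.0      # from the text
ELECTRON_ENERGY_MEV = 10.0    # hypothetical electron prescription


@dataclass
class Machine:
    mode: str = "X"                        # what the software believes
    turntable: str = "photon"              # "photon" = target and filter in path
    energy_mev: float = PHOTON_ENERGY_MEV
    ticks_until_energy_update: int = 0     # pending work of the slower task

    def edit_mode_to_electron(self, energy_task_ticks: int) -> None:
        """Operator changes X to E: mode and turntable follow at once,
        but the energy adjustment is left to a task that finishes later."""
        self.mode = "E"
        self.turntable = "electron"
        if energy_task_ticks <= 0:
            self.energy_mev = ELECTRON_ENERGY_MEV   # no lag, no window
        self.ticks_until_energy_update = max(energy_task_ticks, 0)

    def tick(self) -> None:
        """Advance simulated time by one step."""
        if self.ticks_until_energy_update > 0:
            self.ticks_until_energy_update -= 1
            if self.ticks_until_energy_update == 0:
                self.energy_mev = ELECTRON_ENERGY_MEV

    def is_hazardous(self) -> bool:
        """Photon-energy beam with no target or filter in the path."""
        return (self.turntable == "electron"
                and self.energy_mev >= PHOTON_ENERGY_MEV)


def press_set_after(ticks: int, energy_task_ticks: int = 8) -> bool:
    """Return True if firing `ticks` steps after the edit is hazardous."""
    m = Machine()
    m.edit_mode_to_electron(energy_task_ticks)
    for _ in range(ticks):
        m.tick()
    return m.is_hazardous()


print(press_set_after(2))    # True: fired inside the window
print(press_set_after(12))   # False: the energy task had finished
```

In this model nothing compares `mode`, `turntable`, and `energy_mev` against each other at the moment of firing, so the inconsistent state passes through unnoticed.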

The patients at these facilities suffered severe radiation burns. Some died of their injuries within weeks or months. Others survived with permanent injuries.

The critical observation is the timing. The race condition is triggered only when the operator edits the mode field quickly enough that the cursor movement, field change, and set button press all occur within the time window of the turntable rotation. An operator who types slowly never encounters it. An operator who never corrects a mistake never encounters it. The bug is triggered by a specific, skilled operator behavior: catching and correcting an error quickly. The system punishes competence.
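The same observation can be written as a predicate over operator behavior. This is a sketch, assuming a hypothetical vulnerable window of `WINDOW_S` seconds after the edit. The operator profiles and their timings are invented for illustration.

```python
from typing import Optional


def triggers_race(edit_to_press_s: Optional[float], window_s: float) -> bool:
    """True if the set press lands inside the window opened by an edit.

    `edit_to_press_s` is None when the operator made no correction.
    """
    if edit_to_press_s is None:
        return False
    return 0 <= edit_to_press_s < window_s


WINDOW_S = 8.0  # hypothetical window length, for illustration only

operators = {
    "never corrects an entry": None,
    "corrects slowly": 20.0,
    "corrects quickly": 3.0,
}

for profile, delay in operators.items():
    print(f"{profile}: {triggers_race(delay, WINDOW_S)}")
# Only "corrects quickly" prints True.
```

Under this model, any test procedure that enters data at a deliberate pace stays in the first two rows and never exercises the fault.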