
The Mechanism

Chapter 15 of 38


SMARS is Knight's automated, high-speed order router, and its source code is not public. The following Java reconstruction is illustrative, not Knight's actual code. It sketches the architecture described in the SEC’s administrative proceedings.

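```java
// RECONSTRUCTED FROM SEC ADMINISTRATIVE PROCEEDINGS
// Knight Capital Group, File No. 3-15570
// Illustrative only: class, method, and flag names are hypothetical.

import java.util.Map;

public abstract class SmarsRouter {

    // Minimal stand-in types so the sketch compiles.
    public enum Side { BUY, SELL }

    public interface Order {
        boolean isFilled();
    }

    public interface ChildOrder {
        Side getSide();
        String getSymbol();
        void setPrice(double price);
    }

    private final Map<String, Boolean> featureFlags;

    protected SmarsRouter(Map<String, Boolean> featureFlags) {
        this.featureFlags = featureFlags;
    }

    public void routeOrder(Order parentOrder) {
        // The flag originally activated the Power Peg algorithm,
        // which Knight stopped using in 2003. In 2012, the same flag
        // was repurposed to activate the new Retail Liquidity
        // Program (RLP) code. getOrDefault avoids the
        // NullPointerException that unboxing a missing entry throws.
        if (featureFlags.getOrDefault("POWER_PEG_ENABLED", false)) {
            // On 7 of 8 servers, the new binary sends this branch to
            // the RLP code instead: routeWithRlp(parentOrder).
            // On 1 of 8 servers, the old binary was never replaced.
            // FAILURE POINT: there, the flag still reaches Power Peg.
            routeWithPowerPeg(parentOrder);
        } else {
            routeStandard(parentOrder);
        }
    }

    // Power Peg: unused since 2003, should have been deleted years ago
    private void routeWithPowerPeg(Order parentOrder) {
        // Power Peg was built for a use case Knight no longer had.
        // In 2005 its cumulative-quantity tracking was moved earlier in
        // the code path, so this loop no longer sees the parent's fills
        // and keeps generating child orders. Each child crosses the
        // spread: buy at the ask, sell at the bid.
        while (!parentOrder.isFilled()) {
            ChildOrder child = nextChildOrder(parentOrder);
            if (child.getSide() == Side.BUY) {
                // Buying at the ask: the higher side of the quote
                child.setPrice(currentAsk(child.getSymbol()));
            } else {
                // Selling at the bid: the lower side of the quote
                child.setPrice(currentBid(child.getSymbol()));
            }
            sendToExchange(child);
        }
    }

    // Hypothetical hooks, left abstract in this sketch.
    protected abstract void routeStandard(Order parentOrder);
    protected abstract ChildOrder nextChildOrder(Order parentOrder);
    protected abstract double currentAsk(String symbol);
    protected abstract double currentBid(String symbol);
    protected abstract void sendToExchange(ChildOrder child);
}
```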

Three engineering failures converge:

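Dead code. Knight stopped using the Power Peg algorithm in 2003 but never removed it from the codebase. It compiled. It was present in every deployed binary. It was separated from activation by a single boolean flag. Worse, a 2005 change moved Power Peg's cumulative-quantity tracking to an earlier point in the code. Knight did not retest Power Peg after that change, so the dormant code could no longer recognize when a parent order had been filled. Dead code in a trading system is not technical debt. It is a loaded weapon with the safety off, pointed at the balance sheet.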
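A dormant code path could flood the market because of the 2005 change described in the SEC order: Power Peg's cumulative-quantity tracking was moved, and Power Peg was not retested afterward. The toy model below shows that effect. It is not a reconstruction of SMARS. It assumes a fixed child-order size and that every child order fills completely. `RunawayModel` and its parameters are hypothetical, and `maxChildren` exists only so the simulation terminates.

```java
// Toy model of missing fill tracking: NOT Knight's code.
// All names here are hypothetical.
final class RunawayModel {

    /**
     * Counts the child orders sent for one parent order.
     * If trackFills is false, the loop never credits executions
     * against the parent, mimicking fill tracking that sits outside
     * the loop's path. maxChildren caps the simulation so it ends.
     */
    static int childOrdersSent(int parentQty, int childQty,
                               boolean trackFills, int maxChildren) {
        int filled = 0;  // cumulative shares credited to the parent
        int sent = 0;    // child orders sent so far
        while (filled < parentQty && sent < maxChildren) {
            sent++;                  // send one child order
            if (trackFills) {
                filled += childQty;  // assume each child fills fully
            }
        }
        return sent;
    }
}
```

With tracking, a 1,000-share parent split into 100-share children stops after 10 child orders. Without tracking, the loop runs until it hits the simulation's cap. That is the shape of the failure: each parent order became an open-ended stream of child orders.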

Feature flag reuse. The Power Peg flag name was repurposed for RLP activation. This is equivalent to reusing a circuit breaker label: the wiring diagram says one thing, the physical wiring does another. The reuse meant that enabling the flag had two possible effects depending on which version of the code was deployed. No test verified that the flag activated the intended code path on every server.
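One way to make flag reuse impossible by construction is to treat flag names as write-once: once a flag is retired, its name stays reserved as a tombstone and can never be registered again. The sketch below assumes a single in-process registry. `FlagRegistry` and its methods are hypothetical and not part of any specific feature-flag library.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical flag registry: a name can be used at most once, ever.
final class FlagRegistry {
    private final Map<String, Boolean> active = new HashMap<>();
    private final Set<String> retired = new HashSet<>();

    /** Registers a new flag; rejects any name that was ever used. */
    void register(String name) {
        if (retired.contains(name) || active.containsKey(name)) {
            throw new IllegalStateException("flag name already used: " + name);
        }
        active.put(name, false);  // new flags start disabled
    }

    /** Retires a flag permanently; its name becomes a tombstone. */
    void retire(String name) {
        if (active.remove(name) == null) {
            throw new IllegalStateException("unknown flag: " + name);
        }
        retired.add(name);
    }

    /** Turns on an active flag; retired or unknown names are rejected. */
    void enable(String name) {
        if (!active.containsKey(name)) {
            throw new IllegalStateException("unknown flag: " + name);
        }
        active.put(name, true);
    }

    /** Unknown and retired flags read as disabled. */
    boolean isEnabled(String name) {
        return active.getOrDefault(name, false);
    }
}
```

The more important benefit is indirect. Suppose RLP had shipped behind a fresh name, and the old binary read only the flags it was built with. Then the missed server would not have recognized the new flag, and turning it on could not have woken Power Peg.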

No deployment verification. The deployment was manual. A technician copied the new code to the servers and missed one of the eight, and no second technician reviewed the work. No automated check confirmed that all eight servers were running the same binary version. No canary deployment put the new code on one server first to verify its behavior before rolling out to the remaining seven. The checklist was a document, not an automated process.
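The missing check can be sketched as a gate that runs before the flag is enabled. It assumes each server can report an identifier for the build it is running, such as a hash or version string. The gate compares every report against the expected build and refuses to proceed on any mismatch or missing report. `DeploymentGate`, the server names and the build strings are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical pre-activation gate; not Knight's tooling.
final class DeploymentGate {

    /**
     * Returns the servers whose reported build differs from the
     * expected build, sorted by server name. Empty means uniform.
     */
    static List<String> mismatchedServers(Map<String, String> reportedBuilds,
                                          String expectedBuild) {
        return new TreeMap<>(reportedBuilds).entrySet().stream()
                .filter(e -> !expectedBuild.equals(e.getValue()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    /** Throws unless every expected server reports the expected build. */
    static void requireUniformFleet(Map<String, String> reportedBuilds,
                                    String expectedBuild, int expectedServers) {
        if (reportedBuilds.size() != expectedServers) {
            throw new IllegalStateException("expected " + expectedServers
                    + " servers, got reports from " + reportedBuilds.size());
        }
        List<String> stale = mismatchedServers(reportedBuilds, expectedBuild);
        if (!stale.isEmpty()) {
            throw new IllegalStateException("stale build on: " + stale);
        }
    }
}
```

The server-count check matters as much as the version comparison. It catches a server that never reports at all, which a log of successful copy operations would not reveal.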

The absence of a kill switch compounded the damage. When the trading desk detected abnormal positions, it had no single mechanism to halt all SMARS order routing. The system had been built for throughput, not for emergency shutdown. Stopping the bleeding required identifying which server was misbehaving, accessing that server, and stopping the process. This took approximately 45 minutes, during which losses accumulated at roughly $10 million per minute.
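A kill switch reduces the shutdown to one action by putting a single check in front of every order path. The sketch below shows only where that check sits. `KillSwitch` and `GuardedSender` are hypothetical. For simplicity, the halt state lives in one process and the sender is single-threaded.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical kill switch; not Knight's design.
final class KillSwitch {
    private final AtomicBoolean halted = new AtomicBoolean(false);

    /** One action stops all routing; safe to call from any thread. */
    void halt() {
        halted.set(true);
    }

    boolean isHalted() {
        return halted.get();
    }
}

final class GuardedSender {
    private final KillSwitch killSwitch;
    private int sent = 0;  // single-threaded counter for the sketch

    GuardedSender(KillSwitch killSwitch) {
        this.killSwitch = killSwitch;
    }

    /** Every order path, old code or new, must go through here. */
    boolean send(String order) {
        if (killSwitch.isHalted()) {
            return false;  // reject instead of routing
        }
        sent++;            // stand-in for the real exchange call
        return true;
    }

    int sentCount() {
        return sent;
    }
}
```

Across eight servers, the halt state would need to come from a source every server consults. Alternatively, the check could sit in a shared gateway that all order flow passes through. A separate in-process flag on each server would recreate the server-by-server shutdown described above.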

The 45-minute duration is not primarily a diagnostic delay. Knight engineers suspected a software problem within minutes. The delay was operational: there was no single action that could stop all order routing. Each server had to be addressed individually. The team had to determine which server was the source of the abnormal orders, and this required correlating order flow data with server identifiers in a system not designed for rapid forensic analysis during a live incident.
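The triage step can be sketched as a simple outlier test: count orders per server and flag any server far above the fleet median. This assumes every order event is already tagged with the ID of the server that sent it. `SourceTriage`, that tagging and the `factor` threshold are hypothetical. The point is that the question is cheap to answer only if the data is collected that way ahead of time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical triage helper; assumes server-tagged order events.
final class SourceTriage {

    /**
     * Counts orders per server ID and returns, sorted by ID, the
     * servers whose count exceeds factor times the median count.
     */
    static List<String> outlierServers(List<String> orderServerIds, double factor) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String id : orderServerIds) {
            counts.merge(id, 1, Integer::sum);  // one event per order
        }
        List<String> outliers = new ArrayList<>();
        if (counts.isEmpty()) {
            return outliers;
        }
        int[] sorted = counts.values().stream()
                .mapToInt(Integer::intValue).sorted().toArray();
        int mid = sorted.length / 2;
        double median = sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > factor * median) {
                outliers.add(e.getKey());
            }
        }
        return outliers;
    }
}
```

Servers that sent no orders do not appear in the counts, so the median covers only servers that were active in the window being examined.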

The market impact extended beyond Knight. The abnormal order flow in 154 stocks caused price dislocations that affected other market participants. Some stocks moved 10% or more in minutes. The NYSE later canceled trades in six stocks where prices moved more than 30% from the pre-market price. The remaining trades stood, and Knight absorbed the losses.