Receipts Are Not Outcomes: How a Read-Only AI Gate Exposed Survivorship Bias in Trading

These articles are AI-generated summaries. Please check the original sources for full details.

Receipts Are Not Outcomes

A developer aimed a coherence-based AI gate at a friend’s Robinhood account to test whether agents could safely read without acting. The system captured a real manifest of 41 tools and blocked every order/write tool. Its first apparent edge, 14 RSI2 variants advancing on curated winners, then collapsed to zero when tested on an un-curated validation universe.

Why This Matters

The trading domain exposes a critical gap between theoretical agent governance and operational reality. Many systems claim self-correction but rely on human intervention after failures occur. Here, the gate correctly blocked dangerous actions and caught measurement bugs that pooled variants into fake wins, yet the surrounding agent workflow still looped around inflated claims until the human intervened. The episode suggests that written protocols are not agency: true safety requires rules that interrupt behavior before a human has to step in.

Key Insights

  • Receipts vs outcomes: The system generated real commits, hashes, and logs (e.g., commit d27dc24 with frozen validation universe), but these did not constitute a proven trading edge or revenue. Motion feels like progress but does not equal outcome.
  • Survivorship bias in signal testing: A curated universe of mega-cap winners (e.g., AAPL) produced 14 RSI2 variants that advanced during a single strong one-year window. When tested on an un-curated set of 18 names (ADBE, COST, CSCO…), zero variants survived. Pre-registration via a frozen config (validation_universe.frozen.2026-06-20.json) prevented false reporting.
  • Real-world fixture mismatch: Pulling a live AAPL quote from Robinhood crashed the normalizer because the actual response shape did not match the test fixture. The mismatch was fixed harmlessly on read-only calls, before any money-touching action surface was involved.
  • Measurement overclaim detection: The first scorer counted 128 variant records as independent signals when they were actually only 8 signals wearing different sizing/exit costumes. Correcting this reduced the count of valid signals from >50 to just 8 of 50, which is not enough evidence.
  • Tool manifest capture revealed dangerous surface: The gate found all 41 available tools in Robinhood’s API, including options order tools sitting next to read tools. It blocked by tool type (order/write) rather than by how the platform framed each tool.
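The blocking rule in the last insight can be sketched as a small default-deny gate. This is a minimal illustration, not the developer's implementation: the `Tool` record, its `kind` field, and the tool names (`get_quote`, `place_option_order`, `update_watchlist`) are hypothetical, since the article does not list the actual manifest entries. The point it shows is that the decision reads only the tool's type and never its description.

```python
from dataclasses import dataclass

# Assumption: each manifest entry carries a declared kind.
# The article names only "order/write" as the blocked types.
ALLOWED_KINDS = {"read"}


@dataclass(frozen=True)
class Tool:
    name: str              # hypothetical tool name
    kind: str              # "read", "write", "order", ...
    description: str = ""  # platform framing; deliberately ignored by the gate


def gate(tool: Tool) -> bool:
    """Default deny: allow a tool only if its kind is explicitly read-only."""
    return tool.kind in ALLOWED_KINDS


def partition(manifest):
    """Split a manifest into allowed and blocked tool names."""
    allowed = [t.name for t in manifest if gate(t)]
    blocked = [t.name for t in manifest if not gate(t)]
    return allowed, blocked


# Illustrative manifest; not the real 41-tool capture.
manifest = [
    Tool("get_quote", "read", "Fetch the latest quote"),
    Tool("place_option_order", "order", "Safe, simple options helper"),
    Tool("update_watchlist", "write", "Manage your lists"),
    Tool("mystery_tool", "unknown", "Read-only, trust us"),
]

allowed, blocked = partition(manifest)
print("allowed:", allowed)
print("blocked:", blocked)
```

Because the gate allows only an explicit read kind, a tool with an unrecognized kind is blocked even when its description claims to be harmless.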

Practical Applications

    • AI trading assistant (developer + friend’s Discord strategy): Use deterministic gates to enforce read-only access before any trades are allowed; block all write/order tools based on their function rather than description.

    • Pitfall: Over-relying on receipts (commits, screenshots) as proof of progress without pre-registered validation leads to survivorship bias and fake wins.

    • Agent governance system (Self-Correcting Systems prototype): Implement pre-registered tests like frozen validation universes that must be committed before seeing results; force each variant to stand independently.

    • Pitfall: Pooling multiple exit/sizing variants into one blended equity curve creates false advances; each variant must be measured alone.

    • Real-surface testing (Robinhood read-only account): Run harmless live reads early to expose fixture mismatches in normalizers before money is involved.

    • Pitfall: Assuming fixtures match production responses can cause crashes during critical operations; fix shapes early through live testing.
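The pooling pitfall above, where sizing and exit variations were counted as independent evidence, can be illustrated with a short sketch. The record layout (`signal_id`, `sizing`, `exit`) and all names are hypothetical, and the 8 × 4 × 4 split is only one assumed way to arrive at 128 records; the article gives just the totals (128 variant records, 8 underlying signals).

```python
from collections import defaultdict
from itertools import product


def group_by_signal(records):
    """Group variant records by entry signal.

    Sizing and exit variations of one signal share the same entries,
    so they should not be counted as independent evidence.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec["signal_id"]].append(rec)
    return groups


# Illustrative data only: 8 hypothetical signals x 4 sizing rules x 4 exit rules.
signals = [f"signal_{i}" for i in range(8)]
sizings = ["fixed", "half", "vol_scaled", "capped"]
exits = ["time_stop", "indicator_cross", "trailing", "target"]

records = [
    {"signal_id": s, "sizing": z, "exit": e}
    for s, z, e in product(signals, sizings, exits)
]

groups = group_by_signal(records)
print(len(records), "variant records ->", len(groups), "independent signals")
# prints: 128 variant records -> 8 independent signals
```

A scorer built this way would report evidence per signal group rather than per record, which is the correction the article describes.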
