AutoAgent: Automating AI Agent Optimization and Harness Engineering

Meet ‘AutoAgent’: The Open-Source Library That Lets an AI Engineer and Optimize Its Own Agent Harness Overnight

Developed by Kevin Gu at thirdlayer.inc, AutoAgent is an open-source library that automates the iteration on agent system prompts and tools that engineers otherwise do by hand. In a single 24-hour run, the system reached the #1 ranking on SpreadsheetBench with a score of 96.5%.

Why This Matters

Traditional agent engineering relies on a tedious manual loop: humans read benchmark failure traces, then tweak system prompts and tool definitions. AutoAgent instead treats the agent harness, including orchestration and routing logic, as an optimization surface for a meta-agent, which hill-climbs on benchmark scores with the aim of outperforming human-crafted configurations. Automating the diagnosis and remediation of agent failures is meant to address the scalability limits of manual engineering.

Key Insights

AutoAgent scored 55.1% on TerminalBench by autonomously iterating on agent configurations, reported as the highest score recorded for GPT-5 (2026).
The system uses a ‘ratchet loop’ inspired by Andrej Karpathy’s autoresearch, applying propose-train-evaluate cycles to agent scaffolding rather than model weights.
A ‘meta-agent’ manages a single agent.py file, rewriting tool definitions and routing logic based on performance data recorded in a results.tsv experiment log.
The library integrates with the Harbor format, using Docker containers and LLM-as-judge verifiers to provide consistent scoring for complex, non-deterministic tasks.
Experiments suggest a ‘model empathy’ effect, in which a Claude-based meta-agent optimizes Claude-based sub-agents more effectively than it optimizes GPT-based ones.
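
The ratchet loop and experiment log described above can be illustrated with a minimal sketch. This is not AutoAgent's code: the function names, the log columns, and the toy scoring function are all assumptions made for illustration. In a real run, the propose step would be a meta-agent rewriting agent.py and the evaluate step would be a benchmark run; here both are replaced by small deterministic stand-ins so the loop itself is easy to follow.

```python
import csv
import random
from pathlib import Path
from typing import Callable

Config = dict[str, float]  # hypothetical stand-in for an agent harness


def ratchet_loop(
    initial: Config,
    propose: Callable[[Config], Config],
    evaluate: Callable[[Config], float],
    steps: int,
    log_path: Path,
) -> tuple[Config, float]:
    """Hill-climb: keep a candidate only if it beats the best score so far."""
    best, best_score = initial, evaluate(initial)
    with log_path.open("w", newline="") as f:
        # Tab-separated experiment log; the column names are assumptions.
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["step", "score", "kept"])
        writer.writerow([0, f"{best_score:.4f}", "baseline"])
        for step in range(1, steps + 1):
            candidate = propose(best)
            score = evaluate(candidate)
            kept = score > best_score
            if kept:
                # The ratchet: the best config only ever moves forward.
                best, best_score = candidate, score
            writer.writerow([step, f"{score:.4f}", "yes" if kept else "no"])
    return best, best_score


def toy_evaluate(config: Config) -> float:
    """Toy benchmark: the score peaks at 1.0 when x equals 0.7."""
    return 1.0 - abs(config["x"] - 0.7)


def make_toy_proposer(seed: int) -> Callable[[Config], Config]:
    """Toy 'meta-agent': randomly perturbs the current best config."""
    rng = random.Random(seed)

    def propose(config: Config) -> Config:
        return {"x": config["x"] + rng.uniform(-0.2, 0.2)}

    return propose


if __name__ == "__main__":
    best, score = ratchet_loop(
        {"x": 0.0}, make_toy_proposer(0), toy_evaluate, 20, Path("results.tsv")
    )
    print(best, score)
```

Because rejected candidates are logged but never adopted, the best score cannot regress between steps, which is what makes an unattended overnight run safe to leave alone. The log also keeps the failed attempts, so a proposer could in principle consult them before its next edit.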

Practical Applications

Spreadsheet Automation: AutoAgent optimized an agent to a 96.5% score on SpreadsheetBench. A common pitfall in this setting is relying on manual prompt-tuning, which can miss edge cases that autonomous iteration surfaces.
Terminal Task Execution: Using the Harbor adapter, AutoAgent reached a 55.1% score on TerminalBench. The anti-pattern to avoid is hard-coded tool routing, which tends to produce brittle agents that fail in complex CLI environments.
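
One way to picture the alternative to hard-coded routing is to hold the routing rules as data that an optimizer can replace without touching the dispatch code. The sketch below is purely illustrative: Route, TOOLS, ROUTES, and dispatch are hypothetical names, and nothing here reflects how AutoAgent's agent.py is actually laid out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Route:
    keyword: str  # substring of the task text that triggers this route
    tool: str     # name of the tool to call


def run_shell(task: str) -> str:
    # Placeholder tool: a real one would execute a command.
    return f"[shell] {task}"


def edit_sheet(task: str) -> str:
    # Placeholder tool: a real one would modify a spreadsheet.
    return f"[spreadsheet] {task}"


TOOLS: dict[str, Callable[[str], str]] = {
    "shell": run_shell,
    "spreadsheet": edit_sheet,
}

# Routing rules live in a plain list, so they can be swapped or reordered
# without editing dispatch() itself.
ROUTES: list[Route] = [
    Route("cell", "spreadsheet"),
    Route("grep", "shell"),
]


def dispatch(task: str, routes: list[Route] = ROUTES, default: str = "shell") -> str:
    """Send the task to the first route whose keyword matches, else the default."""
    lowered = task.lower()
    for route in routes:
        if route.keyword in lowered:
            return TOOLS[route.tool](task)
    return TOOLS[default](task)
```

Passing a different routes list changes the agent's behavior with no code edits, which is the kind of surface a meta-agent could iterate on and score.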

References:

https://www.marktechpost.com/2026/04/05/meet-autoagent-the-open-source-library-that-lets-an-ai-engineer-and-optimize-its-own-agent-harness-overnight/
