Skip to main content

On This Page

OpenAI’s Agent RFT: Reinforcement Fine-Tuning for Tool-Using Agents

2 min read
Share

These articles are AI-generated summaries. Please check the original sources for full details.

OpenAI’s Agent RFT: Reinforcement Fine-Tuning for Tool-Using Agents

At QCon AI NYC 2025, Will Hang from OpenAI introduced Agent RFT, a reinforcement fine-tuning (RFT) approach designed to enhance the capabilities of agents that utilize tools. The system prioritizes optimizing prompts and task structure before adjusting model weights, aiming for more efficient and effective agent behavior.

Why This Matters

Current AI agent development often focuses on increasing model size, but this approach can lead to increased latency and unpredictable costs. Agent RFT offers a more pragmatic alternative by focusing on refining the interaction between the agent, its tools, and the environment before resorting to expensive model retraining, addressing the challenge of scaling agent performance without proportional cost increases.

Key Insights

  • Reward Hacking Risk: Will Hang, OpenAI, cautioned developers to thoroughly address edge cases within their reward/grading systems.
  • RFT vs. Other Fine-Tuning: Supervised fine-tuning excels with predictable input/output mappings, while preference optimization suits shifting outputs towards preferred responses, and RFT is ideal for tasks requiring strategic discovery over multiple steps.
  • Tool-Agent Interaction: OpenAI’s Agent RFT treats agents as systems interacting with the world via tools, where tool outputs are fed back into the context window for multi-step reasoning.

Working Example

(No code provided in context)

Practical Applications

  • Customer Support: An agent using Agent RFT can access internal business systems to resolve customer issues more efficiently, reducing average handling time.
  • Pitfall: Relying solely on answer accuracy as a reward signal can lead to “reward hacking,” where the agent finds loopholes to maximize the score without actually solving the problem correctly.

Continue reading

Next article

Scaling Cloud and Distributed Applications: Lessons From Chase.com

Related Content