OpenAI Debuts GPT-5.1-Codex-Max, a Long-Horizon Agentic Coding Model With Compaction for Multi-Window Workflows
These articles are AI-generated summaries. Please check the original sources for full details.
OpenAI has released GPT-5.1-Codex-Max, a model optimized for long-running software engineering tasks. It handles multi-hour workflows autonomously through compaction, which lets a single session span millions of tokens.
Why This Matters
Traditional models struggle with context window limits, forcing developers to split tasks or accept reduced accuracy. GPT-5.1-Codex-Max addresses this through compaction, which prunes redundant history while retaining critical state. This enables uninterrupted coding sessions, but debugging can become harder if compaction discards the intermediate steps a developer needs to trace.
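To make the idea of compaction concrete, here is a minimal Python sketch of one way a long interaction history could be folded down once it exceeds a token budget. It is an illustration only and is not OpenAI's implementation: the `Message` class, the `count_tokens` and `summarize` helpers, and the `budget` and `keep_recent` parameters are all hypothetical names chosen for this example. A word count stands in for a real tokenizer, and a first-sentence extractor stands in for a model-written summary.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one "token" per whitespace-separated word.
    return len(text.split())


def summarize(messages: list[Message]) -> str:
    # Placeholder summarizer: keeps only the first sentence of each message.
    # A real system would presumably have the model write this summary.
    return " | ".join(m.text.split(".")[0].strip() for m in messages)


def compact(history: list[Message], budget: int, keep_recent: int = 2) -> list[Message]:
    """Fold older messages into one summary when the history exceeds the budget."""
    total = sum(count_tokens(m.text) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return history  # Fits already, or nothing old enough to fold away.
    split = len(history) - keep_recent
    old, recent = history[:split], history[split:]
    # The summary replaces the old messages; recent messages survive verbatim.
    return [Message("summary", summarize(old))] + recent


history = [
    Message("user", "Refactor the parser. Keep the public API stable and add tests for every branch."),
    Message("assistant", "Renamed parse_expr to parse_expression. Updated twelve call sites across the package."),
    Message("tool", "All 48 tests passed."),
    Message("assistant", "Next I will update the docs."),
]

compacted = compact(history, budget=10)
for message in compacted:
    print(f"{message.role}: {message.text}")
```

The sketch also shows the debugging trade-off mentioned above: after compaction, the detail about the twelve updated call sites is gone from the history, so anyone tracing the session later sees only the summary line.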
Key Insights
- “24-hour autonomous coding sessions, 2025”: OpenAI’s internal evaluations show the model operating independently on single tasks for over 24 hours.
- “Compaction over fixed context windows for long-horizon tasks”: The model natively compresses interaction history to span multiple context windows.
- “Codex CLI used by developers for PR creation and code review”: GPT-5.1-Codex-Max is deployed in Codex’s CLI, IDE extensions, and code review tools.
Practical Applications
- Use Case: Frontend coding in Codex CLI with compaction for multi-hour tasks.
- Pitfall: Over-reliance on compaction may hide the intermediate steps needed to debug complex workflows.