
Lux Surpasses Google Gemini CUA with an 83.6% Success Rate on the Online Mind2Web Benchmark

These articles are AI-generated summaries. Please check the original sources for full details.

Lux: A Foundation Computer Use Model that Tops Online Mind2Web with OSGym At Scale

OpenAGI Foundation launched Lux, a computer-use model that scores 83.6% on the Online Mind2Web benchmark, outperforming Google Gemini CUA (69.0%), OpenAI Operator (61.3%), and Anthropic Claude Sonnet 4 (61.0%). The model automates browser and desktop tasks by issuing low-level actions such as clicks and keystrokes.
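
The source does not describe Lux's actual action format. As a minimal sketch, assuming a hypothetical schema in which every step is either a click at screen coordinates, a typed string, or a key combination, a UI-level plan for a simple search task could be represented as follows. The classes `Click`, `TypeText`, `KeyPress` and the function `describe` are illustrative inventions, not part of any Lux API.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical action schema: the source does not document Lux's real format.
@dataclass(frozen=True)
class Click:
    x: int  # horizontal screen coordinate, in pixels
    y: int  # vertical screen coordinate, in pixels


@dataclass(frozen=True)
class TypeText:
    text: str  # characters to type into the focused element


@dataclass(frozen=True)
class KeyPress:
    keys: tuple[str, ...]  # keys pressed together, e.g. ("ctrl", "l")


Action = Union[Click, TypeText, KeyPress]


def describe(action: Action) -> str:
    """Render an action as a one-line log entry."""
    if isinstance(action, Click):
        return f"click({action.x}, {action.y})"
    if isinstance(action, TypeText):
        return f"type({action.text!r})"
    if isinstance(action, KeyPress):
        return "key(" + "+".join(action.keys) + ")"
    raise TypeError(f"unknown action: {action!r}")


# A search task expressed purely as UI-level steps (coordinates are made up).
plan: list[Action] = [
    Click(640, 88),
    TypeText("wireless headphones"),
    KeyPress(("enter",)),
]

for step in plan:
    print(describe(step))  # click(640, 88) / type('wireless headphones') / key(enter)
```

Because each step refers only to what is on screen, the same schema applies to any application with a visible UI, with no per-application API integration.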

Why This Matters

Lux bridges the gap between research models and real-world automation by operating on the rendered UI rather than on application-specific APIs. Because Online Mind2Web runs its roughly 300 tasks on live websites, a high success rate there is a more direct signal of deployment readiness than scores on static lab benchmarks. Lux's lead over Gemini CUA is 14.6 percentage points (about 21% in relative terms), which could translate into significant cost savings in production workflows requiring hundreds of actions per task, since every failed run has to be retried or handed off to a person.
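
The arithmetic behind this comparison can be checked directly. This sketch uses the reported success rates and, as a simplifying assumption not made in the source, treats each run as an independent trial that is simply retried until it succeeds, so the expected number of runs per success is 1/p (the mean of a geometric distribution). Real failures are often correlated and must first be detected, so treat the results as a rough illustration only.

```python
# Success rates on Online Mind2Web, as reported in the article.
rates = {
    "Lux": 0.836,
    "Gemini CUA": 0.690,
    "OpenAI Operator": 0.613,
    "Claude Sonnet 4": 0.610,
}


def point_gap(a: float, b: float) -> float:
    """Absolute difference between two rates, in percentage points."""
    return (a - b) * 100


def relative_gain(a: float, b: float) -> float:
    """Relative improvement of rate a over rate b, as a percentage."""
    return (a / b - 1) * 100


def expected_attempts(p: float) -> float:
    """Expected runs per success, assuming independent retries (geometric mean 1/p)."""
    return 1 / p


lux, gemini = rates["Lux"], rates["Gemini CUA"]
print(f"gap: {point_gap(lux, gemini):.1f} pp")         # gap: 14.6 pp
print(f"relative: {relative_gain(lux, gemini):.1f}%")  # relative: 21.2%
for name, p in rates.items():
    print(f"{name}: {expected_attempts(p):.2f} runs per success")
```

Under these assumptions Lux needs about 1.20 runs per completed task versus about 1.45 for Gemini CUA, and the difference grows linearly with the number of tasks processed.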

Key Insights

  • “83.6% success rate on the Online Mind2Web benchmark, 2025”
  • “Three execution modes: Actor (fast UI macros), Thinker (multi-step decomposition), Tasker (deterministic scripting)”
  • “OSGym, the open-source engine behind Lux, runs 1,000+ OS replicas and generates 1,400+ trajectories/minute”
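
The OSGym figures above can be converted into more familiar units. This sketch treats the quoted lower bounds (1,000 replicas, 1,400 trajectories per minute) as exact and assumes the throughput is sustained around the clock and spread evenly across replicas; the source states none of these assumptions.

```python
# Figures quoted for OSGym, treated as exact (the article gives them as "1,000+" and "1,400+").
replicas = 1_000
trajectories_per_minute = 1_400

# Assumes throughput is shared evenly across replicas and sustained without downtime.
per_replica_per_minute = trajectories_per_minute / replicas
seconds_per_trajectory = 60 * replicas / trajectories_per_minute  # per replica
per_hour = trajectories_per_minute * 60
per_day = per_hour * 24

print(per_replica_per_minute)            # 1.4
print(round(seconds_per_trajectory, 1))  # 42.9
print(per_hour)                          # 84000
print(per_day)                           # 2016000
```

At that rate each replica would finish a trajectory roughly every 43 seconds, and a full day of collection would yield on the order of two million trajectories.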

Practical Applications

  • Use Case: Software QA teams automating regression tests across web apps
  • Pitfall: Over-reliance on UI automation without fallbacks for dynamic page layouts
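
One way to address this pitfall is to give each UI step an ordered chain of fallback locators and to stop, rather than click blindly, when all of them fail. The sketch below is hypothetical: it models a page snapshot as a plain dictionary, and the names `by_element_id`, `by_visible_text`, and `locate_with_fallbacks` are illustrative, not taken from Lux, OSGym, or any browser-automation library.

```python
from typing import Callable, Optional

# Hypothetical types: a locator inspects a page snapshot (modeled here as a dict)
# and returns click coordinates, or None when it cannot find the target.
Point = tuple[int, int]
Locator = Callable[[dict], Optional[Point]]


def by_element_id(element_id: str) -> Locator:
    """Find a target by a stable element id (brittle across redesigns)."""
    def locate(page: dict) -> Optional[Point]:
        return page.get("ids", {}).get(element_id)
    return locate


def by_visible_text(text: str) -> Locator:
    """Find a target by its on-screen label (survives many layout changes)."""
    def locate(page: dict) -> Optional[Point]:
        return page.get("text", {}).get(text)
    return locate


def locate_with_fallbacks(page: dict, locators: list[Locator]) -> Point:
    """Try each locator in order; raise instead of guessing when all fail."""
    for locate in locators:
        point = locate(page)
        if point is not None:
            return point
    raise LookupError("target not found by any locator; escalate or re-plan")


# Simulated page after a redesign: the old id is gone, but the button label survived.
page = {"ids": {}, "text": {"Checkout": (912, 540)}}
target = locate_with_fallbacks(
    page, [by_element_id("checkout-btn"), by_visible_text("Checkout")]
)
print(target)  # (912, 540)
```

Raising `LookupError` gives the surrounding workflow an explicit point at which to retry, re-plan, or hand the task to a person.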
