Alibaba Tongyi Lab Releases MAI-UI: A Foundation GUI Agent Family that Surpasses Gemini 2.5 Pro, Seed1.8 and UI-Tars-2 on AndroidWorld

These articles are AI-generated summaries. Please check the original sources for full details.

Alibaba Tongyi Lab has introduced MAI-UI, a family of foundation GUI agents built on Qwen3-VL, with model sizes ranging from 2B to 235B parameters. The family is reported to achieve state-of-the-art results in GUI grounding and mobile navigation, surpassing Gemini 2.5 Pro, Seed1.8, and UI-Tars-2 on the AndroidWorld benchmark.

Why This Matters

Current GUI agents often struggle with real-world complexity: they lack native user interaction, tool integration, and privacy safeguards. Idealized setups assume clean data and stable environments, but practical applications must handle ambiguous instructions, changing app interfaces, and sensitive user data. Failures in any of these areas can make an application unusable and force significant development rework.

Key Insights

  • 76.7% success on AndroidWorld: MAI-UI’s largest variant achieved this score, exceeding Gemini 2.5 Pro, Seed1.8, and UI-Tars-2.
  • Self-Evolving Data Pipeline: Improves navigation robustness by perturbing task parameters and filtering low-quality trajectories.
  • Device-Cloud Collaboration: Enables privacy-sensitive operations to remain on-device while leveraging cloud-based models for complex tasks.
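The device-cloud split described above can be pictured as a routing decision made per step. The following is a minimal sketch under stated assumptions, not MAI-UI's actual mechanism: the `Step` fields, the `SENSITIVE_KEYWORDS` list, and the failure threshold are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical keywords that mark a step as privacy-sensitive (assumption).
SENSITIVE_KEYWORDS = ("password", "credit card", "passcode")


@dataclass
class Step:
    instruction: str         # what the agent is asked to do
    screen_text: str         # text visible on the current screen
    local_failures: int = 0  # consecutive failures of the on-device model


def is_sensitive(step: Step) -> bool:
    """Return True if the instruction or screen mentions sensitive data."""
    haystack = f"{step.instruction} {step.screen_text}".lower()
    return any(keyword in haystack for keyword in SENSITIVE_KEYWORDS)


def route(step: Step, max_local_failures: int = 2) -> str:
    """Choose 'device' or 'cloud' for a step.

    Sensitive steps always stay on-device; other steps escalate to the
    cloud only after the local model has failed repeatedly.
    """
    if is_sensitive(step):
        return "device"
    if step.local_failures >= max_local_failures:
        return "cloud"
    return "device"
```

For example, `route(Step("Enter password", "Login"))` stays on `"device"`, while a non-sensitive step with three local failures is sent to `"cloud"`. Checking sensitivity first means a struggling local model never causes private screens to leave the device in this sketch.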

Working Example

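# Example of a simplified action output from MAI-UI
# (hypothetical schema for illustration; not a documented output format)
action = {
    "type": "click",
    "element_id": "com.example.app:id/submit_button",
    "coordinates": (540, 1800),  # (x, y) in screen pixels
}

# Illustrative code for executing the action (simplified)
def execute_action(action):
    """Simulate an action by printing what a real executor would do."""
    if action["type"] == "click":
        # Simulate clicking the element
        print(f"Clicking element with ID: {action['element_id']} at {action['coordinates']}")
    elif action["type"] == "text_input":
        # Simulate entering text; requires a "text" key on the action
        print(f"Entering text: {action['text']} into element: {action['element_id']}")
    else:
        # Fail loudly on action types this sketch does not handle
        raise ValueError(f"Unsupported action type: {action['type']}")

execute_action(action)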

Practical Applications

  • Automated Customer Support: A mobile app using MAI-UI could automatically resolve customer issues by navigating the app interface and performing actions on behalf of the user.
  • Pitfall: Relying solely on static datasets for training can lead to brittle agents that fail when app interfaces change or new app versions are released.
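One way to move beyond a static dataset, in the spirit of the self-evolving pipeline mentioned under Key Insights, is to generate task variants by perturbing parameters and then keep only trajectories that pass a quality check. The sketch below is illustrative only: the `Trajectory` fields, the template slots, and the step budget are assumptions, and MAI-UI's real filtering criteria are not described in this summary.

```python
import random
from dataclasses import dataclass


@dataclass
class Trajectory:
    task: str      # natural-language task that was attempted
    steps: int     # number of actions the agent took
    success: bool  # whether a (hypothetical) verifier judged the task done


def perturb_task(template: str, slots: dict[str, list[str]], rng: random.Random) -> str:
    """Fill a task template with randomly chosen parameter values."""
    choice = {name: rng.choice(values) for name, values in slots.items()}
    return template.format(**choice)


def filter_trajectories(trajectories: list[Trajectory], max_steps: int = 30) -> list[Trajectory]:
    """Keep only successful trajectories that finished within a step budget."""
    return [t for t in trajectories if t.success and t.steps <= max_steps]


# Usage: create a task variant, then filter collected rollouts.
rng = random.Random(0)
variant = perturb_task(
    "Set an alarm for {time} labeled {label}",
    {"time": ["7:00", "8:30"], "label": ["gym", "work"]},
    rng,
)
rollouts = [
    Trajectory(variant, steps=6, success=True),
    Trajectory(variant, steps=45, success=True),   # too long, dropped
    Trajectory(variant, steps=4, success=False),   # failed, dropped
]
kept = filter_trajectories(rollouts)
```

Here only the first rollout survives filtering. Varying the template slots exposes the agent to task phrasings and parameter values that a fixed dataset would not cover.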
