GRASP: Robust Gradient-Based Planning for Long-Horizon World Models
These articles are AI-generated summaries. Please check the original sources for full details.
Gradient-based Planning for World Models at Longer Horizons
Researchers have introduced GRASP, a new gradient-based planner designed to overcome the fragility of long-horizon control in learned world models. At a planning horizon of 60, GRASP achieves a 26.2% success rate while standard Cross-Entropy Method (CEM) performance drops to 7.2%.
Why This Matters
Modern world models act as powerful general-purpose simulators, but planning with them over long horizons is fragile. Backpropagating through a long rollout produces an ill-conditioned computation graph, with Jacobian conditioning that worsens exponentially in the horizon. High-dimensional latent spaces add an adversarial robustness problem: gradients with respect to state inputs are brittle, so the optimizer can drift in directions orthogonal to the data manifold that the model never saw during training, and optimization fails.
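The exponential scaling can be made concrete with the chain rule. The block below is a standard derivation rather than a formula quoted from the paper; the symbols f, J_t, B_t, A, B, and κ are notation assumed here for illustration.

```latex
% Rollout dynamics and the sensitivity of the final state to the first action
s_{t+1} = f(s_t, a_t), \qquad
J_t = \frac{\partial f}{\partial s}(s_t, a_t), \qquad
B_t = \frac{\partial f}{\partial a}(s_t, a_t)

\frac{\partial s_T}{\partial a_0} = J_{T-1} J_{T-2} \cdots J_1 \, B_0

% Linear special case f(s, a) = A s + B a with A symmetric:
\frac{\partial s_T}{\partial a_0} = A^{T-1} B, \qquad
\kappa\!\left(A^{T-1}\right)
  = \left( \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)} \right)^{T-1}
```

In this special case the condition number of the product grows geometrically with T, which is one way to read the claim that gradients either explode or vanish as the horizon increases.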
Key Insights
- Backpropagation through time (BPTT) in world models leads to exploding or vanishing gradients, because Jacobian conditioning scales exponentially with the horizon T.
- Adversarial robustness issues, identified by Szegedy et al. (2014) and Goodfellow et al. (2015), cause models to have high Lipschitz constants in directions normal to the data manifold.
- Lifting dynamics into virtual states via collocation allows optimization to occur in parallel across time, providing a speed-up compared to serial rollout objectives.
- Injecting Gaussian noise into virtual state iterates, rather than action iterates, enables effective exploration between basins in lifted optimization spaces.
- Reshaping gradients by stopping brittle state-input signals while maintaining action-input signals prevents the optimizer from ‘hacking’ the model via adversarial state examples.
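The lifting, noise-injection, and gradient-stopping ideas above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the GRASP implementation: the scalar linear model `step`, the constants `ALPHA` and `BETA`, the names `plan_lifted` and `rollout`, the goal-weight term, the linear noise schedule, and every hyperparameter are hypothetical choices made for this example. Gradients are written out by hand so that stopping the state-input gradient shows up as a dropped term.

```python
import random

# Hypothetical toy dynamics standing in for a learned world model:
# s' = ALPHA * s + BETA * a
ALPHA, BETA = 0.9, 1.0


def step(s, a):
    """One step of the toy model (a scalar linear map, assumed for this sketch)."""
    return ALPHA * s + BETA * a


def plan_lifted(s0, goal, horizon=10, iters=2000, lr=0.1, goal_weight=1.0,
                noise0=0.1, stop_state_grad=True, seed=0):
    """Optimize virtual states and actions jointly instead of a serial rollout."""
    rng = random.Random(seed)
    states = [0.0] * horizon   # virtual states s_1..s_T (free variables)
    actions = [0.0] * horizon  # actions a_0..a_{T-1}
    noisy_iters = iters // 2   # noise is annealed to zero over the first half
    for k in range(iters):
        prev = [s0] + states[:-1]  # s_0..s_{T-1}, the state input of each step
        # Each residual uses only local variables, so these terms could be
        # evaluated in parallel across time.
        res = [states[t] - step(prev[t], actions[t]) for t in range(horizon)]
        # Gradient through each state's role as a prediction target.
        g_states = [2.0 * r for r in res]
        g_states[-1] += 2.0 * goal_weight * (states[-1] - goal)
        if not stop_state_grad:
            # Full gradient: also differentiate through the model's state input.
            for t in range(1, horizon):
                g_states[t - 1] += -2.0 * ALPHA * res[t]
        # Action-input gradients are always kept.
        g_actions = [-2.0 * BETA * r for r in res]
        # Gaussian noise goes into the state iterates, not the actions.
        sigma = noise0 * max(0.0, 1.0 - k / noisy_iters)
        states = [s - lr * g + sigma * rng.gauss(0.0, 1.0)
                  for s, g in zip(states, g_states)]
        actions = [a - lr * g for a, g in zip(actions, g_actions)]
    return states, actions


def rollout(s0, actions):
    """Serially apply the planned actions to check where they actually lead."""
    s = s0
    for a in actions:
        s = step(s, a)
    return s
```

For example, `states, actions = plan_lifted(0.0, 1.0)` followed by `rollout(0.0, actions)` should land close to the goal of 1.0 in this toy setting. Because the toy model is linear it has no adversarial directions, so the `stop_state_grad` flag only demonstrates the mechanics of dropping the state-input term, not the robustness benefit described above.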
Practical Applications
- Long-horizon robotic manipulation: GRASP maintains a 10.4% success rate at horizon H=80 in Push-T tasks where competing methods like LatCo fail entirely.
- High-speed trajectory optimization: Lifted state optimization allows for a median time to success of 15.2s at H=50, roughly 6.3x faster than CEM’s 96.2s.
- Pitfall: Using standard state-input gradients in deep world models often results in ‘sticky’ optimization, where the optimizer settles on trajectories that satisfy the learned dynamics but are physically implausible.