MEM for Robots: Physical Intelligence Unveils 15-Minute Memory System for Gemma 3-4B VLAs

Physical Intelligence Team Unveils MEM for Robots: A Multi-Scale Memory System Giving Gemma 3-4B VLAs 15-Minute Context for Complex Tasks

Researchers from Physical Intelligence, Stanford, UC Berkeley, and MIT have introduced Multi-Scale Embodied Memory (MEM), a memory system for robotic policies. It lets Vision-Language-Action (VLA) models condition on up to 15 minutes of context, addressing the lack of memory in typical end-to-end policies.

Why This Matters

Current robotic policies typically operate on a single observation or a very short history, so long-horizon tasks such as cleaning a kitchen are either computationally intractable or prone to failure. MEM factorizes memory into two scales, short-term video and long-term language, which keeps inference under a 380 ms real-time threshold while letting the robot adapt its manipulation strategy after recent failures.
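The two-scale factorization can be pictured with a minimal sketch. This is not the authors' implementation: the class name `MultiScaleMemory`, its methods, and the default window of 16 frames (borrowed from the frame count cited in this article) are illustrative assumptions.

```python
from collections import deque


class MultiScaleMemory:
    """Hypothetical two-scale memory: a bounded window of recent frames
    plus an unbounded list of short language summaries."""

    def __init__(self, max_frames=16):
        # Short-term scale: only the most recent frames are kept.
        self.frames = deque(maxlen=max_frames)
        # Long-term scale: compact text notes about earlier events.
        self.summaries = []

    def observe(self, frame):
        """Add a new observation; the oldest frame is dropped when full."""
        self.frames.append(frame)

    def summarize(self, text):
        """Record a semantic summary of something that already happened."""
        self.summaries.append(text)

    def context(self):
        """Return what a policy would condition on at the current step."""
        return {
            "video": list(self.frames),
            "language": " ".join(self.summaries),
        }


memory = MultiScaleMemory(max_frames=3)
for step in range(5):
    memory.observe(f"frame_{step}")
memory.summarize("I placed three bowls.")
print(memory.context())
```

The design point is that the video window stays a fixed size no matter how long the task runs, while older events survive only as text, which is far cheaper to carry forward than raw frames.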

Key Insights

A 62% increase in success rate on refrigerator-opening tasks where the hinge direction is unknown (MEM Research, 2026)
Space-Time Separable Attention interleaves spatial attention within each frame with causal temporal attention across frames, reducing attention cost from O(n^2K^2) to O(Kn^2 + nK^2) for n patches per frame and K frames
Gemma 3-4B serves as the foundation of the π0.6 VLA backbone used by the Physical Intelligence and Stanford researchers
Language-based long-term memory compresses up to 15 minutes of events into semantic summaries such as ‘I placed three bowls’
A single NVIDIA H100 GPU can process 16 observation frames while staying under the 380 ms real-time barrier
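To give a feel for why separating space and time helps, the sketch below counts query-key pairs under two schemes: full joint attention over all n*K tokens, and one spatial pass per frame followed by one temporal pass per patch position. The function names and the value n = 256 patches per frame are assumptions chosen for illustration, not figures from the paper; K = 16 matches the frame count cited above.

```python
def joint_attention_pairs(n: int, k: int) -> int:
    """Query-key pairs when all n*k tokens attend to each other."""
    return (n * k) ** 2


def separable_attention_pairs(n: int, k: int) -> int:
    """Query-key pairs for one spatial pass plus one temporal pass."""
    spatial = k * n * n   # each of the k frames attends over its n patches
    temporal = n * k * k  # each of the n patch positions attends over k frames
    # A causal temporal mask would roughly halve the temporal term;
    # it is left unmasked here to keep the count simple.
    return spatial + temporal


n_patches, n_frames = 256, 16  # n_patches is a hypothetical value
joint = joint_attention_pairs(n_patches, n_frames)
separable = separable_attention_pairs(n_patches, n_frames)
print(joint, separable, round(joint / separable, 1))
```

With these assumed sizes the separable scheme scores roughly 15 times fewer pairs than joint attention. Real latency also depends on projections, MLP layers, and hardware, so this count is only a rough guide.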

Practical Applications

Use Case: The π0.6 VLA performs ‘Recipe Setup’, retrieving ingredients from multiple locations over 15 minutes. Pitfall: Memory-less VLAs fail such tasks significantly more often because they see only a short history.
Use Case: A MEM-based robot adapts its manipulation strategy in real time to pick up chopsticks at variable heights. Pitfall: Single-observation models fail to resolve self-occlusions or adapt their grasp during execution.
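The pitfalls above come down to whether the policy can see its own recent failures. The toy sketch below illustrates that idea only; the function `next_strategy` and the strategy names are hypothetical, and a real VLA would adapt through learned behavior rather than an explicit lookup.

```python
def next_strategy(candidates, failed_attempts):
    """Return the first candidate not yet recorded as failed, or None."""
    for candidate in candidates:
        if candidate not in failed_attempts:
            return candidate
    return None


# Hypothetical strategies for a refrigerator door with an unknown hinge side.
candidates = ["pull_left_edge", "pull_right_edge"]

# With memory: the failed attempt is remembered, so the second try differs.
failed = []
first = next_strategy(candidates, failed)
failed.append(first)  # assume the first attempt did not open the door
second = next_strategy(candidates, failed)
print(first, second)

# Without memory: the failure record is always empty, so the same
# strategy is selected again.
retry = next_strategy(candidates, [])
print(retry)
```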

References:

https://www.marktechpost.com/2026/03/03/physical-intelligence-team-unveils-mem-for-robots-a-multi-scale-memory-system-giving-gemma-3-4b-vlas-15-minute-context-for-complex-tasks/
