Gemini Robotics brings AI into the physical world
Google DeepMind has announced a significant step towards bringing advanced AI reasoning into the physical world with the introduction of Gemini Robotics. This initiative features two new AI models based on Gemini 2.0, designed specifically to enable a new generation of more capable and helpful robots.
The goal is to equip robots with "embodied reasoning" – the humanlike ability to comprehend, react to, and safely act within the physical environment.
Gemini Robotics: Direct Control with VLA
The first model, Gemini Robotics, is an advanced vision-language-action (VLA) model built upon Gemini 2.0. It adds physical actions as a direct output modality, allowing it to control robots. Key strengths include:
- Generality: Leverages Gemini's world understanding to generalize effectively to new tasks, objects, instructions, and environments, significantly outperforming previous state-of-the-art (SOTA) VLA models on generalization benchmarks.
- Interactivity: Understands conversational language commands (including multiple languages) and adapts to changes in its environment or instructions on the fly ("steerability").
- Dexterity: Capable of performing complex, multi-step tasks requiring fine motor skills, such as origami or packing items.
- Adaptability: While trained mainly on the ALOHA 2 platform, it can be adapted to control various robot types, including Franka arms and Apptronik's Apollo humanoid.
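A VLA model of this kind closes the loop between perception, language, and action: each step it re-observes the scene and re-queries the model, which is what makes on-the-fly steerability possible. The sketch below illustrates that loop only; the class names, observation/action formats, and stub inference are illustrative assumptions, not the actual Gemini Robotics interface.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    image: bytes                  # latest camera frame
    joint_positions: List[float]  # current robot joint state

@dataclass
class Action:
    joint_targets: List[float]    # low-level actions as a direct output modality

class VLAModel:
    """Stand-in for a vision-language-action model that maps an
    observation plus a language instruction straight to robot actions."""
    def act(self, obs: Observation, instruction: str) -> Action:
        # A real model would run inference here; this stub just holds pose.
        return Action(joint_targets=obs.joint_positions)

def control_loop(model: VLAModel, instruction: str, steps: int = 3) -> List[List[float]]:
    """Closed-loop control: re-observe and re-query each step, so
    mid-task changes to the scene or instruction take effect."""
    trace = []
    for _ in range(steps):
        obs = Observation(image=b"", joint_positions=[0.0, 0.5, -0.5])
        action = model.act(obs, instruction)
        trace.append(action.joint_targets)
    return trace
```

Because the instruction is re-read every cycle, swapping it mid-run ("now place it in the other bin") changes behaviour without restarting the task.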
Gemini Robotics-ER: Enhanced Spatial Reasoning
The second model, Gemini Robotics-ER (Embodied Reasoning), focuses on enhancing Gemini's spatial understanding necessary for robotics. Key features:
- Significantly improves spatial reasoning capabilities, such as pointing and 3D object detection, over Gemini 2.0.
- Combines spatial reasoning with coding abilities to generate robot actions, such as intuiting appropriate grasp points and trajectories for objects.
- Can manage the full robotics control pipeline (perception, state estimation, planning, code generation) end-to-end, achieving 2-3x higher success rates than Gemini 2.0 alone.
- Supports in-context learning from human demonstrations.
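The end-to-end pipeline described above can be sketched as a chain of stages, with the final stage emitting executable robot code rather than raw motor commands. Everything here is a hypothetical illustration of that structure: the function names, grasp heuristic, and robot API (`move_to`, `close_gripper`) are assumptions, not Gemini Robotics-ER's actual output.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class DetectedObject:
    label: str
    position: Vec3     # estimated 3D position in metres
    grasp_point: Vec3  # proposed grasp location

def perceive(scene: List[Tuple[str, Vec3]]) -> List[DetectedObject]:
    """Perception + state estimation: detect objects and intuit grasp
    points (here, a toy heuristic: 2 cm above each object's centre)."""
    return [
        DetectedObject(label, pos, (pos[0], pos[1], pos[2] + 0.02))
        for label, pos in scene
    ]

def plan(objects: List[DetectedObject], target_label: str) -> DetectedObject:
    """Planning: select the object named in the instruction."""
    return next(o for o in objects if o.label == target_label)

def generate_code(obj: DetectedObject) -> str:
    """Code generation: emit robot-API calls for the chosen grasp."""
    x, y, z = obj.grasp_point
    return f"move_to({x:.3f}, {y:.3f}, {z:.3f})\nclose_gripper()"
```

Chaining `perceive → plan → generate_code` mirrors the perception, state-estimation, planning, and code-generation stages the model handles end-to-end.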
Partnerships and Path Forward
Google DeepMind is collaborating with robotics company Apptronik to integrate Gemini 2.0 into next-generation humanoid robots. They are also working with trusted testers to refine Gemini Robotics-ER.
These models lay the foundation for more general-purpose, interactive, and dexterous robots, marking progress towards AI systems that can be genuinely helpful in the physical world.
Learn more on the Google DeepMind Blog.