Google unveils an AI model that controls robots without an internet connection

Google DeepMind has unveiled a next-generation vision-language-action model, Gemini Robotics On-Device, capable of controlling robots without an internet connection. According to the developers, it is the first Vision-Language-Action (VLA) AI that combines perception, instruction understanding, and action execution in a single local process.
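To illustrate what "a single local process" means in practice, here is a minimal, purely hypothetical sketch of an on-device control loop: a camera frame and a text instruction go into one locally loaded model, and a joint command comes out, with no network call anywhere. The `OnDeviceVLAPolicy` class and its `act` interface are assumptions for illustration only, not DeepMind's actual API.

```python
import numpy as np


class OnDeviceVLAPolicy:
    """Stand-in for a locally loaded vision-language-action model.

    Hypothetical interface: the real Gemini Robotics On-Device API may differ.
    """

    def __init__(self, num_joints: int = 7):
        self.num_joints = num_joints

    def act(self, image: np.ndarray, instruction: str) -> np.ndarray:
        # Placeholder: a real VLA model would fuse the camera image and the
        # text instruction and emit the next joint command, all on-device.
        return np.zeros(self.num_joints)


def control_loop(policy: OnDeviceVLAPolicy, steps: int = 10) -> None:
    instruction = "fold the towel on the table"
    for _ in range(steps):
        image = np.zeros((224, 224, 3), dtype=np.uint8)  # stand-in camera frame
        action = policy.act(image, instruction)          # perception + language -> action
        # send_to_robot(action)  # hypothetical robot interface; no cloud round-trip
        print(action[:3])


if __name__ == "__main__":
    control_loop(OnDeviceVLAPolicy())
```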
The new model extends the capabilities of the previous version of Gemini Robotics, released in March. It can control dual-arm robots, perform complex manipulations, and adapt to unfamiliar objects and environments without requiring remote access. Demonstrated scenarios include unpacking bags, folding clothes, and assembling components on a production line.
According to Sergey Lonshakov, the architect of the Robonomics project, this approach fits the current trend in robotics toward seamless models in which task planning and execution happen in real time, eliminating pauses when switching between tasks and increasing the autonomy of such systems.
Gemini Robotics On-Device has been tested on ALOHA and Franka FR3 robots and on Apptronik's Apollo humanoid. Adapting the model to new tasks requires only 50-100 demonstrations, and a dedicated SDK with MuJoCo simulator support is available for customizing it. Developers can use natural-language prompts for training and testing.
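The SDK's actual interface is not shown in the article; the sketch below only illustrates the general pattern of testing a candidate policy against a MuJoCo scene locally, using the open-source `mujoco` Python bindings. The tiny one-joint scene and the `dummy_policy` callable are stand-ins for a real robot model and a fine-tuned on-device policy.

```python
import mujoco
import numpy as np

# A tiny scene: one hinge joint with a motor, standing in for a real robot model.
SCENE_XML = """
<mujoco>
  <worldbody>
    <body>
      <joint name="hinge" type="hinge" axis="0 0 1"/>
      <geom type="capsule" size="0.02" fromto="0 0 0 0.2 0 0"/>
    </body>
  </worldbody>
  <actuator>
    <motor joint="hinge" name="motor"/>
  </actuator>
</mujoco>
"""


def dummy_policy(qpos: np.ndarray, instruction: str) -> np.ndarray:
    """Stand-in for a fine-tuned policy (hypothetical interface)."""
    return -0.5 * qpos  # simple proportional controller toward zero


def rollout(steps: int = 500) -> float:
    model = mujoco.MjModel.from_xml_string(SCENE_XML)
    data = mujoco.MjData(model)
    data.qpos[0] = 1.0  # start away from the target position
    for _ in range(steps):
        data.ctrl[:] = dummy_policy(data.qpos.copy(), "move the arm to zero")
        mujoco.mj_step(model, data)
    return float(abs(data.qpos[0]))  # remaining error after the rollout


if __name__ == "__main__":
    print(f"final error: {rollout():.4f}")
```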
Interest in autonomous robotics is growing rapidly. In March, Nvidia unveiled a platform for modeling humanoid movements, and in June it emerged that Amazon was testing its own AI for delivering packages with robots carried in Rivian electric vans.
DeepMind's work is a step toward more autonomous, versatile, and adaptive robots that can operate in the real world without constant cloud support.