“Artificial intelligence and robotics could catapult both fields to new heights.”

All hail Nature and all hail writer Liz Gibney for “The AI Revolution Is Coming to Robots: How Will It Change Them?”

AI & Robotics: Convergence Has Finally Arrived!!

All hail Garry Mathiason for calling that convergence way back in 2012 (pre-Amazon launching our robotics revolution with its purchase of KIVA Systems). We waited a decade and then launched Robo AI News in October of 2023, with the expectation that Garry’s prescient foretelling was about to land. Well, it has!

Gibney furthers in her Nature piece: “Artificial intelligence and robotics could catapult both fields to new heights.”

At Robo AI News, we’ll be following it all as it develops. Hang with us at the globe’s best robotics news threesome: Asian Robotics Review, This Is Robotics (podcast), and Robo AI News, for the Stories Behind the Technology.

“There’s a lot of momentum in using AI to improve robots — and using robots to improve AI…hooking up AI brains to physical robots will improve the foundation models, for example, giving them better spatial reasoning.

“True intelligence, some say, can only emerge when an agent can interact with its world”.  It might well be what takes AI beyond learning patterns and making predictions, to truly understanding and reasoning about the world.”

“The AI Revolution Is Coming to Robots: How Will It Change Them?


"To fully understand the basics of movements and their consequences, robots still need to learn from lots of physical data. And therein lies a problem."

by Liz Gibney

“For a generation of scientists raised watching Star Wars, there’s a disappointing lack of C-3PO-like droids wandering around our cities and homes. Where are the humanoid robots fueled with common sense that can help around the house and workplace?

Rapid advances in artificial intelligence (AI) might be set to fill that hole. “I wouldn’t be surprised if we are the last generation for which those sci-fi scenes are not a reality,” says Alexander Khazatsky, a machine-learning and robotics researcher at Stanford University in California.

From OpenAI to Google DeepMind, almost every big technology firm with AI expertise is now working on bringing the versatile learning algorithms that power chatbots, known as foundation models, to robotics. The idea is to imbue robots with common-sense knowledge, letting them tackle a wide range of tasks. Many researchers think that robots could become really good, really fast. “We believe we are at the point of a step change in robotics,” says Gerard Andrews, a marketing manager focused on robotics at technology company Nvidia in Santa Clara, California, which in March launched a general-purpose AI model designed for humanoid robots.

At the same time, robots could help to improve AI. Many researchers hope that bringing an embodied experience to AI training could take them closer to the dream of ‘artificial general intelligence’ — AI that has human-like cognitive abilities across any task. “The last step to true intelligence has to be physical intelligence,” says Akshara Rai, an AI researcher at Meta in Menlo Park, California.

But although many researchers are excited about the latest injection of AI into robotics, they also caution that some of the more impressive demonstrations are just that — demonstrations, often by companies that are eager to generate buzz. It can be a long road from demonstration to deployment, says Rodney Brooks, a roboticist at the Massachusetts Institute of Technology in Cambridge, whose company iRobot invented the Roomba autonomous vacuum cleaner.

There are plenty of hurdles on this road, including scraping together enough of the right data for robots to learn from, dealing with temperamental hardware and tackling concerns about safety. Foundation models for robotics “should be explored”, says Harold Soh, a specialist in human–robot interactions at the National University of Singapore. But he is skeptical, he says, that this strategy will lead to the revolution in robotics that some researchers predict.

Firm foundations
The term robot covers a wide range of automated devices, from the robotic arms widely used in manufacturing, to self-driving cars and drones used in warfare and rescue missions. Most incorporate some sort of AI — to recognize objects, for example. But they are also programmed to carry out specific tasks, work in particular environments or rely on some level of human supervision, says Joyce Sidopoulos, co-founder of MassRobotics, an innovation hub for robotics companies in Boston, Massachusetts. Even Atlas — a robot made by Boston Dynamics, a robotics company in Waltham, Massachusetts, which famously showed off its parkour skills in 2018 — works by carefully mapping its environment and choosing the best actions to execute from a library of built-in templates.

For most AI researchers branching into robotics, the goal is to create something much more autonomous and adaptable across a wider range of circumstances. This might start with robot arms that can ‘pick and place’ any factory product, but evolve into humanoid robots that provide company and support for older people, for example. “There are so many applications,” says Sidopoulos.

The human form is complicated and not always optimized for specific physical tasks, but it has the huge benefit of being perfectly suited to the world that people have built. A human-shaped robot would be able to physically interact with the world in much the same way that a person does.

However, controlling any robot — let alone a human-shaped one — is incredibly hard. Apparently simple tasks, such as opening a door, are actually hugely complex, requiring a robot to understand how different door mechanisms work, how much force to apply to a handle and how to maintain balance while doing so. The real world is extremely varied and constantly changing.

The approach now gathering steam is to control a robot using the same type of AI foundation models that power image generators and chatbots such as ChatGPT. These models use brain-inspired neural networks to learn from huge swathes of generic data. They build associations between elements of their training data and, when asked for an output, tap these connections to generate appropriate words or images, often with uncannily good results.

Likewise, a robot foundation model is trained on text and images from the Internet, providing it with information about the nature of various objects and their contexts. It also learns from examples of robotic operations. It can be trained, for example, on videos of robot trial and error, or videos of robots that are being remotely operated by humans, alongside the instructions that pair with those actions. A trained robot foundation model can then observe a scenario and use its learnt associations to predict what action will lead to the best outcome.

Google DeepMind has built one of the most advanced robotic foundation models, known as Robotic Transformer 2 (RT-2), that can operate a mobile robot arm built by its sister company Everyday Robots in Mountain View, California. Like other robotic foundation models, it was trained on both the Internet and videos of robotic operation. Thanks to the online training, RT-2 can follow instructions even when those commands go beyond what the robot has seen another robot do before1. For example, it can move a drink can onto a picture of Taylor Swift when asked to do so — even though Swift’s image was not in any of the 130,000 demonstrations that RT-2 had been trained on.

In other words, knowledge gleaned from Internet trawling (such as what the singer Taylor Swift looks like) is being carried over into the robot’s actions. “A lot of Internet concepts just transfer,” says Keerthana Gopalakrishnan, an AI and robotics researcher at Google DeepMind in San Francisco, California. This radically reduces the amount of physical data that a robot needs to have absorbed to cope in different situations, she says.

But to fully understand the basics of movements and their consequences, robots still need to learn from lots of physical data. And therein lies a problem.

Data dearth
Although chatbots are being trained on billions of words from the Internet, there is no equivalently large data set for robotic activity. This lack of data has left robotics “in the dust”, says Khazatsky.