Felix Yanwei Wang, PhD student in Electrical Engineering and Computer Science (EECS) at MIT. Source: MIT News

Imagine a robot is doing the dishes for you. You ask it to grab a soapy bowl from the sink, but its gripper doesn't quite reach the right spot.

With a new framework developed by researchers at MIT and NVIDIA, you can control a robot's behavior with simple gestures. You can point at a bowl, draw a path on the screen, or simply nudge the robot's arm in the right direction.

Unlike other approaches to modifying robot behavior, this technique does not require the user to collect new data and retrain the machine learning model controlling the robot. Instead, it allows the robot to use real-time, visual human feedback to select the action sequence that best matches the user's intent.

When the researchers tested this framework, its success rate was 21% higher than that of an alternative method that did not use human intervention.

In the future, this framework could make it easy for a user to instruct a factory-trained robot to perform various household tasks, even if the robot has never seen the environment or the objects in that home before.

“We can’t expect ordinary users to collect data and fine-tune a neural network model themselves. They expect the robot to work right out of the box, and if something goes wrong, they need an intuitive mechanism to correct it. This is the challenge we tackled in this paper,” says Felix Yanwei Wang, a PhD student in the Department of Electrical Engineering and Computer Science (EECS) at MIT and the study’s lead author.

Minimize deviation

Recently, researchers have used pre-trained generative AI models to learn a “policy”—a set of rules that a robot follows to complete a task. These models can tackle a variety of complex tasks.

During training, the model is exposed only to valid robot movements, so it learns to generate proper movement trajectories.

However, this does not mean that every action a robot takes will match the user's actual expectations. For example, a robot might be trained to pick up boxes from a shelf without knocking them over, but might fail to reach a box on someone's bookshelf if the bookshelf layout is different from what it saw during training.

To fix such errors, engineers often collect additional data on new tasks and retrain the model, a costly and time-consuming process that requires machine learning expertise.

Instead, the MIT team wants to allow users to adjust the robot's behavior as soon as it makes a mistake.

However, when a human intervenes in the robot's decision-making process, the intervention may accidentally push the generative model toward an invalid action. The robot might grab the box the human wants, but knock over books on the shelf in the process.

“We want users to interact with the robot without introducing such errors, achieving behavior that better matches the user's intent while still ensuring validity and feasibility,” says Wang.

Enhance decision-making ability

To ensure that these interactions don't cause the robot to take invalid actions, the team uses a special sampling process. This technique helps the model choose the action from a set of valid choices that best matches the user's goal.

“Instead of imposing the user's intent on the robot, we help the robot understand that intent, while letting the sampling process fluctuate around the behaviors it has learned,” says Wang.
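The idea of choosing, from among valid learned behaviors, the one that best matches the user's signal can be sketched roughly as below. This is an illustrative simplification, not the paper's actual algorithm: the function name, the 2-D waypoint format, and the nearest-waypoint distance criterion are all assumptions for the sake of the example. The key property it preserves is that the user's input only *selects among* trajectories the policy itself proposed, so the result stays within learned, valid behavior.

```python
import numpy as np

def select_aligned_action(candidate_trajectories, user_point):
    """Pick the policy-sampled trajectory closest to the user's indicated point.

    candidate_trajectories: list of (T, 2) arrays, each a trajectory sampled
        from the learned policy (assumed valid by construction).
    user_point: 2-D point the user indicated, e.g., by pointing at the screen.
    """
    # Score each candidate by the distance of its nearest waypoint
    # to the user's target, and return the best-scoring one.
    return min(
        candidate_trajectories,
        key=lambda traj: np.min(np.linalg.norm(traj - user_point, axis=1)),
    )

# Example: two candidate trajectories; the user points near (5.2, 5.1),
# so the second candidate is selected.
candidates = [
    np.array([[0.0, 0.0], [1.0, 1.0]]),
    np.array([[5.0, 5.0], [6.0, 6.0]]),
]
chosen = select_aligned_action(candidates, np.array([5.2, 5.1]))
```

Because the user's feedback never injects new waypoints, the robot cannot be steered into motions the model has not learned, which is the validity guarantee the researchers describe.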

Thanks to this approach, their research framework outperformed other methods in simulation experiments as well as testing with a real robotic arm in a model kitchen.

While this method doesn't always complete the task immediately, it has a big advantage for the user: they can correct the robot as soon as they spot an error, rather than waiting for the robot to complete the task before giving new instructions.

Additionally, after the user nudges the robot a few times to guide it toward the correct bowl, the robot can remember that correction and incorporate it into future learning. The next day, it can pick up the correct bowl without needing to be guided again.

“But the key to this continuous improvement is having a mechanism for users to interact with the robot, and that is what we demonstrated in this research,” says Wang.

In the future, the team wants to speed up the sampling process while maintaining or improving performance. They also want to test the method in new environments to assess the robot's adaptability.

(Source: MIT News)