Felix Yanwei Wang, a PhD student in Electrical Engineering and Computer Science (EECS) at MIT. Source: MIT News

Imagine a robot helping you wash the dishes. You ask it to grab a soapy bowl out of the sink, but its gripper grabs slightly off the mark.

With a new framework developed by researchers at MIT and NVIDIA, you can correct the robot’s behavior with simple interactions: point at the bowl, trace a path on a screen, or nudge the robot’s arm in the right direction.

Unlike other approaches to modifying robot behavior, this technique does not require the user to collect new data and retrain the machine learning model that controls the robot. Instead, it allows the robot to use real-time, visual human feedback to select the action sequence that best matches the user’s intent.

When the researchers tested this framework, its success rate was 21 percent higher than that of an alternative approach that did not use human intervention.

In the future, this framework could make it easy for a user to instruct a factory-trained robot to perform various household tasks, even if the robot has never seen the environment or the objects in that home before.

“We can’t expect ordinary users to collect data and fine-tune a neural network model. They expect the robot to work right out of the box, and if something goes wrong, they need an intuitive mechanism to correct it. This is the challenge we tackled in this paper,” says Felix Yanwei Wang, a graduate student in the Electrical Engineering and Computer Science (EECS) department at MIT and the study’s lead author.

Minimizing deviation

Recently, researchers have used pre-trained generative AI models to learn a “policy”—a set of rules that a robot follows to complete a task. These models can solve many complex tasks.

During training, the model is exposed only to valid robot movements, so it learns to generate appropriate movement trajectories.
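To make that concrete, one can think of such a policy as a sampler: given the robot’s current observation, it proposes several candidate motion trajectories of the kind it saw during training. The Python sketch below is purely illustrative; the class and method names are our own, not the researchers’ code.

    import numpy as np

    class GenerativePolicy:
        """Toy stand-in for a pretrained generative policy (for example,
        a diffusion-based model). All names here are illustrative."""

        def __init__(self, horizon=16, action_dim=7):
            self.horizon = horizon        # future steps per trajectory
            self.action_dim = action_dim  # e.g., 7 joint/gripper commands

        def sample(self, observation, num_samples=8):
            """Propose candidate action trajectories for the current
            observation. A real model would decode trajectories
            conditioned on the observation; this placeholder just
            returns random arrays of the right shape."""
            return np.random.randn(num_samples, self.horizon, self.action_dim)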

However, this does not mean that every action a robot takes will match the user's actual expectations. For example, a robot may be trained to pick up boxes from a shelf without knocking them over, but may fail to reach a box on someone's bookshelf if the bookshelf layout is different from what it saw during training.

To fix such errors, engineers often collect additional data on new tasks and retrain the model, a costly and time-consuming process that requires machine learning expertise.

Instead, the MIT team wants to allow users to adjust the robot's behavior as soon as it makes a mistake.

However, if a human intervenes directly in the robot's decision-making process, they may accidentally push the generative model toward an invalid action. The robot might grab the box the human wants, but knock over books on the shelf in the process.

“We want users to interact with the robot without introducing such errors, achieving behavior that better matches the user’s intent while remaining valid and feasible,” says Wang.

Enhancing decision-making

To ensure that these interactions don't push the robot into invalid actions, the team uses a specific sampling procedure. It helps the model choose, from its set of valid action candidates, the one that best matches the user's goal.

“Instead of imposing the user’s intent on the robot, we help it understand that intent while letting the sampling process stay anchored to the behaviors it has learned,” says Wang.
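As a rough sketch of how such a selection step could work, the hypothetical Python below reuses the toy GenerativePolicy from earlier and assumes the user’s feedback arrives as a single 3-D target point; the scoring rule is our own simplification, not the authors’ implementation.

    import numpy as np

    def steer_with_feedback(policy, observation, user_point, num_candidates=32):
        """Choose, among trajectories proposed by the policy, the one that
        passes closest to the point the user indicated. Hypothetical
        helper; the real system also handles sketches and nudges."""
        # Every candidate is drawn from the pretrained policy, so each one
        # stays within the motions the model learned as valid.
        candidates = policy.sample(observation, num_candidates)  # (N, H, D)

        # Treat the first three action dimensions as end-effector xyz and
        # score each trajectory by its closest approach to the user's point.
        positions = candidates[:, :, :3]
        closest = np.linalg.norm(positions - user_point, axis=-1).min(axis=1)

        # Execute the in-distribution candidate nearest the user's goal,
        # rather than overriding the policy with the raw user command.
        return candidates[closest.argmin()]

The important design choice here is that the user’s signal only re-ranks what the policy already proposes; it never forces an action the model would not have generated on its own.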

Thanks to this approach, the framework outperformed other methods both in simulation experiments and in tests with a real robotic arm in a model kitchen.

While this method doesn't always complete the task immediately, it has a big advantage for the user: they can correct the robot as soon as they detect an error, instead of waiting for the robot to complete the task and then giving new instructions.

Additionally, after the user nudges the robot a few times to guide it toward the correct bowl, the robot could remember that correction and incorporate it into future training, so that the next day it picks up the correct bowl without needing to be guided again.

“But the key to that continuous improvement is having a mechanism for users to interact with the robot, which is exactly what we demonstrated in this study,” says Wang.

In the future, the team wants to speed up the sampling process while maintaining or improving performance. They also want to test the method in new environments to assess the robot's adaptability.

(Source: MIT News)