On March 5, the Association for Computing Machinery announced that it had awarded the Turing Award to scientists Andrew Barto and Richard Sutton for their research on "reinforcement learning," which allows AI to learn from digital forms of "joy" and "pain."
The Turing Award, first presented in 1966 and often called the "Nobel Prize of computing," comes with a $1 million prize that the two scientists will share.
The journey of “reinforcement learning” began in 1977, when Andrew Barto, then a researcher at the University of Massachusetts, Amherst, proposed a new theory: neurons in the human brain act as “hedonists,” always seeking to maximize pleasure and minimize pain.
In 1978, Richard Sutton joined Andrew Barto to develop this idea, both to explain human intelligence and to apply it to artificial intelligence (AI). The result was "reinforcement learning": a method in which AI systems learn from digital equivalents of "joy" and "pain."
Their work has laid the groundwork for major breakthroughs over the past decade, from Google's AlphaGo system defeating world-class Go player Lee Sedol in 2016, to OpenAI's ChatGPT chatbot, which is surprisingly human-like in its conversational abilities.
“They are the undisputed pioneers in reinforcement learning,” says Oren Etzioni, a professor emeritus at the University of Washington and founding chief executive of the Allen Institute for Artificial Intelligence. Barto and Sutton's 1998 book, “Reinforcement Learning: An Introduction,” remains the standard text in the field.
Psychologists have long studied how humans and animals learn from their experiences. In the 1940s, pioneering British computer scientist Alan Turing proposed that machines could learn in a similar way.
But it was Dr. Barto and Dr. Sutton who began exploring the mathematics of how this might work, building on a theory proposed by A. Harry Klopf, a computer scientist working for the government. Dr. Barto then built a lab at UMass Amherst dedicated to the idea, while Dr. Sutton set up a similar lab at the University of Alberta in Canada.
“Reinforcement learning” isn’t just for games. Through a technique called “reinforcement learning from human feedback” (RLHF), ChatGPT was refined using feedback from hundreds of people who tested it, which improved the quality of its answers.
Recently, companies like OpenAI and DeepSeek have also developed self-learning systems in which chatbots learn by working through problems on their own, leading to the emergence of "reasoning" systems, such as OpenAI's o1 and DeepSeek's R1, that simulate human reasoning.
Looking ahead, both scientists believe that “reinforcement learning” will help robots learn from experience, just as humans and animals do. “It’s very natural to control an organism through reinforcement learning,” Barto said.
With their revolutionary contributions, Andrew Barto and Richard Sutton have not only earned the Turing Award but also opened the door to a new era of artificial intelligence.
Source: https://vietnamnet.vn/giai-nobel-cua-nganh-dien-toan-2025-da-co-chu-2377820.html