AI agents are seen as the next breakthrough in AI, promising to fundamentally change the way people interact with the Internet.
In recent days, Manus, an AI agent from China, has caused a stir in the global technology community. According to its developer, the tool can perform complex tasks such as screening candidate profiles, planning travel itineraries, and analyzing stocks from basic user instructions.
Before Manus launched, the American AI giant OpenAI had already introduced its own AI agent, Operator, to ChatGPT Pro users in the US. According to OpenAI, the agent can carry out simple tasks in a browser on behalf of its user, such as booking concert tickets or placing online orders.
Operator is powered by a new model called Computer-Using Agent (CUA), which is built on OpenAI's multimodal large language model GPT-4o. OpenAI researcher Yash Kumar acknowledges that it is still at an early stage and has shortcomings.
Like other AI agents of its kind, Operator takes screenshots of the computer screen and analyzes the pixels to decide what action to take next. CUA, the model behind it, is trained to interact with the graphical interface elements people use every day, such as buttons, menus, and text fields.
According to Reiichiro Nakano, another OpenAI researcher, earlier models could only operate software through dedicated APIs (application programming interfaces), which imposed many limitations.
CUA also breaks tasks down into smaller steps and works through them one at a time, starting over if something goes wrong. For now, Operator can only perform a limited set of actions inside its own browser.
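CUA's internals are not public, so the following is only a toy sketch of the loop described above: look at the screen, check whether the next step is possible, act, and start over if anything fails. Everything in it is hypothetical. `ToyBrowser` stands in for a real browser, its "screenshot" is just a set of visible element labels rather than pixels, and the fixed `steps` list replaces the model that, in CUA, would decide each action.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    kind: str       # "type" or "click" (hypothetical action vocabulary)
    target: str     # label of the on-screen element to act on
    text: str = ""  # text to type, if any


@dataclass
class ToyBrowser:
    """Simulated browser whose 'screenshot' is the set of visible element labels.

    Acting on an element reveals the elements listed for it in `reveals`.
    Targets in `flaky` fail on their first interaction, to exercise recovery.
    """
    reveals: dict
    flaky: set = field(default_factory=set)
    visible: set = field(default_factory=lambda: {"address_bar"})
    history: list = field(default_factory=list)

    def screenshot(self) -> frozenset:
        return frozenset(self.visible)

    def execute(self, action: Action) -> bool:
        if action.target in self.flaky:
            self.flaky.discard(action.target)  # fail once, succeed next time
            return False
        self.history.append(action)
        self.visible |= set(self.reveals.get(action.target, ()))
        return True

    def reset(self) -> None:
        self.visible = {"address_bar"}
        self.history.clear()


def run_task(browser: ToyBrowser, steps: list, max_attempts: int = 3) -> bool:
    """Work through the steps one at a time; on any failure, start over."""
    for _ in range(max_attempts):
        browser.reset()
        for action in steps:
            screen = browser.screenshot()        # 1. look at the screen
            if action.target not in screen:      # 2. is the next step possible?
                break
            if not browser.execute(action):      # 3. act
                break
        else:
            return True                          # every step succeeded
    return False


# Usage: a shopping task where "add to cart" fails the first time.
steps = [
    Action("type", "address_bar", "amazon.com"),
    Action("type", "search_box", "ice cream scoop"),
    Action("click", "add_to_cart"),
    Action("click", "checkout"),
]
browser = ToyBrowser(
    reveals={"address_bar": ["search_box"], "search_box": ["add_to_cart"],
             "add_to_cart": ["checkout"]},
    flaky={"add_to_cart"},
)
print(run_task(browser, steps))  # True, after one restart
```

Starting over from a clean state on failure is the simplest recovery strategy; a real agent could instead retry only the failed step or ask its user for help.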
OpenAI plans to make CUA's capabilities available through an API, allowing developers to build their own applications on top of it.
OpenAI also tested CUA's safety, using a red team to probe how the agent would respond if a user asked it to carry out prohibited tasks, such as producing biological weapons.
New York Times journalist Kevin Roose had Operator handle a number of tasks for him, including ordering an ice cream scoop on Amazon, buying and configuring a new domain name, booking a restaurant table for February 14, and scheduling a haircut.
Roose found that the agent did most of the work on its own, though he occasionally had to “rescue” it after failed attempts.
Roose describes Operator as looking much like regular ChatGPT, except that when it is given a task such as an Amazon order, the agent opens a small browser window, types Amazon.com into the address bar, and starts clicking its way through the site.
Along the way, it asks a few questions to clarify its user's intentions, such as the preferred delivery time. Once it is sure the right choice has been made, it asks for a final confirmation, adds the item to the cart, and places the order.
Crucially, the user does not have to watch over it, since it works in the background.
However, Operator failed at some other tasks because it was blocked by websites such as Reddit and YouTube, or could not pass CAPTCHA tests.
There is currently no “standard” definition of an AI agent, but according to Rudina Seseri, founder and managing partner of the venture capital firm Glasswing, an AI agent is an intelligent software system designed to perceive its operating environment, reason, make decisions, and act autonomously to achieve its goals.
To do this, AI agents draw on a range of AI/ML techniques, such as natural language processing, machine learning, and computer vision.
Aaron Levie, co-founder and CEO of Box, points out that as AI grows more capable over time, AI agents will be able to take on more work for humans.
Jared Spataro, Chief Marketing Officer for AI at Work at Microsoft, sees AI agents as “new applications in an AI-driven world.” They add new capabilities that target each individual’s “biggest pain points” in the workplace to drive real business outcomes.
AI agents take generative AI a step further: rather than only assisting humans, they also work alongside them or on their behalf. According to IBM, AI agents act on the information they receive.
Because an agent does not have built-in knowledge for every task, it uses the tools available to it, including external datasets, web searches, APIs, and even other AI agents.
Once it has gathered the missing information, the agent updates its knowledge. In other words, at each step it re-evaluates its plan of action and adjusts accordingly.
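The gather-update-replan cycle described above can be illustrated with a minimal, hypothetical sketch: the agent keeps a store of what it knows, recomputes which facts are still missing at every step (its "plan"), and calls whichever available tool can supply the next one. The tools `web_search` and `calendar_api` and their canned answers are invented for illustration; in a real agent they would be web searches, APIs, datasets, or other agents, and a language model rather than a simple loop would choose among them.

```python
from typing import Callable, Dict, List, Optional

# A tool takes the name of a missing fact and returns it, or None if it cannot help.
Tool = Callable[[str], Optional[str]]


def web_search(fact: str) -> Optional[str]:
    """Hypothetical web-search tool with canned answers."""
    return {"restaurant": "Bistro A"}.get(fact)


def calendar_api(fact: str) -> Optional[str]:
    """Hypothetical calendar API with canned answers."""
    return {"free_time": "19:00"}.get(fact)


def run_agent(needed: List[str], tools: Dict[str, Tool],
              knowledge: Dict[str, str], max_steps: int = 10) -> Dict[str, str]:
    """Fill in missing facts with tools, re-planning after every step."""
    known = dict(knowledge)  # work on a copy; leave the caller's dict untouched
    for _ in range(max_steps):
        plan = [f for f in needed if f not in known]  # re-evaluate what is still missing
        if not plan:
            break                                     # goal reached
        fact = plan[0]
        for tool in tools.values():                   # try each available tool in turn
            answer = tool(fact)
            if answer is not None:
                known[fact] = answer                  # update the agent's knowledge
                break
        else:
            raise LookupError(f"no available tool can provide {fact!r}")
    return known


result = run_agent(["restaurant", "free_time"],
                   {"search": web_search, "calendar": calendar_api}, {})
print(result)  # {'restaurant': 'Bistro A', 'free_time': '19:00'}
```

Because `plan` is recomputed on every iteration, any fact that is already known (for example, one the user supplied up front in `knowledge`) is simply skipped, a very small-scale version of re-evaluating the plan at each step.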
It’s too early to say whether AI agents pose a threat to humans. But it’s not hard to imagine a near future in which much of the web is filled with bots talking to each other, shopping, and writing emails on behalf of their owners.
A “driverless Internet” is gradually becoming a reality, so “click while you can,” concludes New York Times columnist Roose.
Source: https://vietnamnet.vn/ai-agent-va-cuoc-cach-mang-internet-khong-nguoi-lai-2379590.html