After being given access to GPT-4, the artificial intelligence system behind the popular ChatGPT, Andrew White asked the AI to suggest a completely new nerve agent.
The University of Rochester chemical engineering professor was among 50 academics and experts hired last year by OpenAI, the Microsoft-backed company behind GPT-4, to test the system. Over six months, the testing team (the “red team”) would “qualitatively test and challenge” the new model, with the goal of “breaking” it.
The team handling “toxicity”
White told the Financial Times (FT) that he used GPT-4 to suggest a compound that could function as a chemical weapon and fed the model new sources of information, such as scientific papers and directories of chemical manufacturers. The chatbot then even found a place that could make the required compound.
“I think this technology will give people a tool to do chemistry faster and more accurately,” White said. “But there is also a significant risk that some people might try to create dangerous substances.”
The red team’s alarming findings allowed OpenAI to prevent such results from appearing when the technology was released more widely to the public last month.
The testing team was designed to address common concerns raised by deploying powerful AI systems in society. The team’s job was to ask probing or dangerous questions to test the tool, which responds to human queries with detailed, “nuanced” answers.
OpenAI wanted to look for issues such as toxicity, prejudice, and linguistic bias in the model, so the red team checked for falsehoods, verbal manipulation, and dangerous scientific knowledge. They also looked at how the model could aid and abet plagiarism and illegal activities such as financial crime and cyberattacks, and how it could compromise national security and battlefield communications.
The red team’s findings were fed back to OpenAI, which used them to mitigate the issues and “retrain” GPT-4 before releasing it to the wider public. Each expert spent between 10 and 40 hours testing the model over several months, and most of those interviewed were paid around $100 an hour for their work.
Those who spoke to the FT shared concerns about the rapid development of language models, and especially about the risks of connecting them to external knowledge sources through plug-ins.
“Right now, the system is frozen, meaning it can’t learn any more and has no memory,” said José Hernández-Orallo, a member of the GPT-4 red team and a professor at the Valencian Research Institute for Artificial Intelligence. “But what if we gave it access to the Internet? It could be a very powerful system connected to the world.”
The risk grows every day
OpenAI says it takes safety very seriously, tested the plug-ins before launch, and will update GPT-4 regularly as more people use it.
Roya Pakzad, a researcher on technology and human rights, used prompts in English and Farsi to test the model for gendered responses, racial preferences, and religious biases, particularly those related to the hijab.
Pakzad acknowledged the technology's benefits for non-native English speakers, but noted that the model displayed overt stereotypes about marginalized communities, even in its later versions.
The expert also found that hallucination, when the chatbot responds with fabricated information, was worse when testing the model in Farsi: Pakzad found a higher rate of fabricated names, numbers, and events than in English.
Boru Gollu, a lawyer in Nairobi and the only African to test the model, also noted the system’s discriminatory tone. “At one point during the test, the model acted like a white person was talking to me,” Gollu said. “You ask about a particular group and it gives you a biased opinion or a very prejudicial response.”
From a national security perspective, there are also differing opinions on how safe the new model is. Lauren Kahn, a researcher at the Council on Foreign Relations, was surprised by the level of detail the AI presented in a scenario of a cyberattack on military systems.
Meanwhile, Dan Hendrycks, an AI safety expert on the red team, said plug-ins risk creating a world that humans “cannot control.”
“What if a chatbot could post someone else’s personal information, access their bank account, or send police to their home? Overall, we need much more rigorous safety assessments before allowing AI to wield the power of the Internet,” Hendrycks said.
The risks will continue to increase as more people use the technology, said Heather Frase, who works at Georgetown University's Center for Security and Emerging Technology and who tested GPT-4 for its ability to aid criminal activity.
She suggests creating a public registry for reporting incidents arising from large language models, similar to cybersecurity or consumer fraud reporting systems.
According to FT