(CLO) OpenAI's speech-to-text tool Whisper is advertised as approaching "human-level robustness and accuracy," but it has one major drawback: it is prone to fabricating snippets of text or even entire sentences.
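For readers unfamiliar with the tool, here is a minimal sketch of how developers commonly run the open-source Whisper model through its Python package; the "base" model size and the file name "meeting.wav" are illustrative assumptions, not details from this report.

```python
# Minimal sketch: transcribing an audio file with the open-source Whisper package.
# Assumes `pip install openai-whisper` and ffmpeg are installed; the model size
# and file name below are illustrative, not taken from the article.
import whisper

model = whisper.load_model("base")        # load a pretrained Whisper checkpoint
result = model.transcribe("meeting.wav")  # run speech-to-text on the recording

print(result["text"])                     # full transcript (may contain hallucinations)
for segment in result["segments"]:        # per-segment text with timestamps
    print(f'[{segment["start"]:.1f}s - {segment["end"]:.1f}s] {segment["text"]}')
```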
Some of the text it produces, known in the industry as hallucinations, can include racial commentary, violence and even imaginary medical treatments, experts say.
Such fabrications are serious, experts say, because Whisper is used in a wide range of industries around the world to translate and transcribe interviews, generate text and subtitle videos.
More worryingly, medical centers are using Whisper-based tools to record patient-doctor consultations, despite OpenAI's warning that the tool should not be used in "high-risk areas."
Photo caption: Sentences starting with "#Ground truth" are what was actually said; sentences starting with "#text" are what Whisper transcribed. Photo: AP
Researchers and engineers say Whisper frequently produces hallucinations during use. For example, a University of Michigan researcher said he found hallucinations in eight out of 10 recordings he examined.
A machine learning engineer said he initially found hallucinations in about half of the more than 100 hours of Whisper transcripts he analyzed. A third developer said he found hallucinations in nearly every one of the 26,000 transcripts he created with Whisper.
The hallucinations persist even in short, well-recorded audio samples. A recent study by computer scientists found 187 hallucinations in more than 13,000 clear audio clips they examined.
That trend would result in tens of thousands of errors across millions of recordings, the researchers said.
Such mistakes can have “really serious consequences,” especially in a hospital setting, said Alondra Nelson, a professor in the School of Social Sciences at the Institute for Advanced Study.
“Nobody wants to be misdiagnosed. There needs to be a higher barrier,” said Nelson.
Professors Allison Koenecke of Cornell University and Mona Sloane of the University of Virginia examined thousands of short excerpts they retrieved from TalkBank, a research archive hosted at Carnegie Mellon University. They determined that nearly 40% of the hallucinations were harmful or disturbing because the speaker could be misunderstood or misrepresented.
A speaker in one recording described "two other girls and a woman", but Whisper fabricated additional racial commentary, adding "two other girls and a woman, um, black".
In another transcription, Whisper invented a non-existent drug called "antibiotics with increased activity".
While most developers assume transcription tools can misspell words or make other mistakes, engineers and researchers say they have never seen an AI-powered transcription tool that hallucinates as much as Whisper.
The tool is integrated into several versions of OpenAI's flagship chatbot, ChatGPT, and is a built-in service on Oracle's and Microsoft's cloud computing platforms, serving thousands of companies worldwide. It is also used to transcribe and translate speech into many languages.
Ngoc Anh (according to AP)
Source: https://www.congluan.vn/cong-cu-chuyen-giong-noi-thanh-van-ban-ai-cung-co-the-xuyen-tac-post319008.html