(Bloomberg) — ElevenLabs has suspended the account of the user who created a deepfake audio clip of U.S. President Joe Biden urging people not to vote in this week's New Hampshire primary, according to a person familiar with the matter. The clip was made with ElevenLabs' technology, according to Pindrop Security Inc., a voice fraud detection company that analyzed it.
ElevenLabs was notified this week of Pindrop's findings and is investigating, the person said. When the deepfake was traced to its creator, that user's account was suspended, the person said, asking not to be identified because the information is not public.

ElevenLabs, a startup that uses artificial intelligence software to replicate voices in more than two dozen languages, declined to comment. Earlier this week, ElevenLabs announced an $80 million funding round from investors including Andreessen Horowitz and Sequoia Capital. CEO Mati Staniszewski said the latest funding gives his startup a $1.1 billion valuation.
In an interview last week, Staniszewski said audio that impersonates people without their permission would be removed. On its website, the company says it allows voice clones of public figures, such as politicians, if "the clips show humor or mockery in a way that makes it obvious to the listener that what they are hearing is a parody."

The robocall, which imitated Biden's voice and urged people to save their votes for the US elections in November, alarmed disinformation experts and election officials alike. Not only did it show how relatively easy it is to create deepfakes, it also suggested that bad actors could use the technology to keep voters away from the polls.
A spokesman for the New Hampshire Attorney General said at the time that the messages “appeared to be an unlawful attempt to interfere with the New Hampshire Presidential Primary Election and suppress New Hampshire voters.” The agency has opened an investigation.
Users who want to clone voices on ElevenLabs must pay for the feature with a credit card. It is unclear whether ElevenLabs passed the account holder's information to New Hampshire authorities.
Bloomberg News obtained a copy of the recording on January 22 from the Attorney General's office and tried to determine what technology was used to create it. Those efforts included running it through ElevenLabs' own "speech classifier" tool, which is meant to indicate whether audio was created using ElevenLabs' artificial intelligence technology. According to the tool, the recording had a 2% probability of being synthetic or created with ElevenLabs.
Other deepfake detection tools confirmed it was a deepfake but couldn't identify the technology behind the audio. Pindrop researchers cleaned the audio by removing background noise and silence, then broke it into 155 segments of 250 milliseconds each for in-depth analysis, Pindrop founder Vijay Balasubramaniyan said in an interview. The company then compared the audio to a database of samples it had collected from more than 100 text-to-speech systems commonly used to produce deepfakes, he said.
The researchers determined that it was almost certainly created with ElevenLabs technology, Balasubramaniyan said.
In ElevenLabs' support channel on Discord, a moderator pointed out on a public forum that the company's speech classifier can't detect its own audio unless it's analyzing the raw file, a point echoed by Balasubramaniyan. With the Biden call, the only files available for immediate analysis were recordings of the phone call, he said, explaining that metadata had been stripped out and the waveforms were harder to detect, which made analysis more difficult.

Siwei Lyu, a professor at the University at Buffalo who specializes in deepfakes and digital media forensics, also analyzed a copy of the deepfake, ran it through the ElevenLabs classifier and concluded that it was likely made with that company's software, he told Bloomberg News. Lyu said the ElevenLabs classifier is one of the first things he checks when trying to determine the origin of deepfake audio because the software is so commonly used.
“We’re going to see a lot more of this with the general election coming up,” he said. “This is definitely a problem that everyone should be aware of.”
Pindrop shared a version of the audio scrubbed and filtered by its researchers with Bloomberg News. Using that recording, ElevenLabs' speech classifier concluded that it was an 84% match with its own technology.
Voice cloning technology enables “a crazy combination of scale and personalization” that can trick people into thinking they’re listening to local politicians or high-ranking elected officials, Balasubramaniyan said, describing it as “worrying.”
Tech investors are pouring money into AI startups developing synthetic voices, videos and images in the hope it will revolutionize the media and gaming industries.
Staniszewski said in the interview last week that his 40-person company had five people dedicated to content moderation. "Ninety-nine percent of the use cases we're seeing are in a positive area," the CEO said. Alongside its funding announcement, the company also said its platform had generated more than 100 years of audio in the past 12 months.
©2024 Bloomberg LP