Deepfakes have been around for years, but voice cloning software previously produced robotic, unrealistic voices. With today’s stronger computing power and more refined software, deepfake audio has become far more convincing. As with many technological advances, criminals are early adopters, quick to exploit the nefarious opportunities the technology provides.
By using voice cloning technology, such as ElevenLabs’ AI speech software VoiceLab, all it may take to create a convincing impersonation is a short audio clip of the targeted person’s voice, pulled from a video posted to a social media platform like Facebook or Instagram. The technology relies on AI tools that analyze millions of voices from various sources and spot patterns in the elemental units of speech, called phonemes. A person simply types in what they want the targeted voice to say, and the software generates the deepfake audio.
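To illustrate how low the technical bar has become, the sketch below shows roughly what a synthesis request to a commercial voice-cloning service looks like. It is modeled loosely on ElevenLabs’ public text-to-speech API; the voice ID, model name, and script are illustrative assumptions, not a verified recipe.

```python
# Minimal sketch of a text-to-speech request to a commercial voice-cloning
# API (loosely modeled on ElevenLabs' public docs; the endpoint, voice ID,
# and model name below are illustrative assumptions, not a verified recipe).
import requests

API_KEY = "YOUR_API_KEY"        # issued when signing up for the service
VOICE_ID = "cloned-voice-id"    # hypothetical ID of a previously cloned voice

url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"
headers = {
    "xi-api-key": API_KEY,
    "Content-Type": "application/json",
}
payload = {
    # The user simply types the words they want the target voice to speak.
    "text": "Hi, it's me. I need you to wire the money today.",
    "model_id": "eleven_monolingual_v1",
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()

# The service returns synthesized audio, typically as MP3 bytes.
with open("output.mp3", "wb") as f:
    f.write(response.content)
```

Once a voice has been cloned, producing new speech in that voice reduces to a few lines like these: type the script, receive the audio.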
In addition to improvements in the power of voice-cloning technology, two other factors are driving the rise in deepfakes. First, the technology is increasingly affordable: some software offers basic features for free and charges less than $50 a month for a paid version with advanced features. Second, the tools are easy to use, thanks to the growing number of training videos posted online. Unfortunately, this means almost anyone can create deepfake audio meant to deceive listeners, opening the floodgates to fraudulent activity.
The kidnapping example mentioned earlier is just one of many ways deepfake audio is being used. Criminals are also impersonating a range of other people.
Clearly, this flood of fake content can have real-world consequences for consumers, communities, and countries. Deepfake audio could enable criminals to steal identities and money, foster discord and distrust, and incite confusion and violence. In a disinformation landscape where people can’t tell what’s real and what’s fake, that is cause for serious concern.
What’s being done to address these threats? Some voice-cloning vendors appear to be taking measures to mitigate the risk. ElevenLabs announced it had seen an increasing number of voice-cloning misuse cases among users and is considering additional account checks, such as full ID verification, verifying copyright to the voice, or manually verifying each request to clone a voice sample. Facebook parent Meta, which has developed a generative AI tool for speech called Voicebox, has decided to move slowly in making the tool generally available, citing concerns over potential misuse.
On October 12, 2023, four U.S. senators announced a discussion draft bill aimed at protecting actors, singers, and others from having their voice and likeness replicated by artificial intelligence without their consent. The bipartisan NO FAKES Act (Nurture Originals, Foster Art, and Keep Entertainment Safe Act) would hold people, companies, and platforms liable for producing or hosting such unauthorized digital replicas.