Deepfakes have been around for years, but voice cloning software previously produced robotic, unrealistic voices. With today’s stronger computing power and more refined software, deepfake audio has become far more convincing. As with many technological advances, criminals are early adopters, quick to exploit the nefarious opportunities the technology provides.
By using voice cloning technology, such as ElevenLabs’ AI speech software VoiceLab, all it may take to create a convincing impersonation is a short audio clip of the targeted person’s voice, pulled from a video posted to a social media platform like Facebook or Instagram. The technology relies on AI tools that analyze millions of voices from various sources and spot patterns in the elemental units of speech, called phonemes. A person simply types in what they want the targeted voice to say, and the software generates the deepfake audio.
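To illustrate how low the technical bar has become, the sketch below shows roughly what a synthesis request to a commercial voice-cloning service looks like. It is modeled loosely on ElevenLabs’ public text-to-speech API; the voice ID, model name, and script are illustrative assumptions, not a verified recipe.

```python
# Minimal sketch of a text-to-speech request to a commercial voice-cloning
# API (loosely modeled on ElevenLabs' public docs; the endpoint, voice ID,
# and model name below are illustrative assumptions, not a verified recipe).
import requests

API_KEY = "YOUR_API_KEY"        # issued when signing up for the service
VOICE_ID = "cloned-voice-id"    # hypothetical ID of a previously cloned voice

url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"
headers = {
    "xi-api-key": API_KEY,
    "Content-Type": "application/json",
}
payload = {
    # The user simply types the words they want the target voice to speak.
    "text": "Hi, it's me. I need you to wire the money today.",
    "model_id": "eleven_monolingual_v1",
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()

# The service returns synthesized audio, typically as MP3 bytes.
with open("output.mp3", "wb") as f:
    f.write(response.content)
```

Once a voice has been cloned, producing new speech in that voice reduces to a few lines like these: type the script, receive the audio.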
In addition to improvements in the power of voice-cloning technology, two other factors are driving the rise in deepfakes. First, the technology is increasingly affordable: some software offers basic features for free and charges less than $50 a month for a paid version with advanced features. Second, the tools are easy to use, thanks to the growing number of training videos posted online. Unfortunately, this means almost anyone can create deepfake audio meant to deceive listeners, opening the floodgates to fraudulent activity.
The kidnapping example mentioned earlier is just one of many ways deepfake audio is being used. Criminals are also impersonating a range of other people.
Clearly, this flood of fake content can have real-world consequences for consumers, communities, and countries. Deepfake audio could enable criminals to steal identities and money, foster discord and distrust, and incite confusion and violence. In a disinformation landscape where people can’t tell what’s real and what’s fake, that is cause for serious concern.
What’s being done to address these threats? Some voice-cloning vendors appear to be taking measures to mitigate the risk. ElevenLabs announced it had seen an increasing number of voice-cloning misuse cases among users and is considering additional account checks, such as full ID verification, verifying copyright to the voice, or manually verifying each request to clone a voice sample. Facebook parent Meta, which has developed a generative AI tool for speech called Voicebox, has decided to move slowly in making the tool generally available, citing concerns over potential misuse.
On October 12, 2023, four U.S. senators announced a discussion draft bill aimed at protecting actors, singers, and others from having their voice and likeness replicated by artificial intelligence without their consent. The bipartisan NO FAKES Act (Nurture Originals, Foster Art, and Keep Entertainment Safe Act) would hold people, companies, and platforms liable for producing or hosting such unauthorized digital replicas.