May 10

Voice Cloning: An AI Experiment (Part One)

A couple of years ago (it feels like forever in AI years), I shared a clone of my voice that I thought was amazing. The advancements since then are insane… amazing… terrifying.

This week I enlisted a bunch of volunteers to help me test voice cloning technology. For my demos, I used a site called ElevenLabs.io. The service has a free version, but I pay five bucks a month for 30 minutes of voice cloning and a mind-blowing amount of capabilities.

Y’all… Five Bucks. Think about that.

I am breaking this into two posts because there’s so much to tell. Stay tuned for more next week!

How easy is it to clone a voice?

Three words: Way too easy.

You really need to watch the demo (I called it “Voice Cloning in 5 Minutes,” but it’s 8.5 minutes long 🤷🏻‍♀️) for a run-through of cloning a voice and then using it to generate audio. But I’ll give you the steps here as well.

How to clone a voice with ElevenLabs

  1. Upload 1-5 minutes of clear, clean audio.
  2. Click a button.
  3. Write a few words.
  4. Click a button.
  5. Download your cloned voice saying those words.

Yep. It maybe takes 3 minutes to go from an audio clip that someone actually said to an audio clip that no one said.
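For the curious, the same point-and-click workflow is also exposed through ElevenLabs’ public API, so the whole thing can be scripted. Here’s a minimal Python sketch that only *builds* the two HTTP requests without sending anything; the endpoint paths, header name, and field names are based on ElevenLabs’ published API docs, and the model ID is an assumption — verify all of them against the current documentation before relying on this.

```python
# Sketch of the clone-then-speak workflow against ElevenLabs' REST API.
# Nothing is sent over the network here; this just assembles the requests.
import json

API_BASE = "https://api.elevenlabs.io/v1"
API_KEY = "your-api-key-here"  # placeholder; never hard-code a real key


def build_clone_request(name, sample_paths):
    """Steps 1-2: upload 1-5 minutes of clean audio to create a voice."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/voices/add",
        "headers": {"xi-api-key": API_KEY},
        # Sent as multipart/form-data: a voice name plus audio file(s).
        "form": {"name": name, "files": sample_paths},
    }


def build_tts_request(voice_id, text):
    """Steps 3-5: have the cloned voice read a few words."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/text-to-speech/{voice_id}",
        "headers": {"xi-api-key": API_KEY, "Content-Type": "application/json"},
        # "eleven_multilingual_v2" is an assumed model ID; check the docs.
        "body": json.dumps({"text": text, "model_id": "eleven_multilingual_v2"}),
    }


clone = build_clone_request("my-voice", ["sample1.mp3"])
tts = build_tts_request("VOICE_ID", "Hello, this is my cloned voice.")
```

With a real API key and voice ID, each dict maps directly onto one HTTP call (e.g. via `requests`), and the text-to-speech response body would be the generated audio.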

Does a cloned voice really sound authentic?

The reason I did this experiment is that The Perfect Scam podcast from AARP had an episode about scams on the rise. They talked about the ways bad guys are using cloned voices to convince seniors that their loved ones are in danger and need money.

The host mentioned that they haven’t seen voice cloning used much in the “Grandparent” scam yet, but cases are rising. Businesses are also seeing an uptick in scammers cloning a boss’s voice to trick employees into sending money or giving out passwords.

And then there’s the horrifying story of a principal whose life was upended when a deepfake of his voice spewing hateful words circulated in his community.

So I had to try it out for myself.


Experiment One: Cloning a basic intro (not as scary)

I had AI write an intro for me, and then I used ElevenLabs’ Text-to-Speech feature for a number of experiments, cloning voices to read it. My apologies that you’re going to hear my intro a bunch of times.

Hilary Blair, professional speaker and voiceover artist

Hilary Blair is both an amazing speaker and a dear friend. She’s a professional voiceover artist as well, so her voice was perfect for the experiment. Click here for a sample of her real voice, then listen to both clips and guess which one is the clone.

▶︎ CLICK HERE to reveal the answer

Hilary’s real voice is the orange one. The purple one I created by uploading 5 minutes of one of her professional voiceover clips.

Alan Berg, podcast creator and professional speaker

Another speaking friend, Alan Berg, has a binge-worthy podcast for the wedding industry. Listen to his real voice here. Does it match the purple or the orange?

▶︎ CLICK HERE to reveal the answer

Alan’s real voice is also the orange one. Watch the video for the full run-through of how I cloned his voice and then created the clip.


Experiment Two: Tricking my manager into giving out my credit card number (incredibly scary)

The “CEO Fraud” scam is one of the scams that scares me most. Typically these scams involve an employee getting an email asking them to transfer money or give out private info. A typical one might say, “This vendor needs payment right away, and I’m out of the office and can’t do it right now.” The message creates a sense of urgency so the employee doesn’t have time to figure out whether it’s a scam.

Now that voice cloning is so easy, the bad guys are even more convincing.

Matt Bauer, videographer and musician

Matt Bauer has expertly captured me on stage for years. He has also created some great voiceovers for the speaker demo videos he crafted for me. He’s also a professional musician… listen to his awesomeness here.


For this experiment, I used another feature of ElevenLabs’ voice cloning tools called “Speech-to-Speech.” Matt recorded a fake message for my Director of Client Success, Haley Kruse, asking her to give out my corporate credit card number to someone who was going to call her. I applied my cloned voice to his audio, and both Haley and I were shocked. The deepfake would almost surely have fooled Haley, especially if I had added airport background noise.

For reference, here’s a similar script using Text-to-Speech. The Speech-to-Speech version is definitely more convincing.

What can you do to protect yourself from falling for a voice clone hoax?

If it’s this easy to create a clone, how can we ever know what is real?

I can’t help you (yet) with random audio and video you find on the web, or if you get a random call from a stranger, but I do have an easy tip for protecting yourself from scams involving people you know.

Create a safe word for the people in your circles. Then use it when you’re leaving a message that asks for private info. And ask your colleagues and family to do the same. If you’re talking to someone and it seems suspicious, ask for the safe word.

More experiments in Part Two

The second half of my voice cloning adventure will answer more questions:

  • Why would someone need to use cloned voices?
  • How does ElevenLabs handle accents?
  • Can you use ElevenLabs in other languages?

Stay tuned!


Tags

audio, deepfake, privacy, security


You may also like

Is the Password Finally Dead?
