My AI Voice Clone Experiment: 3 Months, 50 Videos, Zero Recording
I hate my recorded voice. I mean, really hate it. Every time I listen to a playback, I cringe. It's one of those things that stopped me from starting YouTube for almost a year.
So when I heard about AI voice cloning, I was skeptical. But also desperate enough to try. Three months later, I've published 50 videos using a voice that sounds like mine, without recording a single voiceover. Here's what that journey looked like.
Setting It Up
ElevenLabs' voice cloning is straightforward. You record a minimum of 30 seconds of clean audio (they recommend 10+ minutes for best results), upload it, and the AI analyzes your speech patterns, tone, and cadence.
I recorded 3 minutes of me reading a script in my home office. No fancy mic — just my AirPods. Uploaded it, waited about 30 seconds, and there it was: a digital version of my voice. I typed a sentence, clicked generate, and heard myself say something I never actually said. Weirdest feeling ever.
The First Month: Quality Issues
Honestly, the first batch of videos sounded... off. The AI would occasionally emphasize the wrong word, or the pacing would feel robotic. A viewer even commented, "Did you use AI voice? Something feels different." Called out in week two.
I almost gave up. But I did two things that fixed it:
- Recorded more samples. I added 15 more minutes of varied content — some energetic, some calm, some with questions (rising intonation). This gave the AI more data to work with.
- Adjusted the "stability" and "similarity" sliders. ElevenLabs has advanced settings. Lower stability (around 35%) adds more natural variation. Higher similarity (over 80%) keeps it sounding like me. The default settings are not optimal for creator content.
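If you'd rather script your voiceovers than click through the sliders, the same two settings can be sent to the ElevenLabs text-to-speech API. Here's a minimal sketch, assuming the public REST endpoint and its `voice_settings` fields (`stability` and `similarity_boost`, both on a 0 to 1 scale). The function names, the voice ID, and the output filename are placeholders of mine, not anything ElevenLabs prescribes.

```python
import json
import urllib.request

# Assumed endpoint for ElevenLabs text-to-speech; voice_id is your cloned voice's ID.
API_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(text, voice_id, api_key, stability=0.35, similarity_boost=0.80):
    """Build (but do not send) a TTS request using the slider values from the post.

    stability=0.35 corresponds to the ~35% stability slider,
    similarity_boost=0.80 to the 80% similarity slider.
    """
    if not (0.0 <= stability <= 1.0 and 0.0 <= similarity_boost <= 1.0):
        raise ValueError("stability and similarity_boost must be between 0 and 1")

    payload = {
        "text": text,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    }
    return urllib.request.Request(
        API_URL.format(voice_id=voice_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, voice_id, api_key, out_path="voiceover.mp3"):
    """Send the request and write the returned audio bytes to out_path.

    Hypothetical helper: it assumes the response body is the audio file itself.
    """
    request = build_tts_request(text, voice_id, api_key)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as audio:
        audio.write(response.read())
    return out_path
```

Calling `synthesize("Welcome back to the channel.", "YOUR_VOICE_ID", "YOUR_API_KEY")` would need a real key and voice ID. Building the request separately makes it easy to inspect the settings before spending any credits.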
After those tweaks, the "off" comments stopped. In fact, nobody has mentioned the voice since. Which is exactly what I wanted — for people to hear the content, not the delivery.
The Numbers
- Videos published: 50 in 90 days (one every ~2 days)
- Total recording time: 0 minutes (for voiceover)
- Total time saved vs. recording my own voiceover: roughly 25 hours (I used to re-record each script 3-4 times)
- Cost: $5/month for a paid ElevenLabs plan
- Channel growth: 340 → 1,280 subscribers in 3 months
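For anyone who wants to sanity-check those figures or plug in their own, here's a quick sketch of the arithmetic. The inputs are the numbers listed above; the derived per-video values are my own back-of-the-envelope calculations, and the function name is just illustrative.

```python
def channel_summary(videos, days, hours_saved, subs_start, subs_end):
    """Derive per-video and growth figures from the raw channel numbers."""
    new_subs = subs_end - subs_start
    return {
        "days_per_video": days / videos,
        "minutes_saved_per_video": hours_saved * 60 / videos,
        "new_subscribers": new_subs,
        "growth_factor": subs_end / subs_start,
        "new_subscribers_per_video": new_subs / videos,
    }


summary = channel_summary(videos=50, days=90, hours_saved=25,
                          subs_start=340, subs_end=1280)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

That works out to one video every 1.8 days, about 30 minutes saved per video, and 940 new subscribers (roughly 3.8x the starting count).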
What I'd Tell Someone Considering Voice Cloning
- It's not ready for emotional narration — podcasts, storytelling, or anything requiring genuine emotional range still needs a human voice.
- For tutorials, listicles, and educational content? It's already there. The tone works perfectly for "explainer" content.
- Disclose it or don't — that's your call. I don't explicitly disclose, but I won't lie if asked. Most viewers genuinely don't notice.
- The first 10 videos will feel weird to you. To your audience, they'll sound fine. We're our own worst critics.
Would I do it again? Absolutely. My only regret is not trying it sooner. The time I save on recording goes into writing better scripts — and that's what actually grows a channel.