
Creating Custom Voices With ElevenLabs

By: Stephen Toback

We’ve had conversations for some time about whether we should have a “Duke voice” for official voice over production using tools like ElevenLabs. There are still many discussions to have about whether this is in fact something we should do, but for now, I’m focusing on the technology. In this post, I’m looking into ElevenLabs’ AI voice generation technology, which lets you create a voice from a text prompt within their application.

I used Google Gemini to help me write the prompt, and it did a great job. Using AI to draft prompts has been helpful in general. I also asked Gemini to ask me any questions that would give it more information to refine the prompt.

Here’s the initial prompt:

I’d like to create a “standard” voice over voice using ElevenLabs AI voice generation tool. Can you help me write a prompt for ElevenLabs that can describe a male voice that would be a good representation of a Duke student. For this test, I would like to prototype an African American speaker of English from the Northeast of the United States. He would be between 19 and 22 years old. This is for a voice over so it should be somewhat professional but still should sound like a college student. Are there any questions you should ask me to help create this voice in Elevenlabs?

Here’s the resulting prompt:
A young African American male in his early 20s with a clear, resonant, and intelligent tone. He has a subtle Northeastern American accent, characterized by a crisp and energetic cadence. The voice should sound like an ambitious university student: articulate, confident, and approachable. It should have a professional ‘voice-over’ quality—clean and well-paced—while maintaining the natural, relaxed inflection of a college peer. Avoid overly deep ‘radio’ tones; keep it youthful, bright, and authentic.

Gemini also came back with these questions:

  1. The “Duke” Factor: Do you want him to sound like he is giving a formal campus tour (more polished/enthusiastic) or like he is explaining a complex lab project to a classmate (more grounded/technical)?
  2. Regional Strength: Should the “Northeast” influence be very subtle (Standard American with high energy), or should there be noticeable traces of a specific city, like a slight “New York” or “DC” lilt?
  3. Vocal Texture: Do you prefer a “smooth and velvet” texture, or a voice with a bit more “breathiness and grit” to make it feel more like a real person and less like an AI?
  4. Usage Context: Is this for a short social media ad, a long-form instructional video, or a narrative story? (This affects how we describe the pacing).
  5. Would you like me to generate a second “alt” version of this prompt that focuses more on a relaxed, conversational style?

Here are my responses:
  1. explaining a complex lab project to a classmate (more grounded/technical)
  2. influence should be very subtle
  3. bit more “breathiness and grit” to make it feel more like a real person and less like an AI but subtle
  4. This is for voice over for professional information videos from Duke University
  5. relaxed, conversational style

Here is the final prompt:

A young African American male, 20–22 years old, with an articulate and intelligent tone. The voice is grounded and conversational, sounding like a university student explaining a complex project to a peer. It features a very subtle Northeastern American cadence—energetic and fast-paced but clear. There is a slight, natural breathiness and a touch of human grit to the texture, avoiding a polished ‘radio’ sound. The delivery is relaxed, professional, and authentic, with the natural inflections of a real person speaking in a contemporary academic environment.

Gemini also gave some tips:

Tips for Implementation

  • Stability vs. Clarity: Since you want “breathiness and grit,” I recommend starting with the Stability slider a bit lower (around 35-40%) in ElevenLabs. High stability often “cleans” the voice too much, making it sound like a generic AI.
  • Style Exaggeration: Keep this low (around 10-20%). Since this is for professional Duke University videos, you don’t want the model to get too dramatic or erratic with its inflections.
  • The “Northeast” Pace: If the AI speaks too slowly, it will lose that Northeastern energy. You can usually fix this by slightly increasing the playback speed (1.05x) or adding “fast-paced delivery” to the prompt if it sounds too sluggish.
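
The tips above can be written down as a small settings sketch so the starting values are easy to reuse between tests. This is only an illustration: the helper name `slider_to_settings`, the field names, and the 0 to 1 scale are assumptions modeled on the slider labels in the tips, not a confirmed ElevenLabs API payload.

```python
def slider_to_settings(stability_pct, style_pct, speed=1.0):
    """Convert slider percentages into a 0-1 settings dict.

    Field names are hypothetical and simply mirror the slider labels
    mentioned in the tips (Stability, Style Exaggeration, speed).
    """
    for name, pct in (("stability", stability_pct), ("style", style_pct)):
        if not 0 <= pct <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {pct}")
    return {
        "stability": stability_pct / 100,  # lower keeps more breathiness and grit
        "style": style_pct / 100,          # low keeps inflections from getting dramatic
        "speed": speed,                    # slight speed-up for a brisker pace
    }


# Starting point suggested in the tips: stability 35-40%, style 10-20%, speed 1.05x
settings = slider_to_settings(stability_pct=35, style_pct=15, speed=1.05)
print(settings)
```

Keeping the values in one place makes it easier to change a single slider at a time and compare the results by ear.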

To access this feature in ElevenLabs, click “Create Voice” in the top right corner, then select “Voice Design”. Creating a voice costs 350 credits.

The initial test moved too quickly between sentences, so I used their metadata tag to create breaks. I also noticed that the VO in the original video had a pause after “So…”. I could have easily cut the new VO to match, but instead I revised the script to add a longer break after “So…”. You get 2 regenerations with no credits charged.
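
Adding breaks by hand gets tedious on longer scripts, so here is a minimal sketch of how the pauses could be inserted automatically. It assumes the tag in question is an SSML-style `<break time="1.0s" />` tag; the helper name `add_breaks`, the pause lengths, and the sample script are all hypothetical and not taken from the actual video.

```python
import re


def add_breaks(script, pause_s=0.5, overrides=None):
    """Insert a break tag between sentences.

    `overrides` maps an exact sentence to a custom pause (in seconds)
    that should follow it, e.g. a longer pause after "So...".
    The tag format is an assumption, not a confirmed syntax.
    """
    overrides = overrides or {}
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?…])\s+", script.strip())
    parts = []
    for i, sentence in enumerate(sentences):
        parts.append(sentence)
        if i < len(sentences) - 1:  # no break after the final sentence
            pause = overrides.get(sentence, pause_s)
            parts.append(f'<break time="{pause}s" />')
    return " ".join(parts)


# Hypothetical script text, used only to show the longer pause after "So..."
sample = "So... Welcome to the lab. Here is what we do."
tagged = add_breaks(sample, pause_s=0.5, overrides={"So...": 1.0})
print(tagged)
```

The per-sentence override is what allows a longer pause after “So…” without slowing down the rest of the read.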

Here’s the original video:

Here is the AI generated voice:

 

I did end up doing some editing of the voice to match the timing of the video and the previous voice.

It sounded strange not to use the voice over on its own, so I used Adobe Logic to break out the tracks and re-add just the voice over:

I think ElevenLabs’ AI Voice Design feature did a fantastic job. It’s very easy to revise the prompt to steer the voice in the direction you want, and it opens up an alternative to cloned or existing voices.

 

 

 
