I’m getting deep into Eleven Labs work so I thought I’d share my musings.
If you remember, a while back I wrote a script that sends a column of text to Eleven Labs, has it generate an audio file for each shot, and puts the files in a properly named Google Drive folder. That’s still working, but I hadn’t run it in quite some time. When I ran it again, I noticed that it was truncating the end of almost every voice clip it generated. I went back to my old ChatGPT session, told it about the problem, and it suggested some changes.
Options 1 and 2 worked.
I was then having a really hard time with long strings of technical information like this: XX_QUERY_EXAMPLE_004
I first tried replacing the “_” with the word “underscore”. This was an easy fix (find and replace in Google Sheets) and it made sure the word “underscore” was actually said, but the timing was strange. I tried using the <break 500ms /> command and that made it sound completely robotic. I went to ChatGPT and it suggested commas or “…”. I tried those and they helped, but it still didn’t sound right. What ended up giving the best results was this: “X. X, underscore query, underscore example, underscore zero zero four”. It had been throwing in extra X’s, and I found that having words in all caps also made for unreliable results. Have a listen:
https://warpwire.duke.edu/w/vegIAA/
One note about Final Cut Pro vs. Premiere on this type of work. I started using FCP because that’s what I’m most familiar with, but because of the number of VO replacements I have to do, it was not the right tool. When you replace a file on the hard drive, FCP throws an error if the new file doesn’t exactly match the duration of the previous one, so you have to delete the old file and drag in the new one. Premiere, on the other hand, doesn’t care: you replace the file and it almost immediately updates the timeline. You may have to adjust the in/out of the new file, but that is WAY faster than working in FCP.
Well, maybe two notes. The other advantage Adobe Premiere has is its audio tracks. FCP has no concept of tracks, and if you want to start with an audio edit, like we have to do for this (or like they do in animation production), you have to put your audio on the main timeline. This is a magnetic track, so you can’t easily adjust the space between your audio tracks. You have to put a black card on the video track and then add your audio, which connects it to the black card, and that makes adding content once you have it more difficult.
Finally, I used ChatGPT to process the script, which was full of all caps and underscores (it is a film about databases, after all). Here was my final prompt:
Text Transformation Instructions (for Eleven Labs):
- Capitalization: Only the first word of each sentence should be capitalized. All other words should be lowercase unless grammatically required.
- Handling “underscore”:
  - The first occurrence of the word “underscore” in a sentence should not have a comma before it.
  - For every subsequent instance of “underscore”, insert a comma immediately before the word that precedes “underscore”.
  Example:
  tom underscore bill underscore frank underscore billy
  becomes → tom underscore bill, underscore frank, underscore billy
- Numbers: Convert all numbers into individual digit words, with each digit separated by a space.
  Example:
  435 becomes → four three five
- Trailing Spaces: Add four spaces to the end of the text in each cell.
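The same rules could also be applied in code instead of through a prompt. Below is a rough Python sketch of that idea. It is not what was actually used here (that was the ChatGPT prompt above), and the function names (prepare_for_tts, spell_digits, add_underscore_commas) are made up for illustration. It assumes raw “_” characters should first become the word “underscore”, and it follows the worked example in the prompt by putting the comma directly in front of each “underscore” after the first one in a sentence.

```python
import re

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def spell_digits(text):
    """Replace every run of digits with space-separated digit words."""
    def repl(match):
        words = " ".join(DIGIT_WORDS[int(d)] for d in match.group())
        # Pad with spaces so digits glued to letters do not merge with them.
        return " " + words + " "
    return re.sub(r"[0-9]+", repl, text)


def add_underscore_commas(sentence):
    """Leave the first 'underscore' alone; put a comma before later ones."""
    count = 0

    def repl(_match):
        nonlocal count
        count += 1
        return " underscore" if count == 1 else ", underscore"

    # Also swallows an existing comma, so running this twice is harmless.
    return re.sub(r",?\s*\bunderscore\b", repl, sentence)


def prepare_for_tts(text):
    """Apply the four prompt rules to the text of one cell."""
    text = text.replace("_", " underscore ")      # assumption: raw "_" input
    text = spell_digits(text)
    text = re.sub(r"\s+", " ", text).strip()      # collapse extra spaces
    text = re.sub(r"\s+([.,!?;:])", r"\1", text)  # no space before punctuation
    text = text.lower()

    sentences = re.split(r"(?<=[.!?])\s+", text)
    fixed = []
    for sentence in sentences:
        sentence = add_underscore_commas(sentence).strip()
        fixed.append(sentence[:1].upper() + sentence[1:])

    # Rule 4: four trailing spaces on every cell.
    return " ".join(fixed) + "    "


if __name__ == "__main__":
    print(repr(prepare_for_tts("XX_QUERY_EXAMPLE_004")))
```

A sketch like this only covers the mechanical rules. It lowercases everything except the first letter of each sentence (so it ignores the “unless grammatically required” part), it will mangle decimals like 3.5, and it does nothing about letter runs such as “XX”, which still needed the hand-tuned “X. X” treatment.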
This worked EXTREMELY well. On the last round of generation, I only had to fix common errors and things like converting ACAD_PROG to “ack add prog”. Much, much faster with pre-processing. I’ll add that last bit to my ChatGPT project in case it comes up again. I’m also adding: remove “quotes” 🙂 and replace “QRY” with “query”.
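One-off fixes like these could also live in a small lookup table that runs before any other processing. Here is a minimal Python sketch of that idea, again not something used on this project. The table only contains the two examples mentioned above, and the names PRONUNCIATIONS, apply_pronunciations and strip_quotes are invented for illustration.

```python
import re

# Hypothetical table: identifier as written in the script -> how to say it.
PRONUNCIATIONS = {
    "ACAD_PROG": "ack add prog",
    "QRY": "query",
}


def apply_pronunciations(text, table=PRONUNCIATIONS):
    """Swap known identifiers for their spoken form."""
    # Longest keys first, so a long identifier wins over a short one inside it.
    for key in sorted(table, key=len, reverse=True):
        # Only match when the key is not glued to other letters or digits.
        pattern = r"(?<![A-Za-z0-9])" + re.escape(key) + r"(?![A-Za-z0-9])"
        text = re.sub(pattern, lambda _m, value=table[key]: value,
                      text, flags=re.IGNORECASE)
    return text


def strip_quotes(text):
    """Remove straight and curly double quotes."""
    return re.sub(r"[\"“”]", "", text)


if __name__ == "__main__":
    print(apply_pronunciations(strip_quotes("Run the “QRY” on ACAD_PROG")))
```

Because an underscore does not count as a letter or digit here, “QRY” is also replaced inside something like QRY_EXAMPLE_004, which is probably what you want if this runs before the underscores are spelled out.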