We had a problem with our first test of ElevenLabs’ new engine, version 3 (v3): there was clicking in the sample. I tried, but failed, to contact ElevenLabs support (an ongoing issue that I need to resolve). I thought the clicking might be an error in the sample or a corrupted upload, but to upload the sample again I needed to verify the voice. This is a safeguard ElevenLabs uses so that I can’t simply download someone’s voice from the internet and clone it. They give you a random sentence, and the person whose voice you are sampling must read it into the computer’s microphone. I had tried this in the past (a long time ago) over Zoom, and it failed. This time I tried using my phone, and it failed again. I have a nifty new M4 MacBook Pro with great-sounding speakers, so we tried again over Zoom, and it worked!
The new voice model was made from 29 minutes of audio. ElevenLabs says 30 minutes is the minimum and 2 hours is best. We learned that it is important to have a very quiet recording with as little extraneous noise as possible. If there are a lot of breaths, for example, they will end up in the model. If you edit many samples together and don’t pay close attention to the spacing between them, that unnatural rhythm will also find its way into the model.
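We don’t know how ElevenLabs processes an upload internally, but two of the checks above, total duration and even spacing between edited clips, are easy to do before uploading. Below is a minimal sketch using only Python’s standard-library `wave` module. The function names `total_minutes` and `join_with_even_gaps` and the 0.5-second default gap are our own illustrative choices, not anything ElevenLabs specifies, and the sketch assumes the clips are uncompressed PCM WAV files exported with the same channel count, sample width, and sample rate.

```python
import wave


def total_minutes(paths):
    """Sum the durations of a list of WAV files and return the total in minutes."""
    seconds = 0.0
    for p in paths:
        with wave.open(str(p), "rb") as w:
            seconds += w.getnframes() / w.getframerate()
    return seconds / 60


def join_with_even_gaps(paths, out_path, gap_seconds=0.5):
    """Concatenate WAV clips with the same length of silence between each pair.

    Hypothetical helper: assumes every clip shares one PCM format and raises
    ValueError if a clip differs or if no clips are given.
    """
    params = None
    frames = []
    for p in paths:
        with wave.open(str(p), "rb") as w:
            clip_params = (w.getnchannels(), w.getsampwidth(), w.getframerate())
            if params is None:
                params = clip_params
            elif clip_params != params:
                raise ValueError(f"{p} has format {clip_params}, expected {params}")
            frames.append(w.readframes(w.getnframes()))
    if params is None:
        raise ValueError("no input clips")

    channels, width, rate = params
    gap_frames = int(round(gap_seconds * rate))
    # 8-bit WAV is unsigned (silence = 0x80); wider PCM is signed (silence = 0).
    silent_byte = b"\x80" if width == 1 else b"\x00"
    silence = silent_byte * (width * channels * gap_frames)

    with wave.open(str(out_path), "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(width)
        out.setframerate(rate)
        # The silence goes only between clips, not before the first or after the last.
        out.writeframes(silence.join(frames))
```

Run `total_minutes` on the exported clips to confirm you are past the 30-minute minimum before uploading, then join them with `join_with_even_gaps` so every clip is separated by the same amount of silence. This only evens out the gaps; breaths and background noise still have to be handled in the recording or the edit.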
Once the sample was uploaded, I tried rendering some text in both v2 and v3. Curiously, ElevenLabs told me I’d get better results using v2. My initial impression is that v3 may be too expressive for training videos, and it offers fewer settings to adjust. Initial tests with the slider set midway between “Creative” and “Robust” produced a voice that was too animated in my opinion. The version below was made by moving the slider all the way to “Robust”.
On the other hand, I think v2 is too monotone, so I plan to experiment further to see if I can find a happy medium between the two.
I can’t really say whether the encoding was different or better, but I think we started with a better sample this time, so I’m hopeful it will yield better results than our initial attempt.