Extraordinary technology. Workflow and pricing model still need work.
I’m working with a researcher who has 1,000 videos in Urdu that he wants dubbed into English. I expected that ElevenLabs would be the right tool. In testing, it did a fantastic job, both in the accuracy of the translation and in reproducing the instructor’s voice in English.
Unfortunately, given the tool’s current limitations and pricing structure, the instructor may in fact just re-record all of the videos rather than work around them. Here’s a list of challenges:
- The tool recommends clips no longer than 15 minutes for better accuracy and has a hard limit of 45 minutes or 500MB per clip. All of these videos are over 30 minutes, and some are 60 to 90 minutes long. Having to break the videos up, feed the pieces in separately, and then either recombine them or manually create playlists is not a preferred approach. We could end up with 2,000 to 4,000 clips to manage.
- The tool does not “learn”. It would be better if the tool could learn from the videos as they are uploaded so that the translation and cloning improve over time. It currently treats each video as its own video with no reference to previous work. The tool does seem able to “learn” from its voice cloning training, so given the extra cost of using the dubbing studio, I think that should be included.
- Captions – It was surprising that you cannot generate captions from the translated text. All of the information is already there, so it should be fairly trivial to export a VTT or other caption file based on the English translation. Again, given the cost, that should be included.
- Pricing Structure – While I can understand and appreciate their monthly pricing structure and limits, it doesn’t make sense for a single large project. This is not ongoing work, so they should allow you to purchase a set number of hours with no expiration. It’s fine to limit the number of minutes you can process per month, but unused minutes should roll over.
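To illustrate why the caption export mentioned above seems like it should be simple, here is a minimal sketch of turning timed translation segments into a WebVTT file. I don’t know how the tool stores its segments internally, so the `Segment` class and both function names are hypothetical; the only assumption is that each translated line has a start time, an end time, and English text.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical timed segment: one translated line with its timing."""
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str     # English translation for this span


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_vtt(segments: list[Segment]) -> str:
    """Build WebVTT text: a header line, then one cue per segment."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}")
        lines.append(seg.text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        Segment(0.0, 2.5, "Welcome to the course."),
        Segment(2.5, 6.0, "Today we will cover the basics."),
    ]
    print(segments_to_vtt(demo))
```

Writing the returned string to a `.vtt` file would give a caption track that most video players can load alongside the dubbed video.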
So what’s good?
The fact that it can do this at all, and as well as it does, is truly remarkable. Its initial voice cloning was very good, and you can switch to a better-trained voice of the instructor.
The dubbing studio interface is really well designed and easy to use. Really nice.
As with most things AI, the technology is still developing, and we often have to adjust our expectations given how relatively new this type of service is.