Comparing Machine Transcription Options from Rev and Sonix

As part of our continuing exploration of new options for transcription and captioning, two members of our media production team tested the automated services offered by both Rev and Sonix. We submitted the same audio and video files to each service and compared the results. Overall, both services were surprisingly accurate and easy to use. Sonix, in particular, offers some unique exporting options that could be especially useful to media producers. Below is an outline of our experience and some thoughts on potential uses.

Accuracy

The quality and accuracy of the transcription seemed comparable. Both produced transcripts with about the same number of errors. Though errors occurred at similar rates, they interestingly almost always occurred in different places. All of the transcripts would need cleaning up for official use but would work just fine for editing or review purposes. The slight edge might go to Rev here: it did a noticeably better job of identifying individual speakers, punctuating, and generally (but not always) recognizing names and acronyms.

Interface

When it came time to share and edit the transcripts, both services offered similar web-based collaborative tools. The tools feature basic word processing functions and allow multiple users to highlight, strike through, and attach notes to sections of text. After its recent updates, the Rev interface is slightly cleaner and more streamlined, but overall the services are pretty much even in this category.

Export Options

This is where things get interesting. Both services allow users to export transcripts as documents (Microsoft Word, Text File, and, for Sonix, PDF) and captions (SubRip and WebVTT). However, Sonix offers some unique export options. When exporting captions, Rev automatically formats the length and line breaks of the subtitles and produces reliable results. Sonix, on the other hand, provides several options for formatting captions including character length, time duration, number of lines, and whether or not to include speaker names. The downside was that using the default settings for caption exporting in Sonix led to cluttered, clunky results, but the additional options would be useful for those looking for more control of how their captions are displayed.
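For readers unfamiliar with the two caption formats, SubRip (.srt) and WebVTT (.vtt) are close cousins, which is part of why both services can emit either one. As a quick illustration (our own sketch, not code from either service), converting SubRip to WebVTT mostly amounts to adding a header and swapping the decimal separator in the timestamp lines:

```python
# Minimal sketch: convert SubRip (.srt) text to WebVTT (.vtt).
# The formats are nearly identical; WebVTT adds a required "WEBVTT"
# header line and uses '.' instead of ',' before the milliseconds.

def srt_to_vtt(srt_text: str) -> str:
    out = ["WEBVTT", ""]  # header plus the blank line that must follow it
    for line in srt_text.splitlines():
        if "-->" in line:  # only timestamp lines need the separator swap
            line = line.replace(",", ".")
        out.append(line)
    return "\n".join(out)
```

WebVTT also allows per-cue styling and positioning settings that SubRip lacks, which is one reason it has become the standard for HTML5 video.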

Sonix also offers two other, quite different export options. First, users can export audio or video files that include only highlighted sections of the transcript or exclude struck-through sections. Basically, you can produce a very basic audio or video edit by editing the transcript text. Unfortunately, it does not allow users to move or rearrange sections of media, and the edits are all hard cuts, so it’s a rather blunt instrument, but it could be useful for rough cuts or for those with minimal editing skills.
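To make the idea concrete, here is a minimal sketch (purely illustrative; it does not reflect Sonix’s internals) of how a strikethrough-driven edit reduces to a list of hard cuts: segments struck from the transcript are dropped, and the remaining time ranges define what ends up in the exported media.

```python
# Illustrative only: deriving a cut list from an edited transcript.
# Each segment carries start/end times in seconds; struck-through
# segments are dropped, and the surviving ranges become hard cuts.

def cut_list(segments):
    """Return (start, end) ranges for segments not struck through."""
    return [(s["start"], s["end"]) for s in segments if not s.get("strike")]

edit = [
    {"start": 0.0, "end": 4.2, "strike": False},
    {"start": 4.2, "end": 9.8, "strike": True},   # removed from the export
    {"start": 9.8, "end": 15.0, "strike": False},
]
# cut_list(edit) keeps only the first and third ranges
```

Because the output is just an ordered list of kept ranges, there is no way to express reordering, which matches the limitation described above.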

Sonix also provides the option of exporting XML files that are compatible with Adobe Audition, Adobe Premiere, and Final Cut Pro. When imported into the editing software these work like edit decision lists that automatically cut and label media in a timeline. We tried this with two different audio files intended for a podcast, and it worked great. This has the potential to be useful for more complicated and collaborative post-production workflows, an online equivalent of an old school “paper edit”. Again, the big drawback here is the inability to rearrange the text. It could save time when cutting down raw footage, but a true paper edit would still require editing the transcript with timecode in a word processing program.

And the winner is…

Everyone. Both Rev and Sonix offer viable and cost-effective alternatives to traditional human transcription. Though the obvious compromise in accuracy exists, it is much less severe than you might expect. Official transcripts or captions could be produced with some light editing, and, from a media production perspective, quick and cheap transcripts can be an extremely useful tool in the post-production process. Those looking to try a new service or stick with the one they’re familiar with can be confident that they’re getting the highest quality machine transcription available with either company. As more features get added and improved, like those offered by Sonix, this could become a helpful tool throughout the production process.

New Machine Transcription Option from Rev

We recently posted about some exciting new options in the world of captioning spearheaded by a company called Sonix, which offers an account set-up page for members of the Duke community that waives monthly subscription charges as part of its edu program. Hot on the heels of that announcement, we learned that Rev.com, which has long offered high-quality human-generated transcriptions for Duke, now has its own machine transcription option. It’s a bit more expensive than Sonix, at ten cents per minute as opposed to around eight cents per minute for Sonix. We’re working on a detailed comparison of the two services and will share more info here as we have it.

Rev's New Machine Transcription Option

Rev also just announced improvements to their caption editor. We’d love to have your feedback about these changes as well as about your use of Rev’s new machine transcription option. According to Rev, the improvements to the editor include:

  • Text selection toolbar – keep your timestamp, highlight, strikethrough, and comment tools where you need them, contextually accessible next to the text you just selected.
  • White theme – a light, minimal color scheme to bring the Transcript Editor into the same modern styling as the rest of Rev.com.
  • Streamlined transcript body – no more cluttered columns, all speaker names and timestamps are now in-line with the transcript body, so you can focus on the content that matters to you.

For a full, updated walkthrough of all Transcript Editor functionality, see The Rev Transcript Editor, a Guide for First Time Users.


Help Us Test Sonix.ai

OIT has been following the evolving world of captioning for years, in particular monitoring the field for high-quality, affordable services we think would be useful to members of the Duke community. When Rev.com came along, offering guaranteed 99%-accurate human-generated captions for a flat $1.00 a minute (whereas some comparable services were well over $3.00/minute), we took note and facilitated a collaboration with them that has been very productive for Duke. A recent review of our usage shows that many of you are using Rev, with a huge uptick over the last couple of years, and we’ve heard few if any complaints about the service.

While machine (automatic) transcription has generally been viewed dismissively, the newest generation of the technology, based on IBM Watson, has become so good that we can no longer (literally) afford to ignore it. With good-quality audio to work from, this speech-to-text engine claims to deliver accuracy as high as 95% or more. IBM Watson isn’t a consumer-facing service, but we’ve been on the lookout for vendors building on this platform and have found one we feel is worth exploring, called Sonix. If cost is a significant factor for you, you might consider giving it a try.

Sonix captioning costs a little over 8 cents per minute, and the company has waived the monthly subscription requirement and offered 30 free minutes of captioning for anyone with a duke.edu email address who sets up an account through this page: https://sonix.ai/academic-program/duke-university.

We are not recommending Sonix at this time, but are interested to hear what your experiences with them are. And we would caution that with any machine transcription technology, a review of your captions via the company’s online editor is required if you want to use this as closed captions (vs just a transcription). In our initial testing Sonix’s online editor looks fairly quick and easy to use.

If you set up an account and try Sonix, please reach out to oit-mt-info@duke.edu to let us know what your experiences are and what specific use cases it supports.


New Machine Caption Options Look Interesting

We wrote in April of last year about the impact of new AI and machine learning advances in the video world, and specifically around captioning. A little less than a year later, we’re starting to see the first packaged services being offered that leverage these technologies and make them available to end users. We’ve recently evaluated a couple options that merit a look:

Syncwords

Syncwords offers machine transcriptions/captions for $0.60 per minute, and human-corrected transcriptions for $1.35 per minute. We tested this service recently, and the quality was impressive. Only a handful of words needed adjustment in the 5-minute test file we used, and none of them seemed likely to significantly interfere with comprehension. The recording quality of our test file was fairly high (low noise, words clearly audible and enunciated).

Turnaround time for machine transcriptions averages about one third of the media run time. For human-corrected transcriptions, the advertised turnaround is 3–4 business days, but the company says the average is less than 2 days. A rush human transcription option costs $1.95/minute, with a guaranteed turnaround of 2 business days and, according to the company, average delivery within a day.

Syncwords also notes edu and quantity discounts are available for all of these services, so please inquire with them if interested.

Sonix.ai

Sonix is a subscription-based service with three tiers: Single-User ($11.25 per month and $6.00 per recorded hour, i.e., $0.10/minute), Multi-User ($16.50 per user/month and $5.00 per recorded hour), and Enterprise ($49.50 per user/month, with pricing details available upon request). You can find information about the differences among the tiers here: https://sonix.ai/pricing

The videos in the folder below show the results of our testing of these two services alongside the built-in speech-to-text engine currently used by Panopto. To be fair, the service currently integrated with Panopto is free with our Panopto license, and for Panopto to license the more current technology would likely increase their costs and ours. We do wonder, however, whether it is simply a matter of time before the current state-of-the-art services featured here become more of a commodity:

https://oit.capture.duke.edu/Panopto/Pages/Sessions/List.aspx?folderID=4bd18f0c-e33a-4ab7-b2c9-100d4b33a254


Rev Adds New Rush Option

Rev.com‘s captioning services have been in wide use at Duke for the last couple years in part because of their affordability (basic captioning is a flat $1.00/minute), the generally high accuracy of the captions, and the overall quality of the user experience Rev offers via its well-designed user interfaces and quality support. Quick turnaround time is another factor Duke users seem to appreciate. While the exact turnaround times Rev promises are based on file length, we’ve found that most caption files are delivered same or next day.


For those of you who need guaranteed rush delivery above and beyond what Rev already offers, the company just announced it now offers an option that promises files in 5 hours or less from order receipt. There is an additional charge of $1.00/minute for this service. To choose this option, simply select the “Rush My Order” option in desktop checkout.

If any of you utilize the new rush service, we’d love to hear how it goes. Additionally, if you have any other feedback about your use of Rev or other caption providers, please feel free to reach out to oit-mt-info@duke.edu.

New Search Feature on Rev.com

While OIT continues to actively explore captioning technologies and vendors as they become available, Rev.com has been the most-used captioning vendor at Duke in recent years because of its relatively low cost and intuitive workflows. Rev just announced a new feature that is likely to interest you in proportion to the amount of captioning you’ve done with Rev, or plan to do. You can now search your body of caption files on the Rev website for individual words and phrases and call up the specific caption files in which those words appear. For example, in the example below, searching for the word “blockchain” on an account pulls up a couple of results. You can click the title of an individual caption file to open that file in Rev’s online editor.

Rev's Search Feature

Wirecast 10 Adds Live Captions

Telestream, maker of Wirecast, recently announced a new cloud-based service that supports live captions based on ASR (automatic speech recognition) and an RTMP re-streaming service. Both work in conjunction with Wirecast 10. This means that if you are using Wirecast 10, you can automatically caption your videos and simultaneously push them to another provider like YouTube or Facebook Live. This is an interesting development because we are seeing the entrance of new ASR platforms like IBM Watson that claim to offer much greater accuracy than was possible with earlier-generation ASR technologies. I’m not sure what platform Wirecast is leveraging, but we’d love to hear from anyone at Duke using Wirecast 10 who is willing to give the 100-minute free trial a go.

New Wirecast Cloud Services

It’s a subscription-based service with monthly fees starting at $25.00/month for re-streaming and $60.00/month for live captions. Detailed information and a link to set up an account and get started can be found here:

https://www.telestream.net/wirecast/webservices/

Captions! Captions! Get your FCPX Captions Here!

As a self-proclaimed accessibility nut, I see offering subtitles/closed captions in 2018 not simply as a nicety… but as a necessity. This is particularly true now that my ears have passed their prime, perhaps due to one too many Guided by Voices concerts in my youth. Now, before we get a flood of “Adobe Premiere did it first!”, I acknowledge that a similar feature has been available on that platform for some time, but whenever I dip my toe into Premiere on a quasi-annual basis… I quickly retreat to the warm embrace of Final Cut Pro.


To put this in context, I don’t shoot or edit many videos these days. But, when I do, my process for captioning is to edit the video in Final Cut Pro, export the video, upload the video to YouTube (unlisted), and let YouTube work its machine learning captioning magic. Usually, within a few minutes or so, YouTube has a subtitle track that’s about 80%+ accurate. From within YouTube, I then go in and manually edit the captions to achieve a near 100% accurate caption for the video. Finally, I make the video publicly viewable.

The above method is great… unless you need to re-upload the video to YouTube (or a different service) with a number of edits. Also, the longer and more complex the video becomes, the more complex managing the subtitles can become.

In a perfect world, you’d caption your footage as it is imported, either manually or sending it out to a service. This has a number of advantages, especially for larger projects. First, metadata! Searching through hours of footage for a key phrase YOU KNOW your subject said is absolutely frustrating. Wouldn’t it be better if you could search your media library for that phrase? When you caption first, this becomes possible. Second, when you make edits, the captions follow the footage. So, when you make dozens of edits… you don’t need to touch the subtitles. Very cool…

Final Cut Pro 10.4.1 is only a few days old, but its caption support seems to be well designed and feels very Apple. Also, it wouldn’t be an Apple feature if it didn’t use a unique format called Apple iTunes Timed Text (iTT, or .itt). Don’t worry: this is actually an upgrade from traditional .srt caption files. With .srt, you basically have the time and the words to be displayed on the screen. But with Apple’s .itt format, you can also embed color information and the location of the text on the screen. Also, .itt files import into YouTube with little trouble. If .itt just isn’t going to work for you, you can also select CEA-608, which is ideal for DVD or Blu-ray mastering, but .itt is the more capable format.
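For the curious, iTT is built on the W3C TTML standard, which is why it can carry styling that .srt cannot. The snippet below is a simplified, hand-written illustration of a TTML-style cue with color and alignment; actual .itt files exported by Final Cut Pro include additional Apple-specific metadata.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Simplified illustration of a TTML/iTT-style caption document -->
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:tts="http://www.w3.org/ns/ttml#styling" xml:lang="en">
  <body>
    <div>
      <!-- Unlike .srt, a cue can carry color and on-screen placement -->
      <p begin="00:00:01.000" end="00:00:03.500"
         tts:color="yellow" tts:textAlign="center">Hello, world.</p>
    </div>
  </body>
</tt>
```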

I’ll be keeping an eye on this feature to see if Apple eventually adds its own Siri-powered voice-to-text within Final Cut Pro (perhaps FCPX 10.5?), but for now, this is a great feature for those of us who love captioning.


The Impact of Artificial Intelligence on Video

Big advances are taking place at the intersection of video and AI (Artificial Intelligence). I ran across an interesting article in Streaming Media Magazine called The State of Video and AI 2018 that takes stock of some of these changes, and I wanted to share it with you as we look toward what’s ahead for Duke.


We’ve been following trends in this area from a number of directions, including video captioning. As many of you are aware, the need to caption videos we produce at Duke is increasing, but the costs of captioning services, most of which rely on intensive manual labor, are high. However, new tools like IBM’s Watson, which comprises more than 60 AI services, including machine captioning (with advertised accuracy of a whopping 96%), seem poised to shift the balance and make it possible for us to caption videos on a wider scale. We demoed Watson recently and will continue to monitor it along with other tools in this space.

In this context I also wanted to point out that we recently began offering ASR (Automatic Speech Recognition) for Panopto, Duke’s lecture capture service. We are excited about the opportunities this new functionality will offer students and other viewers who are looking to drill down to points in videos where specific terms are found. This feature adds to Panopto’s already healthy set of features built around in-video search, including OCR (Optical Character Recognition) for slide content, and user-created time-stamped notes and bookmarks.

New Features at Rev.com

Rev.com, currently the most widely utilized caption service provider at Duke, just announced some new features we wanted to let you know about. All are included at no extra charge in their standard $1.00 per minute service. For more information about getting started with Rev or another caption provider, you can visit https://oit.duke.edu/what-we-do/services/captioning. You may also be interested in attending A Hands-on Guide to Captioning at Duke, a Learn IT@Lunch session scheduled for Wednesday, January 31, 2018 in which OIT’s Joel Crawford Smith and Todd Stabley will discuss video captioning at Duke and help you set up an account with Rev and get started captioning your videos.

Browser-based Captions Editor: Lets you make minor fixes and convert formats and frame rates. You can access it on any Order Detail page by clicking “edit”, or give it a test run here: https://www.rev.com/captions-editor/sandbox

Rev's new browser-based caption editor


Browser-based Transcript Editor: Allows changes like formatting, speaker labels, etc. If you order timestamps, Rev gives you a transcript with word-by-word timestamps that play along with your file. You can test it out here: https://www.rev.com/transcript-editor/sandbox

Turnaround: Rev reduced transcription turnaround by 25% and caption turnaround by 50% over the last 12 months.

Revver network: Rev crossed 14,000 monthly active Revvers (the freelancers who transcribe and caption), 90% of whom are based in the US and Canada. This allows Rev to turn around large volumes with high quality. Rumor has it that a Duke staffer who thought they were quite qualified to be a Rev captioner was rejected. Say it isn’t so!

Support coverage: Rev expanded their support to 24/7, now covering weekends as well.

Additional improvements: custom timestamp offset for transcripts, PDF and TXT transcript outputs, and improved Rev API support.

If any of our peer Universities are interested in speaking with Rev, feel free to reach out to us and we’ll connect you.