AI DDMC

AI Tools

In the world of audio and video production, having the right tools can make a huge difference in the quality and efficiency of your work. In this article, we will be discussing three powerful applications that can help you take your audio and video production to the next level: Fliki, Dall-E2, and Descript AI. Each of these tools offers unique features and capabilities that can be used to create high-quality content quickly and easily.

Stephen Toback cautions that these tools be used responsibly and safely before unleashing the magic of what they can do, so as not to put Duke at risk. He highlights that the tools require users to provide information, and it's important not to put any data into these applications that Duke hasn't approved as public. He also advises users to create accounts on the sites with alternate IDs rather than a Duke email address, and not to reuse their NetID password. Additionally, he notes that these tools are changing rapidly and should not yet be built into critical business applications. He encourages users to join an ongoing conversation about the tools in a Microsoft Teams chat (Staff AI).

David Stein presented several AI tools. Fliki is a text-to-audio application that allows users to create high-quality audio recordings with a wide range of voices and emotions. Dall-E2 is a powerful application with a wide range of potential uses, including generating images from detailed prompts for video and audio production. Descript AI is a web-based, collaborative system for video and audio editing that includes built-in capture from Zoom, a green screen function, and stock images, music, video, and sound effects. Together, these tools make it possible to generate high-quality content quickly and efficiently, with the added bonus of using ChatGPT for script and dialogue generation.

For more details, please review our recorded discussion here.

Another Script-Based A/V Editing Option: The Camtasia-Audiate Integration

Descript, a new video and audio creation and editing tool, has been making waves on campus recently with its ability to generate a script file for your project and let you edit the video simply by making changes to the script. One of the more useful things you can do with this approach is automate the elimination of awkward "um"s and pauses, adjust the speed and pacing of your project, and tighten things up quickly in other ways to make your media more listenable.

Some of you might not know, however, that there is a similar approach for those who already use Camtasia. Like Descript, Techsmith's Audiate utilizes cloud-based speech-to-text technology to generate a script for your audio project. And like Descript, the changes you can make in Audiate, which include many of the things you can do in Descript, get exported back to your audio file without your having to touch that file in a timeline-based editor. While Audiate by itself is geared toward podcasters, those who are working with video or screen animation can get the full package via Techsmith's roundtrip integration between Camtasia and Audiate. One cool feature of this integration we wanted to point out: if you are working with a screen animation that uses a cursor, Camtasia/Audiate animates the movement of the cursor between cut points so your viewers don't see it randomly jumping around on the screen. See below for a demo of Camtasia/Audiate in action.

Pricing for Audiate at first glance seems to be about the same as for Descript, so if you are working with video and are not already a Camtasia user, it probably makes the most sense to use Descript. However, there are discounts available for the Camtasia/Audiate package.

Since interest seems to be growing in these types of tools and workflows, we would love to hear from you if you’ve tried either of them, and would especially be interested to hear how you think Camtasia/ Audiate stacks up to Descript for your use cases.

Wolfvision CYNAP Pro

App-free dongle-free screen sharing!

Connect and share your screen using the wireless technology that’s built into your own mobile device. Our wireless BYOD solutions suit all iOS, iPadOS, Android, Chrome OS, Windows and Mac devices – with full support for AirPlay, Chromecast, and Miracast screen mirroring.

Record your content

Cynap Pro lets you record all your multi-window, multimedia content. Everything is captured in high definition and saved internally – perfect for use as part of your online educational program. The included Capture feature pack enables operation as a capture agent for Panopto and other compatible video management platforms.

Multi-platform web conferencing

Cynap Pro‘s multi-platform web conferencing solution runs directly on the device itself, and is designed to solve many of the issues most commonly experienced with BYOM web conferencing systems.

Complex multi-step setup and bandwidth issues are eliminated, and Zoom, MS Teams, or WebRTC-based wireless conferencing sessions are easily started and controlled using a simple workflow from a touchscreen or any laptop, smartphone, or tablet.

Stream & record to mobile

Our unique vSolution App for iOS, iPadOS, Android, and Windows lets your audiences receive and record a live stream of presentation or lecture content from Cynap Pro onto their own smartphones and tablets.

Freedom to present

Cynap Pro plays, displays, records, and streams all commonly used media at the same time, giving you unlimited choice of materials during presentations, lectures, and active learning classes.

Access your data easily via cloud, network drive or from mobile devices – even your laptop is no longer essential – you can simply bring your content on a USB stick, or download it directly from the cloud!

Annotate over any open window

Add to your content using our built-in annotation features – or note down your ideas on the digital whiteboard – and save the output of both for future use.

If you are looking for an all-in-one solution that only needs audio processing added, the CYNAP line of products is a good way to go.

Comparing Machine Transcription Options from Rev and Sonix

As part of our continuing exploration of new options for transcription and captioning, two members of our media production team tested the automated services offered by both Rev and Sonix. We submitted the same audio and video files to each service and compared the results. Overall, both services were surprisingly accurate and easy to use. Sonix, in particular, offers some unique exporting options that could be especially useful to media producers. Below is an outline of our experience and some thoughts on potential uses.

Accuracy

The quality and accuracy of the transcription seemed comparable. Both produced transcripts with about the same number of errors. Though errors occurred at similar rates, they interestingly almost always occurred in different places. All of the transcripts would need cleaning up for official use but would work just fine for editing or review purposes. The slight edge might go to Rev here. It did a noticeably better job at distinguishing and identifying unique speakers, punctuating, and in general (but not always) recognizing names and acronyms.  

Interface

When it came time to share and edit the transcripts, both services offered similar web-based collaborative tools. The tools feature basic word processing functions and allow multiple users to highlight, strikethrough, and attach notes to sections of text. After its recent updates, the Rev interface is slightly cleaner and more streamlined. Even so, the services are pretty much even in this category.

Export Options

This is where things get interesting. Both services allow users to export transcripts as documents (Microsoft Word, Text File, and, for Sonix, PDF) and captions (SubRip and WebVTT). However, Sonix offers some unique export options. When exporting captions, Rev automatically formats the length and line breaks of the subtitles and produces reliable results. Sonix, on the other hand, provides several options for formatting captions including character length, time duration, number of lines, and whether or not to include speaker names. The downside was that using the default settings for caption exporting in Sonix led to cluttered, clunky results, but the additional options would be useful for those looking for more control of how their captions are displayed.
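For those unfamiliar with the two caption formats mentioned above, the difference is largely cosmetic. Here's a minimal Python sketch (our own illustration, not part of either service) converting a SubRip (.srt) cue to WebVTT; the main differences are the required `WEBVTT` header and the decimal point, rather than a comma, before milliseconds in timestamps:

```python
import re

def srt_to_vtt(srt_text: str) -> str:
    """Convert SubRip caption text to WebVTT.

    WebVTT differs from SRT mainly in its required "WEBVTT" header
    and its use of '.' instead of ',' as the millisecond separator.
    """
    # Replace the comma millisecond separator in timestamp lines only.
    vtt_body = re.sub(
        r"(\d{2}:\d{2}:\d{2}),(\d{3})",
        r"\1.\2",
        srt_text,
    )
    return "WEBVTT\n\n" + vtt_body

sample_srt = (
    "1\n"
    "00:00:01,000 --> 00:00:04,000\n"
    "Welcome to the media production update.\n"
)

print(srt_to_vtt(sample_srt))
```

Both services export these formats for you, of course; the sketch is just to show what a cue looks like in each.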

Sonix also offers two completely different export options. First, users can export audio or video files that include only highlighted sections of the transcript or exclude strikethroughs. Basically, you can produce a very basic audio or video edit by editing the transcript text. It unfortunately does not allow users to move or rearrange sections of media, and the edits are all hard cuts, so it's a rather blunt instrument, but it could be useful for rough cuts or those with minimal editing skills.

Sonix also provides the option of exporting XML files that are compatible with Adobe Audition, Adobe Premiere, and Final Cut Pro. When imported into the editing software, these work like edit decision lists that automatically cut and label media in a timeline. We tried this with two different audio files intended for a podcast, and it worked great. This has the potential to be useful for more complicated and collaborative post-production workflows, an online equivalent of an old-school "paper edit". Again, the big drawback here is the inability to rearrange the text. It could save time when cutting down raw footage, but a true paper edit would still require editing the transcript with timecode in a word processing program.

And the winner is…

Everyone. Both Rev and Sonix offer viable and cost-effective alternatives to traditional human transcription. Though the obvious compromise in accuracy exists, it is much less severe than you might expect. Official transcripts or captions could be produced with some light editing, and, from a media production perspective, quick and cheap transcripts can be an extremely useful tool in the post-production process. Those looking to try a new service or stick with the one they’re familiar with can be confident that they’re getting the highest quality machine transcription available with either company. As more features get added and improved, like those offered by Sonix, this could become a helpful tool throughout the production process.

The Rise and Fall of BYOD

The bring your own device (BYOD) meeting or teaching space has been a popular model for small and medium meeting and teaching spaces. With the rise of inexpensive and ultra-portable laptops and tablets, the traditional “local computer” has slowly lost favor in many spaces. The computer is expensive, requires significant maintenance, and is a prime target for malicious software. Also, users generally prefer using their own device as they know the ins and outs of the hardware and operating system they prefer. The BYOD model worked well when the guest was sharing a presentation or video to a local projector or monitor. But, as AV systems have grown to include unified communication (UC) systems (WebEx, Zoom, Skype, etc.), the pain points of BYOD have been magnified.

First, when hosting a meeting on a BYOD device, connecting your device to a projector or monitor is usually rather straightforward now that HDMI is the standard. Yes, you may still need a dongle, but that's an easy hurdle in 2019. But as we add UC (Zoom, for example) to the meeting, things get complicated. To start, you need to connect the laptop to a local USB connection (which may require yet another dongle). This USB connection may carry the video feed from the in-room camera and the in-room audio feed. That may not sound complicated, but those feeds may not be obvious. For example, the camera feed could be labeled Vaddio, Magewell, or Crestron. With audio, it can be equally difficult to identify the right input, with labels such as USB Audio, Matrox, or Biamp. Sure, many reading this article may be familiar with what these do... but even for a digital media engineer, these labels can mean multiple things.

But who cares... we are saving money while giving maximum AV flexibility, right? Errr, not really. Yes, those with a technical understanding of how the AV system works will be able to utilize all of its audiovisual capabilities... but for the rest of the world, there might as well not be an AV system in the space. Even worse, if you have ever attended a meeting where it takes 10+ minutes to connect the local laptop to the correct mics, speakers, and camera, you know you may be losing money in the form of time, compounded by every person in attendance.

The Solution?
Soft codecs to the rescue! With the rise of UC soft codecs (Zoom Rooms, Microsoft Teams Rooms, BlueJeans Rooms, etc.), you can integrate an inexpensive device (a less expensive computer) that is capable of performing a wide range of tasks. First, all of the in-room AV connects to the soft codec, so there's no fumbling for dongles or figuring out which audio, mic, or speaker input/output is correct. Second, the soft codec monitors the space to ensure the hardware is functioning normally, moving local AV groups out of a break/fix model and into a managed one. Third, with calendar integration, you can schedule meetings with a physical location. The icing on the cake is that most of these UC soft codecs offer wireless sharing... so you can toss your Apple TV, Solstice Pod, etc. out the window (OK, don't do that... but it's one less thing you need to buy during your next refresh). Oh, and don't even get me started on accessibility and lecture capture!

We have a keen eye on soft codec systems as a potential replacement for traditional classroom AV systems in the mid to long term... and so should you.

Remote AV Control

“If only I could be in two places at once!”
– Every AV Technician… Ever.

But… what if you COULD be in two places at once? During a training earlier this year, I discovered that one hardware manufacturer offered a simple method of gaining remote access to the GUI of an AV system. As you built the system, it automatically created a password protected HTML5 web page where (assuming you knew the correct URL/password) you could control the system.

As organizations demand more from their AV systems, this kind of functionality will be an invaluable resource for small AV groups when providing evening or emergency AV support.

New Machine Caption Options Look Interesting

We wrote in April of last year about the impact of new AI and machine learning advances in the video world, specifically around captioning. A little less than a year later, we're starting to see the first packaged services that leverage these technologies and make them available to end users. We've recently evaluated a couple of options that merit a look:

Syncwords

Syncwords offers machine transcriptions/captions for $0.60 per minute, and $1.35 per minute for human-corrected transcriptions. We tested this service recently and the quality was impressive. Only a handful of words needed adjustment on the five-minute test file we used, and none of them seemed likely to significantly interfere with comprehension. The recording quality of our test file was fairly high (low noise, words clearly audible and enunciated).

Turnaround time for machine transcriptions averages about one-third of the media run time. For human-corrected transcriptions, the advertised turnaround time is 3-4 business days, but the company says the average is less than 2 days. A rush human transcription option is available at $1.95 per minute with a guaranteed turnaround of 2 business days and, according to the company, average delivery within a day.
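To make the price and turnaround trade-offs concrete, here's a small sketch using the per-minute rates quoted above (the function names are our own, and edu/quantity discounts are not included):

```python
# Per-minute rates quoted by Syncwords.
RATES = {
    "machine": 0.60,
    "human": 1.35,
    "rush_human": 1.95,
}

def transcription_cost(minutes: float, service: str) -> float:
    """Estimated cost in dollars for a recording of the given length."""
    return round(minutes * RATES[service], 2)

def machine_turnaround(minutes: float) -> float:
    """Average machine turnaround: about one-third of the media run time."""
    return minutes / 3

# For a one-hour recording:
print(transcription_cost(60, "machine"))     # 36.0 dollars
print(transcription_cost(60, "rush_human"))  # 117.0 dollars
print(machine_turnaround(60))                # 20.0 minutes
```

In other words, a one-hour machine transcript should cost about $36 and be back in roughly 20 minutes.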

Syncwords also notes edu and quantity discounts are available for all of these services, so please inquire with them if interested.

Sonix.ai

Sonix is a subscription-based service with three tiers: Single-User ($11.25 per month plus $6.00 per recorded hour, i.e. $0.10/minute), Multi-User ($16.50 per user/month plus $5.00 per recorded hour), and Enterprise ($49.50 per user/month, with custom pricing available upon request). You can find information about the differences among the tiers here: https://sonix.ai/pricing

The videos in the folder below show the results of our testing of these two services alongside the built-in speech-to-text engine currently utilized by Panopto. To be fair, the service currently integrated with Panopto is free with our Panopto license, and for Panopto to license the more current technology would likely increase their costs and ours. We do wonder, however, whether it is simply a matter of time before the current state-of-the-art services featured here become more of a commodity:

https://oit.capture.duke.edu/Panopto/Pages/Sessions/List.aspx?folderID=4bd18f0c-e33a-4ab7-b2c9-100d4b33a254


Rev Adds New Rush Option

Rev.com's captioning services have been in wide use at Duke for the last couple of years, in part because of their affordability (basic captioning is a flat $1.00/minute), the generally high accuracy of the captions, and the overall quality of the user experience Rev offers via its well-designed interfaces and support. Quick turnaround time is another factor Duke users seem to appreciate. While the exact turnaround times Rev promises are based on file length, we've found that most caption files are delivered the same or next day.


For those of you who need guaranteed rush delivery above and beyond what Rev already offers, the company just announced it now offers an option that promises files in 5 hours or less from order receipt. There is an additional charge of $1.00/minute for this service. To choose this option, simply select the “Rush My Order” option in desktop checkout.

If any of you utilize the new rush service, we’d love to hear how it goes. Additionally, if you have any other feedback about your use of Rev or other caption providers, please feel free to reach out to oit-mt-info@duke.edu.

Kaptivo

Let's face it... humans like articulating concepts by drawing on a wall. This behavior dates back over 64,000 years to some of the first cave paintings. While we've improved on the concept over the years, transitioning to clay tablets and eventually blackboards and whiteboards, the basic idea has remained the same. Why do people like chalkboards/whiteboards? Simple: it's a system you don't need to learn (or you learned when you were a child), you can quickly add, adjust, and erase content, it's multi-user, it doesn't require power, it never needs a firmware or operating system update, and it lasts for years. While I'll avoid the grand "chalkboard vs. whiteboard" debate, we can all agree that the two communication systems are nearly identical and very effective in teaching environments.

But as classrooms transition from traditional learning environments (one professor teaching a small to medium number of students in a single classroom) to distance education and active learning environments, compounded by our rapid transition to digital platforms... the whiteboard has had a difficult time making the leap. There have been many (failed) attempts at digitizing the whiteboard; just check eBay. Most failed for a few key reasons: they were expensive, they required the user to learn a new system, they didn't interface well with other technologies... oh, and did I mention that they were expensive?

Enter Kaptivo, a "short throw" webcam-based platform for capturing and sharing whiteboard content. During our testing (Panopto sample), we found that the device was capable of capturing the whiteboard image, cleaning it up with a bit of Kaptivo processing magic, and converting the content into an HDMI-friendly format. The power of Kaptivo is in its simplicity. From a faculty/staff/student perspective, you don't need to learn anything new... just write on the wall. But that image can now be shared with our lecture capture system or any AV system you can think of (WebEx, Skype, Facebook, YouTube, etc.). It's also worth noting that Kaptivo can share the above content through its own Kaptivo software. While we didn't specifically test this product, it looked to be an elegant solution for organizations with limited resources.

The gotchas: every new or interesting technology has a few. First, Kaptivo currently works only with whiteboards (sorry, chalkboard fans). Also, there isn't any way to daisy-chain or "stitch" multiple Kaptivo units together for longer whiteboards (not to mention how you would share such content). Finally, the maximum whiteboard size is currently 6′ x 4′, which isn't all that big in a classroom environment.

At the end of the day, I could see this unit working well in a number of small collaborative learning environments, flipped classrooms, and active learning spaces. We received a pre-production unit, so I'm anxious to see what the final product looks like and whether some of the above-mentioned limitations can be overcome. Overall, it's a very slick device.

AV Voice Control – A Fad or the Future?

In June of 2017, Crestron announced that its 3-Series processors were capable of integrating with Amazon's Alexa voice control. While initially viewed with a bit of skepticism, the updates and enhancements Crestron has made to its modules over the past ten months have made it clear that voice control isn't going anywhere in the short term. If anything, Crestron has doubled down on voice control with the addition of Google Assistant integration in January 2018.

One of the most appealing aspects of an AV control system is that a simple button press can trigger a series of actions with a range of hardware and software. This system shields the end user from the complexities of controlling the various aspects of the AV system. While voice control has been integrated into a wide range of simple devices (lights, electrical plugs, thermostats, locks, etc.), integrating voice control with Crestron systems leverages the same advantages of the AV system control. “Alexa, turn on the AV system,” performs the same complex tasks as the button press, but can be done from anywhere within earshot of the Alexa device, and doesn’t require any understanding of the graphic user interface of the touch panel.

How it works:  

  1. The Alexa device receives your command “Alexa, turn on the lab TV”
  2. Amazon's cloud receives that information, recognizes "lab TV" as a registered smart device, and forwards the request to Crestron's cloud
  3. Crestron’s cloud receives the request and sends it to the Crestron device
  4. The Crestron device receives the request and sends it to the TV, and sends a confirmation back to Crestron’s cloud
  5. Crestron’s cloud relays a “task completed” signal to Amazon’s cloud
  6. Amazon's cloud receives the "task completed" signal and communicates with the local Echo Dot
  7. Alexa says “OK”
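The round trip above can be sketched in code. All of the function and device names below are our own illustrative placeholders, not actual Amazon or Crestron APIs; the point is simply to show how each hop hands the request to the next:

```python
def alexa_hears(command: str) -> str:
    # 1. The local Alexa device captures the spoken command
    #    and forwards it to Amazon's cloud.
    return amazon_cloud(command)

def amazon_cloud(command: str) -> str:
    # 2. Amazon's cloud matches "lab TV" to a registered smart
    #    device and hands the request to Crestron's cloud.
    if "lab tv" in command.lower():
        return crestron_cloud({"device": "lab TV", "action": "on"})
    return "Sorry, I don't know that device."

def crestron_cloud(request: dict) -> str:
    # 3-5. Crestron's cloud relays the request to the in-room
    #      processor, which performs the action and confirms back.
    crestron_processor(request)
    return "OK"  # 6-7. Confirmation travels back, and Alexa responds.

def crestron_processor(request: dict) -> None:
    # 4. The Crestron processor sends the actual control signal
    #    (e.g., RS-232 or IP) to the display.
    print(f"Turning {request['action']}: {request['device']}")

print(alexa_hears("Alexa, turn on the lab TV"))
```

Note how many hops a simple "turn on the TV" makes: two cloud services sit between the spoken command and the in-room processor, which is also why (as noted below) the whole chain fails if your Internet connection goes down.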

What does it take to integrate voice control? You'll need an Alexa device in the room, an Amazon account, a Crestron account, and the room's Crestron code. By adding two voice control modules (which requires some registration/configuration on Crestron's website) to the existing code, you can assign button presses and analog values to specific names and phrases. A quick recompile and upload and you're off. The hard part is figuring out what and how you want to control your system.

A very special THANK YOU!!! to the Duke Digital Initiative (DDI) for purchasing the Amazon Echo Dot as part of their 2017-2018 Internet of Things initiative. Without their support... this testing wouldn't have been possible.

A few things to consider:

  • Safety: Some thought should be given to ensuring that an Alexa voice command (or a misinterpreted one) can't cause injury. This seems obvious, but from audio levels to moving projection screens, movable walls, and thermostats, it's important to ensure the safety of end users.
  • Security: Alexa is always listening (unless you mute its mic) and is always sending data to Amazon's cloud. There are clear security concerns about using such a system, so take that into consideration.
  • It’s still the early days of Crestron/Alexa voice control, and voice integration can break at any point if Amazon updates Alexa. If you’re considering voice control, you should have direct access to the Crestron code and a programmer or technician capable of implementing updates as needed.
  • Alexa’s voice recognition software is far from perfect and has a particularly difficult time with accents. Also, it generally wants you to talk fast, and sometimes that doesn’t work as well with AV systems.
  • Alexa currently doesn’t have any user authentication. If one person can trigger an action, all users can trigger that action.
  • Alexa is easily confused. “Alexa, set the volume to 30%” and “Alexa, set the speakers to 30%” can confuse Alexa. This contextual understanding within Alexa is improving, but still far from perfect.
  • If your Internet goes down, so does Alexa.

This is the demo we created as a proof of concept. Consider it the tip of the iceberg of what this system can do; the future is exciting.