Audio Visualization

Pedro Lasch, a faculty member in the Department of Art, Art History and Visual Studies, recently launched “Art of the MOOC: Experiments with Sound,” the latest Coursera course in his Art of the MOOC series. As the title implies, this newest offering explores audio art and ways we can creatively engage with the sounds around us. While a great topic for a course, it posed an inherent production challenge. Previous courses focusing on visual and performance art provided plenty of images and video to pull from when putting the videos together. In the case of “Experiments with Sound,” many of the examples and ideas discussed in the lectures had no direct visual representation. So, we created our own.

Research and experimentation early in the production process led me to two extremely helpful features in After Effects that many may already know about, but that were completely new to me.

First, applying the “Audio Spectrum” or “Audio Waveform” effect to a layer and tying it to an audio track can create quick, simple audio visualizations. Second, audio tracks can be converted into keyframes using “Keyframe Assistant.” These keyframes can then be copied, pasted, and used to animate any attribute of any object or layer in time with the audio.
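The second feature is easier to grasp with a sketch of what it does under the hood. The following Python snippet (an illustration of the idea, not After Effects’ actual implementation) reduces an audio signal to one amplitude value per video frame, which is essentially what “Convert Audio to Keyframes” produces, and which you can then wire to any animatable property:

```python
# Sketch of the idea behind "Convert Audio to Keyframes": reduce an
# audio signal to one amplitude value per video frame, which can then
# drive any animatable property.
import math

def amplitude_keyframes(samples, sample_rate, fps):
    """Return one RMS amplitude value per video frame."""
    per_frame = int(sample_rate / fps)  # audio samples per video frame
    keyframes = []
    for start in range(0, len(samples), per_frame):
        chunk = samples[start:start + per_frame]
        if not chunk:
            break
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        keyframes.append(rms)
    return keyframes

# Demo: a 440 Hz tone that fades out over one second
rate, fps = 48000, 24
tone = [math.sin(2 * math.pi * 440 * n / rate) * (1 - n / rate)
        for n in range(rate)]
kf = amplitude_keyframes(tone, rate, fps)
print(len(kf))         # 24 keyframes for one second at 24 fps
print(kf[0] > kf[-1])  # True: the amplitude decays as the tone fades
```

In After Effects, each of those values becomes a keyframe on an “Amplitude” slider that expressions can read from.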

Like anything in After Effects, these features can be manipulated, combined, and complicated in any number of ways to create unique results. I found they were a good starting point, though.

This all sounds magical, and it is. However, in practice it was a bit more involved and experimental. Reaching the desired outcome inevitably involved a combination of keyframe smoothing, expressions, a lot of experimentation, and some trial and error.

We were also aided by the talent and expertise of Matthew Kenney, who used Processing, an open-source coding environment designed for the visual arts, to provide even more options to complement our work in After Effects.

Here’s a very brief sampling of experiments from throughout the production process:

AV Voice Control – A Fad or the Future?

In June of 2017, Crestron announced that their 3-Series processors were capable of integrating with Amazon’s Alexa voice control. While initially viewed with a bit of skepticism, the updates and enhancements Crestron has implemented to their modules over the past ten months have made it clear that voice control isn’t going anywhere in the short term. If anything, Crestron has doubled down on voice control with the addition of Google Assistant integration in January 2018.

One of the most appealing aspects of an AV control system is that a simple button press can trigger a series of actions with a range of hardware and software. This system shields the end user from the complexities of controlling the various aspects of the AV system. While voice control has been integrated into a wide range of simple devices (lights, electrical plugs, thermostats, locks, etc.), integrating voice control with Crestron systems leverages the same advantages of the AV system control. “Alexa, turn on the AV system,” performs the same complex tasks as the button press, but can be done from anywhere within earshot of the Alexa device, and doesn’t require any understanding of the graphic user interface of the touch panel.
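That one-trigger-fans-out-to-many-actions idea is just a macro. Here’s a minimal sketch of the pattern in Python; the device names and commands are hypothetical, not actual Crestron identifiers:

```python
# A control-system "macro" in miniature: one trigger fans out to an
# ordered list of device actions, hiding the complexity from the user.
# Device names and commands here are hypothetical.
actions_log = []

def send(device, command):
    """Stand-in for sending a control command to a piece of hardware."""
    actions_log.append(f"{device}:{command}")

MACROS = {
    "turn on the AV system": [
        ("display", "power_on"),
        ("display", "input_hdmi1"),
        ("audio_dsp", "unmute"),
        ("lights", "preset_presentation"),
    ],
}

def trigger(phrase):
    for device, command in MACROS[phrase]:
        send(device, command)

# Whether the trigger comes from a touch-panel button or from
# "Alexa, turn on the AV system", the same macro runs.
trigger("turn on the AV system")
print(actions_log)
```

Voice control simply adds one more way to fire the same macro the touch panel already fires.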

How it works:  

  1. The Alexa device receives your command “Alexa, turn on the lab TV”
  2. That information is sent to Amazon’s cloud, which recognizes “lab TV” as a registered smart device and forwards the request to Crestron’s cloud
  3. Crestron’s cloud receives the request and sends it to the Crestron device
  4. The Crestron device receives the request and sends it to the TV, and sends a confirmation back to Crestron’s cloud
  5. Crestron’s cloud relays a “task completed” signal to Amazon’s cloud
  6. Amazon’s cloud receives the “task completed” signal and communicates with the local Alexa Dot
  7. Alexa says “OK”
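The seven steps above can be modeled as a toy message relay. Each hop below is a function that annotates and forwards the request; all names are illustrative, not real Amazon or Crestron APIs:

```python
# The seven-step round trip as a toy message relay. Function and field
# names are illustrative, not real Amazon or Crestron APIs.
def alexa_device(phrase):            # step 1: capture the command
    return {"utterance": phrase}

def amazon_cloud(msg):               # step 2: match a smart device
    msg["device"] = "lab TV"
    return msg

def crestron_cloud(msg):             # step 3: route to the processor
    msg["target"] = "crestron_processor"
    return msg

def crestron_processor(msg):         # step 4: drive the TV, confirm
    msg["tv_power"] = "on"
    msg["status"] = "task completed"
    return msg

msg = crestron_processor(crestron_cloud(amazon_cloud(
    alexa_device("Alexa, turn on the lab TV"))))

# steps 5-7: the "task completed" confirmation travels back through
# both clouds, and the local device finally says "OK"
reply = "OK" if msg["status"] == "task completed" else "error"
print(msg["tv_power"], reply)
```

Notice that two cloud round trips sit between your voice and the TV, which is why a dead Internet connection (see below) kills the whole chain.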

What does it take to integrate voice control? First, you’ll need an Alexa device in the room, Amazon and Crestron accounts, and the room’s Crestron code. By adding two voice control modules (which require some registration and configuration on Crestron’s website) to the existing code, you can assign button presses and analog values to specific names and phrases. A quick recompile and upload, and you’re off. The hard part is figuring out what and how you want to control your system.

A very special THANK YOU!!! to the Duke Digital Initiative (DDI) for purchasing the Amazon Alexa Dot as part of their 2017-2018 Internet of Things initiative. Without their support… this testing wouldn’t have been possible.

A few things to consider:

  • Safety: Some thought should be spent on ensuring that an Alexa voice command (or a misinterpreted one) can’t cause injury. This seems obvious, but with audio levels, moving projection screens, movable walls, and thermostats all potentially under voice control, it’s important to ensure the safety of end users.
  • Security Concern: Alexa is always listening (unless you mute Alexa’s mic), and is always sending data to Amazon’s cloud. There are clear security concerns about using such a system, so take that into consideration.
  • It’s still the early days of Crestron/Alexa voice control, and voice integration can break at any point if Amazon updates Alexa. If you’re considering voice control, you should have direct access to the Crestron code and a programmer or technician capable of implementing updates as needed.
  • Alexa’s voice recognition software is far from perfect and has a particularly difficult time with accents. Also, it generally wants you to talk fast, and sometimes that doesn’t work as well with AV systems.
  • Alexa currently doesn’t have any user authentication. If one person can trigger an action, all users can trigger that action.
  • Alexa is easily confused. “Alexa, set the volume to 30%” and “Alexa, set the speakers to 30%” can confuse Alexa. This contextual understanding within Alexa is improving, but still far from perfect.
  • If your Internet goes down, so does Alexa.

This is the demo we created as a proof of concept. Consider this the tip of the iceberg of what this system can do; the future is exciting.


Captions! Captions! Get your FCPX Captions Here!

As a self-proclaimed accessibility nut, offering subtitles/closed captions isn’t simply a nicety in 2018… it’s a necessity. This is particularly true now that my ears have passed their prime, perhaps due to one too many Guided by Voices concerts in my youth. Now, before we get a flood of “Adobe Premiere did it first!” I acknowledge that a similar feature has been available on that platform for some time, but whenever I dip my toe in Premiere on a quasi-annual basis… I quickly retreat to the warm embrace of Final Cut Pro.


To put this in context, I don’t shoot or edit many videos these days. But, when I do, my process for captioning is to edit the video in Final Cut Pro, export the video, upload the video to YouTube (unlisted), and let YouTube work its machine-learning captioning magic. Usually, within a few minutes or so, YouTube has a caption track that’s about 80%+ accurate. From within YouTube, I then go in and manually edit the captions to achieve a near 100% accurate caption for the video. Finally, I make the video publicly viewable.

The above method is great… unless you need to re-upload the video to YouTube (or a different service) with a number of edits. Also, the longer and more complex the video becomes, the more complex managing the subtitles can become.

In a perfect world, you’d caption your footage as it is imported, either manually or by sending it out to a service. This has a number of advantages, especially for larger projects. First, metadata! Searching through hours of footage for a key phrase YOU KNOW your subject said is absolutely frustrating. Wouldn’t it be better if you could search your media library for that phrase? When you caption first, this becomes possible. Second, when you make edits, the captions follow the footage. So, when you make dozens of edits… you don’t need to touch the subtitles. Very cool…
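To make the metadata point concrete, here’s a toy Python sketch of the kind of search that caption-first editing enables: caption cues become (timestamp, text) pairs you can query. The cue contents here are made up:

```python
# Caption-first editing turns dialogue into searchable metadata.
# Each cue is (start_time_in_seconds, caption_text); contents invented.
cues = [
    (12.0, "welcome to the lecture"),
    (95.5, "the key phrase you know your subject said"),
    (240.0, "thanks for watching"),
]

def find_phrase(cues, phrase):
    """Return the start times of every cue containing the phrase."""
    return [start for start, text in cues if phrase in text.lower()]

print(find_phrase(cues, "key phrase"))  # [95.5]
```

Instead of scrubbing hours of footage, you jump straight to 95.5 seconds in.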

Final Cut Pro 10.4.1 is only a few days old, but the captioning feature seems well designed and feels very Apple. Also, it wouldn’t be an Apple feature if it didn’t use a unique format called Apple iTunes Timed Text (iTT or .itt). Don’t worry, this is actually an upgrade from traditional .srt caption files. With .srt, you basically have the timing and the words to be displayed on the screen. With Apple’s .itt format, you can also embed color information and the location of the text on the screen. Also, .itt files import into YouTube with little trouble. If .itt just isn’t going to work, you can also select CEA-608, which is ideal for DVD or Blu-ray mastering, but .itt is the more capable format.
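To illustrate the difference between the two formats: .srt is plain text (an index, a timing line, then caption text), while .itt is a TTML-based XML dialect that adds styling and positioning attributes. The sketch below is deliberately simplified; a real .itt file has a full TTML header and more required attributes than shown here:

```python
# Simplified illustration of .srt cues vs .itt-style markup. A real
# .itt file is full TTML with a header; this only shows the gist of
# the per-cue difference (timing-only vs timing + color + region).
srt = """1
00:00:01,000 --> 00:00:03,000
Hello, Duke!"""

def srt_cues(text):
    """Parse blank-line-separated .srt blocks into (begin, end, text)."""
    blocks = [b.splitlines() for b in text.strip().split("\n\n")]
    cues = []
    for block in blocks:
        start, end = block[1].split(" --> ")
        # TTML uses "." for milliseconds where .srt uses ","
        cues.append((start.replace(",", "."), end.replace(",", "."),
                     " ".join(block[2:])))
    return cues

def to_itt_body(cues, color="white", region="bottom"):
    """Emit .itt-style <p> cues carrying color and screen position."""
    return "\n".join(
        f'<p begin="{b}" end="{e}" tts:color="{color}" '
        f'region="{region}">{t}</p>' for b, e, t in cues)

print(to_itt_body(srt_cues(srt)))
```

The extra `tts:color` and `region` attributes are exactly the styling and placement information that plain .srt has nowhere to put.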

I’ll be keeping an eye on this feature to see if Apple eventually adds its own Siri-powered voice-to-text within Final Cut Pro (perhaps FCPX 10.5?), but for now, this is a great feature for those of us who love captioning.



The Impact of Artificial Intelligence on Video

Big advances are taking place at the intersection of video and AI (Artificial Intelligence). I ran across an interesting article in Streaming Media Magazine called The State of Video and AI 2018 that takes stock of some of these changes, and I wanted to share it with you as we look toward what’s ahead for Duke.


We’ve been following trends in this area from a number of directions, including video captioning. As many of you are aware, the needs for captioning videos we produce at Duke are increasing, but the costs of captioning services, most of which rely on intensive manual labor, are high. However, new tools like IBM’s Watson, which offers more than 60 AI services, including machine captioning (with accuracy advertised as a whopping 96%), seem poised to shift the balance and make it possible for us to caption videos on a wider scale. We demoed Watson recently and will continue to monitor it as well as other tools in this space.
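A note on what “96% accuracy” typically means: speech-to-text accuracy is usually reported as 1 minus the word error rate (WER), the word-level edit distance between a reference transcript and the machine output, divided by the reference length. A small sketch of the standard calculation:

```python
# Word error rate (WER): edit distance over words between a reference
# transcript and a machine transcript, divided by the reference length.
# "96% accurate" usually means WER of about 0.04.
def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    # classic dynamic-programming edit distance over words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / len(r)

ref = "the costs of captioning services are high"
hyp = "the cost of captioning services are high"
print(wer(ref, hyp))  # one substitution over seven words
```

Keep in mind that a 4% error rate still means roughly one wrong word every few sentences, which is why human cleanup remains part of the workflow.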

In this context I also wanted to point out that we recently began offering ASR (Automatic Speech Recognition) for Panopto, Duke’s lecture capture service. We are excited about the opportunities this new functionality will offer students and other viewers who are looking to drill down to points in videos where specific terms are found. This feature adds to Panopto’s already healthy set of features built around in-video search, including OCR (Optical Character Recognition) for slide content, and user-created time-stamped notes and bookmarks.

NewTek Connect Spark Review

The good folks at NewTek were nice enough to send Duke’s Office of Information Technology a demo unit of their new Connect Spark. The Connect Spark is a difficult device to explain to someone who has never produced or recorded a multi-camera live event. In the past, you’d need to run at least one cable from each camera to feed video and audio to the switcher before capturing and streaming that content out. Those cables had length limitations (~15 meters for HDMI and ~100 meters for SDI), not to mention being rather inconvenient. The Connect Spark instead leverages the local wired or wireless network, with no AV cables running through the event space, to stream content either directly to a computer or to an NDI-capable video switcher (more on that in a bit).
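Swapping cables for the network does mean budgeting bandwidth. Full NDI is commonly cited at very roughly 100-150 Mbps for a 1080p60 stream (approximate figures, not NewTek specifications), so a quick back-of-the-envelope calculation tells you how many cameras a link can carry:

```python
# Back-of-the-envelope budget for cameras-over-IP. The ~125 Mbps
# per-stream figure is a commonly cited NDI ballpark, not a spec.
def streams_per_link(link_mbps, per_stream_mbps, headroom=0.7):
    """How many streams fit on a link, leaving headroom for other traffic."""
    return int(link_mbps * headroom / per_stream_mbps)

print(streams_per_link(1000, 125))  # gigabit link: a handful of cameras
print(streams_per_link(100, 125))   # 100 Mbps link: can't carry even one
```

This is why the "gotchas" below about network quality matter: a gigabit backbone handles a small multi-camera shoot comfortably, while older 100 Mbps gear simply can’t.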

Out of the box, the device is rather simple to set up and configure from the perspective of an AV professional with reasonable networking chops. I downloaded the accompanying app and was communicating with the device within minutes, able to make adjustments and confirm settings as needed. I was then able to stream that content to a local computer for importing into Telestream Wirecast (and a number of other streaming applications). Beyond that, the device can also stream to a Network Device Interface (NDI) capable video switcher.

In academic environments, this device could easily be deployed in large event spaces to simplify the cabling necessary to support a large multi-camera event. Also, due to the flexible/modular nature of the hardware, this same equipment could quickly be redeployed in a different location with different cameras and minimal technician involvement. Beyond that, it frees up the production team to work anywhere with a connection to the network backbone. So, in theory, you could have videographers in one building filming an event, and the director and production team in another building working their magic.

Network Device Interface (NDI)
I mentioned NDI above. NDI is a royalty-free standard developed by NewTek to enable video-compatible products to communicate, deliver, and receive broadcast-quality video in a high-quality, low-latency manner that is frame-accurate and suitable for switching in a live production environment, or so says Wikipedia. So, this device works wonderfully with NewTek’s very popular TriCaster… but it also works with other NDI switchers, such as Panasonic’s new switchers or web-based virtual switchers.

The Gotchas:
It wouldn’t be a DDMC article if it was all positive. Most, if not all, of the “gotchas” with the NewTek are outside of NewTek’s control. First, if you have a complicated network topology, you may experience issues. For example, in some situations, I wasn’t able to communicate with the device because it was on a different subnet or VLAN. Again, not a problem if you understand your network… but for a technician who has no idea what a subnet or VLAN is… it could be a show stopper. I was able to quickly work around this issue, but you may need to work with your networking folks to get this all running seamlessly. Second, if you don’t have a robust network, you may experience dropout issues, specifically when using somewhat inexpensive switches. While the device worked perfectly on our enterprise network, I experienced minor issues with my (admittedly old) home network. Infrequently, I’d see a dropped frame or hesitation. Again, I don’t blame this on the Connect Spark, but be aware that you may want to upgrade to a more modern router/switch if you are on older equipment.

Overall, I really enjoyed the device, and it underlines the coming “AV over IP” reality for AV folks.



Warpwire Workflows and Guides

Many of you by now are familiar with Warpwire’s support website since we feature their collection of video tutorials, called Guides, in the Help section of our service landing page.  Warpwire recently added a new section to their support site, called Workflows. These Workflows show how to use Warpwire from the standpoint of particular use cases, such as when an instructor wants to provide feedback to students via video, or when an instructor in a language course would like to review video or audio clips of her students practicing speaking skills.

Below are some of the new Warpwire Workflows we think you might find helpful. If there are other use cases you would like Warpwire to consider adding, please feel free to reach out and let us know your ideas so that we can share them with the company. And as always, if there are particular features you would like to see in Warpwire that don’t currently exist, we want to hear about those too.

For those of you who aren’t yet familiar with Warpwire’s video Guides, below is a selection of some of the tutorials we think users at Duke might find most useful, especially when they are starting out.


The Movo USB-M1 Microphone

In DIY video production, sound quality often matters as much as, or more than, the quality of your image. Particularly when you’re recording with just a webcam, it can be difficult to eliminate background noise and record a clear vocal track without also being conspicuous on camera. For that reason, we’ve started including the Movo USB-M1 lavaliere microphone in our DIY lecture recording kits.

The M1 is as plug and play as it gets. Plug it into a USB port on your computer, clip it to your shirt, and you’re ready to go. In the past, it was difficult to find good lav mics that didn’t require an audio interface or preamp like the Behringer UM2, which we previously used. Instead, the M1 matches the accessibility of the Blue Snowflake but with the increased vocal clarity that lavaliere microphones provide.

At $29.99, the M1 is a bargain. It works on both PC and Mac. See here for an audio test:


AV in a Box – The Sub $25K Classroom

As the expectations of classroom and meeting space AV change over time, so too must the approach to delivering advanced AV systems for teaching and learning environments.

The Sanford School of Public Policy (SSPP), in collaboration with the Duke Office of Information Technology (OIT) and Trinity Technology Services (TTS), was able to take a tentative list of desired outcomes for a scheduled AV update to four classrooms and translate it into a cost-effective and robust classroom AV design. The process started with the Sanford School approaching my group (Media Technologies) at OIT, informing us that they were looking to upgrade a few classroom environments, and asking if we could provide some general guidance to ensure they were maximizing their available funds. Based on the initial wants-and-needs assessment, OIT sketched a base AV design and reviewed it with TTS to confirm feasibility and obtain pricing. From there, TTS finalized the design with a few minor modifications and provided pricing. Ultimately, TTS was selected as the AV integrator due to their cost-effective pricing (roughly a 35%+ cost savings) and solid track record.
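For a sense of what a 35% savings means in dollars, a quick bit of arithmetic: if the selected integrator’s price represents a 35% savings, the implied alternative quote is the price divided by 0.65. The dollar figure below is illustrative, not an actual quote:

```python
# What does a "roughly 35% cost savings" imply about the competing
# quote? price_selected = price_alternative * (1 - savings), so the
# alternative is price_selected / (1 - savings). Numbers are
# hypothetical, chosen only to match the sub-$25K room framing.
def implied_alternative_quote(selected_price, savings_rate):
    return selected_price / (1 - savings_rate)

selected = 24000  # hypothetical sub-$25K room cost
quote = implied_alternative_quote(selected, 0.35)
print(round(quote))  # 36923
```

In other words, at this scale a 35% savings is on the order of ten-plus thousand dollars per room.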

About the spaces:

  • Laser Projectors (5,000 lumens at 1920×1080, rated for 20,000 hours – no bulb replacements!)
  • Front and Back Cameras (no pan or tilt)
  • Built-in VoIP Calling
  • Integrated Lecture Capture (Panopto)
  • 7″ Touch Panel for Control
  • AV Bridge Standard (for WebEx, Skype, Google Hangout, YouTube, Facebook, etc.)

The system recycled the previous AV rack, speakers, and projector mount, so this was far from new construction. The Sanford School of Public Policy has indicated that the install was very smooth, with only minor issues in the ~4 months since. So, it survived a full semester.

The pros and cons of such a system are difficult to quantify, but I’ll give it a shot.

  • significant reduction in overall cost (~35%)
  • simplified install (TTS has a robust understanding of Duke’s network, VoIP systems, scheduling, etc., and it really helps)
  • good support, especially if you have tier one local support.
  • a unified graphical user interface (faculty moving from one of the 170+ TTS rooms to a Sanford School of Public Policy room will experience a similar user interface)
  • they understand the unique AV needs of an academic teaching environment.
  • did I mention the price?

Equally difficult would be to list the cons of using TTS. Instead of listing cons, I’ll list a few considerations when working with TTS.

  • TTS may not be an ideal fit for advanced rooms (“Advanced” is a relative term… they have done some impressively complex work and they continue to surprise, but there is a limit).
  • TTS may not be the perfect fit for new construction (Have they done new construction? Yes! Can they do all new construction? Probably not.)
  • There are limitations to their programming (TTS has a range of solid classroom designs, good programmers, and a dedication to clean design, but it’s best to “borrow” their best designs vs. reinventing the wheel.)

This was a wonderful project, and I look forward to reviewing this project in a few years to see how happy the Sanford School of Public Policy is with the overall project. Only time will tell.


Logitech DDMC Session

On November 30th, Warren Widener of Logitech visited the Technology Engagement Center on Duke’s campus to showcase three pieces of technology ideal for small and medium-sized conference rooms.

We all know Logitech for their webcams, keyboards, and mice, but over the past few years, they have expanded into small, and not so small, business environments as more organizations move toward small bring your own device (BYOD) meeting spaces. Logitech has achieved this by integrating their various devices into flexible and cost-effective offerings highlighted below. While they may be careful not to take on “the trons” of the industry, it’s clear they are looking to move up the food chain.

First, Warren provided a demonstration of the Logitech Smartdock. The Smartdock is essentially a dock for a Microsoft Surface Pro 4 with expanded I/O, designed to interface with Skype for Business and in-room Logitech hardware (cameras/mics) to simplify launching an audio or video conference down to the push of a button. The device is intended to live in the meeting space and act as the meeting scheduler and AV bridge. While not a perfect fit for Duke due to our deep enterprise WebEx integration, for businesses that rely on Skype for Business, this device brings one-touch video conferencing one step closer to reality.

Also highlighted at the session was the Logitech Meetup. The Meetup is an $899 MSRP wide-angle webcam with a three-element mic array and tuned speakers, plus built-in acoustic echo cancellation, that ticks a number of boxes in small huddle-room design. Unlike some of Logitech’s previous all-in-one designs, the Meetup is designed to be permanently mounted above or below a monitor and comes with a wall-mount bracket. The super-wide 120-degree field of view from the camera ensures everyone in a small conference room will be in the shot.

Finally, the session briefly touched on Logitech’s GROUP offering. We’ve seen previous iterations of this device, but Logitech promises that they continue to improve its overall audio quality and features. It’s ideal for larger BYOD spaces, with a pan-tilt-zoom camera, high-quality mics and speaker, and an open nature (it works with WebEx, Skype, Google Hangouts, Facebook Live, etc.), but the lack of integrated voice over IP (VoIP) makes it a more difficult sell in some of our more robust and demanding spaces.

Crestron DDMC Session

Today, Crestron’s Ryan Berndt visited Duke’s Technology Engagement Center (TEC) and provided a detailed overview of their new NVX network-based encoder/decoder hardware, other 2017 offerings, and the ongoing benefits of the A+ Education Partner Program.

The session centered on the Crestron DM-NVX-350. This new device connects to the network over fiber or copper and can function as a transmitter or receiver (but not both at the same time). It has two HDMI inputs, one HDMI output, audio breakout, and can also transmit USB 2 signals in both directions. Crestron isn’t the first hardware manufacturer to enter this market, but their first offering is uniquely compelling considering how well it blends with existing DM/HDBaseT systems. Network-based AV is a game changer, and while it won’t immediately replace traditional DM/HDBaseT systems, it will start to erode the active-learning and upper segments of the market first. If you would like to be prepared for the future of AV and have a bit of a DM and networking background, Crestron offers an updated three-day training session specifically for this emerging technology (NVX Design and Networking, DM-NVX). But traditional AV folks may be surprised to find that the class is dominated by network conversations and considerations.
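Those network conversations are unavoidable because of simple arithmetic: uncompressed video vastly exceeds what a 1 Gbps network port can carry, which is why encoder/decoder boxes like the NVX compress at all. A rough calculation (nominal 24-bit RGB, ignoring blanking and overhead):

```python
# Why network AV compresses: uncompressed video far exceeds a 1 Gbps
# port. Rough data rate = width * height * bits-per-pixel * fps.
# (Nominal 24-bit color; blanking intervals and overhead ignored.)
def uncompressed_gbps(width, height, fps, bits_per_pixel=24):
    return width * height * bits_per_pixel * fps / 1e9

print(round(uncompressed_gbps(1920, 1080, 60), 1))  # 3.0 Gbps, 1080p60
print(round(uncompressed_gbps(3840, 2160, 60), 1))  # 11.9 Gbps, 4K60
```

Squeezing roughly 3-12 Gbps of raw video through a 1 Gbps link means serious compression, and serious attention to switch capacity, multicast, and network design.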

Crestron also highlighted the Crestron Mercury. The Mercury is an all-in-one device that offers “bring your own device” (BYOD) AV simplicity to small conference rooms. The Mercury offers telephone services over SIP, has built-in mics and a speaker, and includes a Logitech webcam. See a Crestron Mercury in action on Duke’s campus.

The session concluded with our Crestron representative mentioning the imminent demise of the DMPS3-4K-300-C (sad face). This system is the heart of Duke’s <$27,000 “classroom in a box” design that TTS has been installing. The system includes two cameras, an advanced audio DSP, VoIP, DukeCapture, etc. But, the good news is… Crestron will soon release the DMPS3-4K-350-C! So, it’s 50 MORE, right? But seriously, the DMPS3-4K-350-C has all the features of the DMPS3-4K-300-C, but with an additional network port and built-in AirMedia to enable wireless presentation from computers and mobile devices. Word on the street is that this feature may initially be free, but will eventually be an add-on license an AV technician can activate. Also, the price is identical to the previous version, so it’s a win/win.