
Going With The Flow For TechWeek 2025

By: Stephen Toback

I came up with an idea the morning of Duke University’s TechWeek 2025 to create a looping video showcasing the Bryan Center Studios — Studio 3 (Self-Service Video Studio), Studio 4 (Podcasting Studio), Studio 5 (Video-Conferencing Studio), and the Hotel Space — for visiting students.

Our existing photos featured staff members, but for TechWeek, we wanted students in the spaces.

The challenge: we didn’t have time for a new photo shoot.


Step 1: Transforming Photos with Google Gemini

I started with the original staff photos from each space and uploaded them to Google Gemini, asking it to replace the people with students.

The results were remarkably good. Gemini kept the lighting, reflections, and spatial accuracy of the rooms while convincingly inserting new student figures.

It’s honestly impressive how naturally Gemini handled complex settings like reflections on monitors and studio lighting — everything just felt real.

The Hotel Space surprised me for a different reason. Even though I asked Gemini to remove the glare on the glass, it didn’t — but that actually made the result more impressive. In past editing sessions it had easily removed glare; this time it preserved the natural reflections, which made the shot look more authentic.

Studio 5 was the only exception: since the original photo showed only the back of the person’s head, I decided not to process it through Gemini. I planned to test how well Veo could generate the face on its own later in the workflow — and it handled that surprisingly well (more on that below).


Step 2: Bringing the Scenes to Life with Veo in Flow

With the new student images ready, I opened Flow and used Veo 3’s “Frame to Video” feature.

For Studio 4 (Podcasting Studio), I gave it the simplest possible prompt — and it delivered.
It animated the students naturally, with realistic gestures and conversation flow. It even showed the people’s reflections on the TV screen.

After watching a bit longer, I noticed the TV screen was mirrored — the image reversed — but still, it was astonishing detail for a one-line prompt.


Step 3: Studio 3 – The Tricky One

Studio 3 (Self-Service Video Studio) turned out to be more challenging.

My first prompt told the person to “look into the camera,” meaning the camera in the photo — but Veo interpreted that as the filming camera. Once it made that move, it was very difficult to get it to stop turning toward the camera.

Even after refining the prompt (“look straight ahead,” “do not turn away from the monitor”), it kept the same behavior.

I also couldn’t get it to reproduce the person on the screen, something it had done perfectly in Studio 4.

Despite those limitations, the result still worked well to show the actual space, complete with a student user — without needing real student participants.


Step 4: Studio 5 and the Hotel Space

For Studio 5 (the video-conferencing studio) and the Hotel Space, I used similar prompts.

Because Studio 5’s photo wasn’t modified in Gemini, I dropped the original image straight into Flow. Veo handled it impressively — it generated a realistic face for the person on the screen automatically, based only on the context of the scene.

In Studio 5, Veo also added an unexpected but welcome cinematic touch — a slow pan across the room. I hadn’t asked for that movement at all. The only hiccup was that the person’s face on the screen didn’t perfectly match the movement of the real person’s head. It’s a small mismatch, but one of those uncanny little details that shows how close we’re getting to seamless realism.



Step 5: Script, Voice, Music and Additional Visuals

To tie everything together, I wrote the voiceover script in Gemini and generated the narration using ElevenLabs.
The tone came out clear, natural, and perfectly matched the pacing of the looping video.

People did comment that I used the “TikTok voice” and asked about the possibility of creating a “Duke Branded Voice” or even a set of official voices for campus use. I thought that was a great idea, but it would require everyone to use the same voice generation platform — since you can’t currently share custom voice models across different tools. At least, not yet.

For the intro and outro visuals, I also turned to Gemini to create still images — one to open the video and one to close it — which I then animated in Flow using Veo.

The opening image featured the Duke Chapel, and while Gemini produced a beautiful rendering, it included a cross on top of the tower.

To keep it consistent with the real chapel, I used Photoshop’s AI tools to remove the cross before animating the scene.

That small correction helped the final sequence look polished and somewhat truer to Duke’s campus (though still not entirely accurate).

The script was created in Gemini, and the music was created in minutes using Suno.


Final Thoughts

This experiment showed just how powerful — and fast — today’s AI media tools have become.

In one morning, I went from still photos of staff to fully animated, student-filled videos of the Bryan Center Studios — perfect for looping at our TechWeek table.

While the models sometimes struggle with exact direction or fine control, the results were visually convincing and ready to use with minimal manual editing.

Takeaways:

  • Gemini can convincingly swap subjects while preserving lighting and reflections.

  • Veo 3’s Frame-to-Video can animate still photos with realistic motion and speech (sometimes).

  • ElevenLabs offers a fast, natural-sounding voiceover solution.

  • Combining tools (Gemini → Flow) can quickly produce good results.

  • Sometimes, the AI’s “mistakes” — like glare retention or added camera pans — make the final product feel more organic.


The Final Video

Here’s the completed looping video featuring Studios 3, 4, and 5, and the Hotel Space.

A few things to notice:

  • The BCS Hotel video has a time-lapse feel — I thought it looked cool, and there wasn’t time to adjust it anyway.

  • Studio 5’s clip includes that unexpected slow pan, which adds nice motion.

  • In Studio 5, watch the person on the TV screen — the head motion doesn’t quite match the person sitting in front of it.

Even with those quirks, the overall effect is remarkably good considering how the video was used and how quickly it was put together.

Oh, and this blog post was written with the enormous help of ChatGPT 5. I fed it a word dump of my thoughts and ideas, and it organized and formatted them pretty well. We edited back and forth, and as I remembered things, it patiently inserted new ideas as they came up. It was a very cool process writing this with help.
