I’ve been testing AI-powered video generation tools since they first appeared. Veo seemed like a quantum leap forward, so I’ve been working with it for the past several weeks. Many of these engines excel at creating whimsical videos, but I’ve yet to find one that can produce specific videos in support of learning. Both accuracy and consistency (producing a series of videos similar in look and feel) still seem well out of reach. I’m sure my own learning curve with prompt engineering for these tools is a contributing factor.
Some folks at Duke recently gave me a lesson to serve as “real world” content for testing. I’m using this text to try to create an animation that visualizes the learning objective: a basic understanding of how GenAI works.
Generative AI (GenAI) refers to a class of artificial intelligence models designed to create new content—such as text, images, music, code, or video—based on patterns learned from existing data.
The results have been, well, consistent with my previous experience: disappointing. A video may have one or two acceptable elements, but keeping the AI focused on what it has done well while fixing what it hasn’t seems impossible. The tools also always want to use the brain metaphor, which I reject as trite.
I first tried generating video in Google’s Flow tool using prompts written by ChatGPT.
Here’s the initial prompt ChatGPT wrote for me — its first response was pretty bland:
“Create a conceptual digital illustration showing a futuristic AI brain made of glowing circuit patterns analyzing streams of diverse data (books, photos, music notes, code, and video frames), and transforming them into creative new forms of content. The background should convey a sense of innovation and synthetic intelligence—think digital canvases or holographic outputs of the generated content. Emphasize the idea of transformation: data in, creativity out.”
After several iterations, here’s the prompt ChatGPT settled on:
“Create a clean, conceptual illustration showing a central machine-like device that processes data. On the left side, illustrate various data objects—text, image, audio, and video—flowing into the machine in sequence (represented by icons, symbols, or stylized data streams). From the top, depict a question mark entering the machine after the data flows in, clearly indicating it comes second in the sequence. On the right side, new content flows out of the machine—refined and glowing versions of text, images, music, and video—representing generated content. The layout should emphasize clear directionality (left → center → top → right) and a sense of transformation. Style should be modern and minimal with light motion cues or arrows to guide the eye sequentially through the steps.”
Here are the resulting videos (Flow allows you to create multiple versions at the same time):
This one got pretty close, but the “input” files and arrows face the wrong direction. There doesn’t seem to be a way to make the tool aware of what it did in this video and have it just “do that again but fix this one thing.”
Again, sort of OK? The inputs start out moving in the right direction, but then the flow almost looks like it’s outputting back into the input. The output arrows are also backwards.
I also tried using Gemini to generate the prompts, thinking it might do better since both are Google tools. The results were pretty similar. It’s nice that you can generate video with Veo directly in Gemini, but it limits you to a certain number of videos per day, whereas in Flow I haven’t run into limits. I’m on the paid version, so I get a set number of tokens per month.
More to come. I’ll keep looking for the secret prompt ingredient that makes the tool respond, but I also suspect the tools simply aren’t yet responsive enough to be useful in a professional production environment.
If you have any tips or tricks, please comment below!