
Problem Solved! (Spoiler: Not Really)

By: Stephen Toback

OK – this is C-minus quality, but it's the best I've ever seen. Our dream of video automation is to input some text and get a finished video out the other end. While we're making great progress on certain parts of the production process, typing in some text and seeing a completed, accurate video as the output is still something we don't think is possible with current technology.

While I was using Google's NotebookLM to test some podcasts (their discussion podcast is amazing), I noticed a "Video Overview" button. So I pressed it. I was shocked that there were no spelling errors and the content was accurate. Some of the graphics were right on point. The highlights were strange, but still, on the first try, it was pretty good.

I worked with Google Gemini to find out how this thing works under the covers and to identify areas where I might be able to improve the results. Here is the breakdown of the "levers" we have as creators and how the engine actually builds these videos.


Behind the Scenes: The Tech Stack

According to the technical data from Gemini, this feature isn't just one model; it's a sophisticated orchestration of several:

  • The Brain (Gemini 3): This handles the heavy lifting of “Source Grounding.” It reads your uploaded PDFs, transcripts, and notes to pull out the most important facts. It then writes a script and dictates the visual cues.

  • The Artist (Nano Banana Pro): This is the visual engine. Unlike older image generators, Nano Banana Pro is designed for high-fidelity text rendering and “context-aware” illustrations. When it builds a slide, it’s literally “thinking” through the composition to match the narration.
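To make that two-stage handoff concrete, here is a minimal Python sketch of a plan-then-render pipeline. Everything here (Scene, plan_video, render_slide) is a made-up stand-in for illustration, not a real NotebookLM or Gemini API:

```python
from dataclasses import dataclass

@dataclass
class Scene:
    narration: str    # what the "Brain" scripts from your sources
    visual_cue: str   # the cue the "Artist" turns into a slide

def plan_video(source_text: str) -> list[Scene]:
    """Stage 1 (hypothetical): a grounded planner pulls out facts and scripts scenes."""
    facts = [line.strip() for line in source_text.splitlines() if line.strip()]
    return [Scene(narration=f, visual_cue=f"Illustrate: {f}") for f in facts]

def render_slide(scene: Scene, style: str) -> str:
    """Stage 2 (hypothetical): an image model composes a slide to match the narration."""
    return f"[{style}] {scene.visual_cue}"

def build_video(source_text: str, style: str = "minimalist") -> list[str]:
    return [render_slide(scene, style) for scene in plan_video(source_text)]

slides = build_video("Ducks migrate in autumn.\nThey fly in V formations.")
```

The point of the sketch is the separation of duties: the planner only decides *what* to say and show, and the renderer only decides *how* each slide looks.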

The Problem: Those “Strange” Highlights

You might notice that some text in your generated videos has bright, almost neon highlighting. These aren't typos or errors: they are the AI's way of visually emphasizing what it considers the most "grounded" facts from your sources. While well-intentioned, they can be visually distracting if you want a cleaner look.

The Solution: The “Clean Visuals” Steering Prompt

Since we can’t manually edit the video once it’s rendered, the secret is in the Steering Prompt. You can find this by clicking the pencil icon on the Video Overview tile in your Studio panel.

I’ve developed a prompt specifically designed to minimize those highlights for a cleaner look. You can copy and paste this into the Custom Instructions box:

“Generate a video that prioritizes a clean, minimalist aesthetic. Please avoid using heavy background highlights or ‘glow’ effects on the text overlays. Ensure all text is presented in a clear, high-contrast font without additional graphic emphasis, focusing instead on the clarity of the diagrams and the flow of the narration.”

Your Creator “Levers”

While we can’t edit individual slides yet (the video comes out as a single MP4), we still have three primary ways to update and refine the content:

  1. Source Refinement: The video is only as good as your notes. If a word is being mispronounced or a fact is slightly off, edit your source text first, then hit regenerate.

  2. Visual Styles: Don’t stick to the default! You can choose from presets like Whiteboard, Retro Print, or Anime. If those don’t fit, use the “Custom” box to describe your preferred look (e.g., “Professional blue-and-white style”).

  3. Format Toggling: You can switch between an Explainer (a deep-dive) and a Brief (a quick summary). Sometimes the “Brief” format avoids the visual clutter that the longer videos occasionally produce.
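Taken together, the three levers amount to a handful of knobs on a single regenerate call. A hypothetical Python sketch (VideoRequest and regenerate are my own illustrative names, not anything NotebookLM actually exposes):

```python
from dataclasses import dataclass

@dataclass
class VideoRequest:
    source_text: str                 # Lever 1: fix facts and spellings here first
    visual_style: str = "Default"    # Lever 2: Whiteboard, Retro Print, Anime, or a custom description
    video_format: str = "Explainer"  # Lever 3: "Explainer" (deep-dive) or "Brief" (quick summary)
    steering_prompt: str = ""        # Custom instructions box, e.g. a clean-visuals prompt

def regenerate(req: VideoRequest) -> str:
    """Stand-in for pressing Regenerate: summarizes what would be built."""
    return (f"{req.video_format} video, {req.visual_style} style, "
            f"from {len(req.source_text.split())} source words")

request = VideoRequest(source_text="Ducks migrate in autumn.",
                       visual_style="Whiteboard",
                       video_format="Brief")
summary = regenerate(request)
```

Framing it this way makes the workflow obvious: you never edit the output video, you edit the request and re-run it.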

The “Edit-by-Regeneration” Loop

The biggest takeaway from my deep dive with Gemini is that NotebookLM works best as an iterative tool. Don’t settle for the first version. Tweak your steering prompt, adjust your visual style, and let the AI “re-direct” the video until it matches your vision.
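That iterative loop can be sketched in a few lines of Python. Here generate_video and looks_good are placeholders for NotebookLM's renderer and your own eyeball review, respectively; the loop structure is the part that matters:

```python
def generate_video(steering_prompt: str) -> str:
    # Stand-in for NotebookLM's renderer, not a real API.
    return f"video rendered with instructions: {steering_prompt!r}"

def looks_good(video: str) -> bool:
    # Stand-in for a human review; here we just check that the highlight rule stuck.
    return "avoid heavy highlights" in video

prompt = "default style"
video = generate_video(prompt)
attempts = 1
while not looks_good(video) and attempts < 3:
    prompt = "avoid heavy highlights; minimalist aesthetic"  # tweak and retry
    video = generate_video(prompt)
    attempts += 1
```

In practice, each pass through the loop is you editing the steering prompt or visual style and hitting regenerate, with a cap on how many retries your patience allows.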

Categories: DDMC Info
