Overview and Motivation
Over the past several years, I have been testing AI-generated video animation as a potential tool for reinforcing learning objectives. The goal of this work has never been to create visually impressive or cinematic content for its own sake. Instead, the objective is narrowly focused: to generate accurate, restrained, and instructionally useful animations that support specific learning objectives defined by instructors.
While recent advances in AI video generation have produced impressive creative and cinematic results, these systems have consistently struggled when asked to generate precise, domain-constrained educational animations. Fields that require strict accuracy around structure, counts, spatial relationships, color conventions, and labeling provide a useful stress test for this limitation.
This report summarizes a recent round of testing focused on molecule construction animation, highlights current shortcomings, and documents lessons learned across multiple AI video platforms.
Instructional Goal
The instructional goal for this experiment was intentionally simple:
- Show students the component atoms or molecular building blocks of aspirin.
- Animate those components into the final, correct molecular structure.
- Avoid unnecessary motion, visual noise, stylistic effects, or creative reinterpretation.
- Reinforce molecular composition and structure, not distract from it.
In traditional animation terms, this is a straightforward keyframe animation problem—a technique that has been foundational since the earliest days of professional animation. The expectation was that modern AI systems, especially those supporting start-frame and end-frame constraints, would handle this well.
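To make that expectation concrete: with a fixed start layout and end layout, a deterministic keyframe transition is only a few lines of code. The sketch below is purely illustrative; the atom labels and coordinates are placeholders, not real aspirin geometry.

```python
# Minimal keyframe sketch: linearly interpolate each atom's 2D position
# between a start layout and an end layout. Labels and coordinates are
# illustrative placeholders, not actual molecular geometry.

def interpolate_frames(start, end, num_frames):
    """Yield one {atom: (x, y)} dict per frame, blending start -> end."""
    for i in range(num_frames):
        t = i / (num_frames - 1)  # 0.0 at the start frame, 1.0 at the end
        yield {
            atom: ((1 - t) * start[atom][0] + t * end[atom][0],
                   (1 - t) * start[atom][1] + t * end[atom][1])
            for atom in start
        }

start_layout = {"C1": (0.0, 0.0), "O1": (4.0, 0.0)}  # separated components
end_layout   = {"C1": (1.0, 1.0), "O1": (1.8, 1.2)}  # assembled structure

for frame in interpolate_frames(start_layout, end_layout, num_frames=5):
    print(frame)
```

This is exactly the determinism the AI systems were expected to approximate: nothing appears, disappears, or changes color between the two frames.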
Start and End Frame Preparation
The process began with a final reference model of the aspirin molecule. Using ChatGPT, I attempted to generate a corresponding start frame that would display the correct number of atoms by type (e.g., carbon, hydrogen, oxygen), visually separated and labeled.
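The target counts themselves are unambiguous. As a sanity check, the sketch below derives them from aspirin's SMILES string using RDKit (my choice here; any cheminformatics library would do), confirming the formula C9H8O4.

```python
# Hedged sketch: compute the expected atom counts for aspirin from its
# SMILES string, giving a ground truth to check generated frames against.
# Assumes RDKit is installed (pip install rdkit).
from collections import Counter
from rdkit import Chem

aspirin = Chem.MolFromSmiles("CC(=O)OC1=CC=CC=C1C(=O)O")
aspirin = Chem.AddHs(aspirin)  # make implicit hydrogens explicit

counts = Counter(atom.GetSymbol() for atom in aspirin.GetAtoms())
print(counts)  # Counter({'C': 9, 'H': 8, 'O': 4})
```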
Despite repeated attempts, this step failed every time:
- The system consistently generated incorrect atom counts.
- Errors persisted even when the mistakes were explicitly pointed out.
- Attempts to “mark up” incorrect frames with annotations and corrective instructions did not resolve the issue.
- Manual correction using Adobe Photoshop’s AI tools also failed to reliably remove or adjust the incorrect elements.
Switching providers resolved this specific problem. Using Gemini (Nano Banana), I was able to generate a correct start frame with the appropriate atom counts and layout.

This highlights an important practical reality: model capability and compliance can vary significantly across vendors, even for seemingly simple constraints.
Animation Testing with Luma Labs
With correct start and end frames in hand, I loaded both into Luma Labs for animation.
Initial results were mixed:
- The system introduced extraneous visual noise, including:
  - Unprompted color changes to molecules
  - Additional molecules appearing or disappearing
  - Text moving or morphing during the animation
Through multiple iterations—primarily using negative prompts—a reasonably acceptable draft animation was eventually produced.
At this stage, the animation was generated in draft mode. Based on prior experience, the expectation was that re-running the same prompt and frames at a higher resolution (540p) would produce a similar or improved result.
That expectation was not met.
Despite using:
- The same prompt
- The same start and end frames
- Both the original contextual session and a fresh project (“board”)
The higher-resolution outputs diverged significantly and became increasingly unstable, particularly toward the end of the animation. The results were not instructionally usable.
A support ticket is currently open with Luma Labs to better understand how prompt consistency, draft mode, and resolution scaling interact, and whether deterministic or semi-deterministic workflows are achievable.
Testing with Google Flow and Veo
Additional testing was conducted using Google Flow with Veo (version 3.1).
Results here were also unsatisfactory:
- Animations exhibited excessive motion and stylistic interpretation.
- The system appeared to prioritize visual dynamism over structural accuracy.
- Simplifying the prompt did not improve results and, in some cases, reduced control further.
In all cases, the resulting animations were distracting rather than reinforcing, undermining the learning objective rather than supporting it.
Key Findings
Several consistent themes emerged across platforms:
- Constraint adherence remains weak. AI video models struggle to reliably respect the numerical, spatial, and labeling constraints critical to instructional use (a minimal automated check of this kind is sketched after this list).
- Start/end frame support is promising but immature. While theoretically aligned with keyframe animation principles, current implementations lack sufficient control and consistency.
- Resolution changes are not deterministic. Moving from draft to higher resolution can fundamentally alter outcomes, even with identical inputs.
- Simplicity is not easier for AI. Ironically, the simpler and more constrained the task, the more likely the models are to fail.
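Because the expected counts are known in advance, checking generated frames against them can in principle be automated. A minimal sketch of the comparison step follows; the per-frame counts are assumed to come from some hypothetical detection step that reads the atom labels out of each rendered frame.

```python
# Hedged sketch of a per-frame constraint audit. Only the comparison logic
# is shown; producing each frame's Counter from an actual video frame is a
# separate (hypothetical) detection problem.
from collections import Counter

EXPECTED = Counter({"C": 9, "H": 8, "O": 4})  # aspirin, C9H8O4

def audit(frame_counts, expected=EXPECTED):
    """Return indices of frames whose labeled-atom counts drift from expected."""
    return [i for i, c in enumerate(frame_counts) if c != expected]

# Illustrative only: frame 1 gains a phantom oxygen, so it is flagged.
frames = [Counter({"C": 9, "H": 8, "O": 4}),
          Counter({"C": 9, "H": 8, "O": 5})]
print(audit(frames))  # [1]
```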
Conclusion and Next Steps
This round of testing reinforces a pattern observed over several years: AI-generated video remains poorly suited for precise academic reinforcement, particularly in contexts where accuracy matters more than visual flair.
This does not suggest that progress has stalled, nor that the technology is without promise. Rather, it highlights a mismatch between current AI video optimization targets (creative, cinematic output) and the needs of instructional design (precision, restraint, repeatability).
I will continue working with vendors to explore whether improved prompting strategies, workflow controls, or upcoming model features can close this gap. For now, AI video animation should be approached cautiously when accuracy and learning objectives are non-negotiable.




