AI Colorization Rabbit Hole

By: Stephen Toback

What began as a simple experiment turned into a full benchmarking session.

I found a sepia tone photograph of my father-in-law from the 1950s. I wanted to see what it might look like in color, so I started uploading it into different AI image tools using the same basic prompt:

“Colorize this photo.”

No style guidance. No era notes. No complicated instructions. Just that.

In a few cases I added the word “accurately,” but only to see whether it would meaningfully change anything.

What followed was a surprisingly wide range of results. Some of the tools are general-purpose AI models, and some are web apps tuned specifically for colorization. Some results ranged from good to great, while other tools barely functioned. Some produced completely new images. A few were technically impressive but aesthetically off. And two stood out in very different ways.

Here is what I observed.


Gemini (Nano Banana)

Gemini’s result was the most obviously artificial.

The colors did not feel grounded in reality. The tie was a bright, almost electric blue. The face carried a blush that looked airbrushed rather than naturally lit. The overall effect felt less like a restored photograph and more like a manual colorization from decades ago, when artists would hand-tint prints with strong pigments.

There was no subtlety in tonal balance. Skin did not interact convincingly with shadow. The result looked applied rather than integrated.

I did not use the “accurately” prompt with Gemini, but based on the baseline result, it is hard to imagine that wording alone would have resolved the underlying aesthetic problem. It simply did not feel like a photograph that had once existed in color.


ChatGPT 5.2

I was stunned by how well ChatGPT handled the simple prompt. It was only my second try, and seeing it do so much better than Gemini is what sent me down the rabbit hole to see what else was available. This may be the best balance of style, accuracy, and creativity.


Adobe Firefly 5

Colorize This Photo
Colorize This Photo Accurately

Firefly 5 barely engaged with the task.

It colorized the tie and left everything else untouched. The image remained essentially black and white. When I added “accurately,” it removed the sepia tone and converted the image to a more neutral black-and-white rendering, but still introduced no meaningful color.

In other words, it technically responded to the prompt, but it did not truly perform colorization.
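
For anyone who wants to put a rough number on "no meaningful color," here is a minimal Python sketch. It is an illustration only, not part of the comparison above. It assumes the image has already been loaded as a list of (r, g, b) tuples with values from 0 to 255, and the function names, the 30-degree hue bins, and the thresholds are all hypothetical choices. The idea: a neutral black-and-white image has near-zero saturation, a sepia print has some saturation but only one hue, and a real colorization should show several distinct hues.

```python
import colorsys


def mean_saturation(pixels):
    """Average HSV saturation (0..1) over a list of (r, g, b) tuples in 0..255."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("no pixels given")
    total = 0.0
    for r, g, b in pixels:
        _, s, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        total += s
    return total / len(pixels)


def hue_spread(pixels, min_saturation=0.05, min_value=0.1):
    """Count the distinct 30-degree hue bins used by visibly colored pixels.

    Pixels that are nearly gray or nearly black are skipped, because their
    hue is not meaningful. Both thresholds are illustrative assumptions.
    """
    bins = set()
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if s >= min_saturation and v >= min_value:
            bins.add(int(h * 12) % 12)
    return len(bins)


# Tiny hand-made samples standing in for real image data.
neutral = [(40, 40, 40), (128, 128, 128), (220, 220, 220)]
sepia = [(200, 170, 120), (100, 85, 60)]
colorized = [(220, 170, 140), (30, 60, 200), (60, 120, 70)]

for name, sample in [("neutral", neutral), ("sepia", sepia), ("colorized", colorized)]:
    print(name, round(mean_saturation(sample), 2), hue_spread(sample))
```

A result that scores like the sepia or neutral sample after "colorization" would match what Firefly 5 produced here: a response to the prompt, but very little actual color.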

One note: I don’t think Firefly is currently available through Duke’s Creative Suite. I tested using my personal Adobe license.


Flux 2 Pro (inside Firefly)

Colorize This Photo
Colorize This Photo Accurately

Flux 2 Pro (inside Firefly) did colorize the image, but the aesthetic leaned heavily toward a 1970s airbrush style. Skin looked overly smoothed and slightly cosmetic. The tones were strong and somewhat theatrical.

When I added “accurately,” the facial coloration toned down slightly, but the overall look remained overdone. It felt more like stylized enhancement than restoration.


Firefly Image 4

This one surprised me in a different way.

Instead of colorizing the uploaded photograph, it generated an entirely new image. There was no clear image-based editing prompt interface, and the result was not a colorized version of my father-in-law at all. It was simply a new interpretation.

At that point, it became less about colorization and more about image generation.


Runway Gen-4

Runway Gen-4 (in Firefly) produced essentially the same result as Firefly Image 4: a newly generated image rather than a true colorized restoration of the original.

This highlighted an important distinction between systems designed for generative synthesis and those designed for precise image editing. Colorization requires restraint and continuity, not invention.
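
The "continuity" half of that distinction can be sketched in code. The following Python snippet is a rough illustration under stated assumptions, not a method used in this comparison: it assumes the original and the result are the same size and are available as equal-length lists of (r, g, b) tuples, and the function names are hypothetical. It compares brightness structure only. A true colorization should keep the original's pattern of light and dark almost unchanged, while a freshly generated image usually will not.

```python
from math import sqrt


def luma(pixel):
    """Approximate perceived brightness using Rec. 601 luma weights."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def structure_correlation(original, result):
    """Pearson correlation between the brightness of two pixel lists.

    Values near 1.0 suggest the light/dark structure was preserved;
    values near 0 or below suggest the image was substantially redrawn.
    """
    a = [luma(p) for p in original]
    b = [luma(p) for p in result]
    if not a or len(a) != len(b):
        raise ValueError("pixel lists must be non-empty and the same length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat image carries no structure to compare
    return cov / sqrt(var_a * var_b)


# Tiny hand-made samples standing in for real image data.
original = [(10, 10, 10), (100, 100, 100), (200, 200, 200), (50, 50, 50)]
tinted = [(20, 8, 5), (130, 95, 80), (230, 195, 180), (40, 55, 70)]
redrawn = [(200, 200, 200), (50, 50, 50), (10, 10, 10), (100, 100, 100)]

print(structure_correlation(original, tinted) > 0.99)   # brightness preserved
print(structure_correlation(original, redrawn) < 0.5)   # brightness rearranged
```

Correlation is a blunt instrument: it would not catch subtle changes to a face, and it says nothing about whether the colors themselves are plausible. It only flags the case where a tool has invented a new picture.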


Imagen 4

Imagen 4 (in Google Flow) smoothed the photograph and altered the subject’s features slightly, but it did not meaningfully colorize the image.

The smoothing changed the character of the photograph without fulfilling the task. It felt like a beautification filter applied to a black-and-white image rather than a restoration.


Krea Enhance – 100% AI

Krea – 100% AI

Krea Enhance at 100% AI did not colorize the image at all. Instead, it transformed the character of the subject in interesting ways. The result was creative and somewhat compelling, but it was not restoration.

It felt more like reinterpretation.


Krea Enhance – 50% AI

Krea – 50% AI + Other Settings

With adjustments and reduced AI influence, Krea Enhance performed slightly differently, but it still did not meaningfully solve the colorization problem. I have a screenshot of that attempt, and it shows incremental change, not transformation.

It was intriguing, but not what I was aiming for.


Krea Edit

Krea Edit was a completely different experience.

The result was striking. The color felt integrated rather than applied. Skin tones were balanced. The lighting structure of the original photograph was preserved. The image retained depth without becoming overly modern.

It did not look like someone painted color onto a black-and-white base. It looked like a color photograph that had always existed and had simply been restored.

There was subtle warmth, believable fabric tones, and dimensionality that felt photographic rather than cosmetic.

This was one of the clear standouts, though it could also be viewed as non-historic and “overhyped.”


MyHeritage (DeOldify)

MyHeritage uses the open-source DeOldify model, and its aesthetic reflected the historical lineage of early neural colorization.

The result resembled hand-tinted airbrush photography. The pink coat choice was particularly striking. It may well reflect how early automated colorizers behaved, choosing bold but plausible pigments based on limited contextual cues.

The result was charming, but stylized. It leaned more toward artistic reinterpretation than photographic realism.


Palette.fm

Palette.fm is explicitly built for this task, and that focus shows.

The color balance was thoughtful and restrained. Skin tones felt grounded. Unlike Krea Edit, however, it preserved an “old photograph” sensibility. The image remained recognizably mid-century in tone and mood.

It did not modernize the photograph. It respected its age.

That restraint is admirable, especially for historical restoration work.


Seedream 4.5 (Seedance)

Seedream 4.5 from Seedance AI may have been the most historically convincing result, and it stands alongside Krea Edit as one of the two best overall.

The colorization extended into the background, which added cohesion to the image. The tonal balance felt natural rather than cosmetic. Skin tones were believable, and the environment gained subtle dimension.

If Krea Edit felt cinematic and beautifully restored, Seedream 4.5 felt historically grounded and complete.

Choosing between them is subjective, but Seedream may have the edge for those prioritizing historical sensibility over aesthetic polish.


What This Revealed

Using the exact same prompt across all systems exposed something important: these tools are not interchangeable.

Some are built for generation.
Some are built for enhancement.
Some are built for narrow, specialized restoration.

Colorization is not simply about adding pigment. It is about understanding light, shadow, texture, film behavior, and the emotional weight of a human face.

When the result feels wrong, it breaks the illusion immediately. When it feels right, the person in the image suddenly seems closer in time.

What began as a casual experiment turned into a reminder that even in 2026, specialization still matters. The model behind the interface makes all the difference.

And sometimes, bringing the past into color reveals just as much about our tools as it does about history.
