If you’re on social media much these days, you’ve undoubtedly seen a major uptick in some pretty amazing AI-generated videos, and in the fun projects, commercials, and even short films that creatives are making with these tools. It’s an exciting time for content creators and producers to engage and help shape the way we use these new tools and production workflows.

So now let’s take a quick look at an updated feature: HeyGen’s new Avatar IV, an AI Image to Video tool, tested with my own headshot photo and cloned voice:
Sure, it still has a bit of exaggeration around the mouth movements and body, but it’s a huge improvement over previous versions and over most other Image to Video explainer tools. More on this technology later in the article.
HeyGen Video Avatars
If you’ve been following my articles over the past couple of years, you’ll know that I’ve been covering the advancements HeyGen has been making with its AI Video Avatar capabilities.
I created this first test by shooting two different angles against a green screen to create two different “Instant Avatars” in HeyGen (I didn’t pay extra for the more detailed Pro versions). I used ElevenLabs to produce the voice audio file, used that same file for both camera angles, composited each angle separately in Adobe After Effects, and rendered the results as two separate files. I then edited them together in Adobe Premiere Pro just as I would regularly recorded video footage. I purposely left dissolves in so you can see how the two angles are synced up with the voice audio track.
The first pass shows the raw green screen clips straight out of HeyGen, followed by a quick composite in After Effects with the faux “TEDx” stage backgrounds.
HeyGen also offers tons of Stock Avatars you can use, including some with pre-generated expressions built-in and some with multiple camera angles.
For this test, I selected an “Expressive” avatar and paired it with a voice audio track created in ElevenLabs, as the default voices offered in HeyGen weren’t great for this model.
I must say that the resulting video is quite believable. I only hope that Custom Avatars can be generated this believably in the near future.
Here are some more examples of HeyGen’s Stock Avatars with 2-cam edits. I ran the text through ElevenLabs to get the voiceovers, then used them in the project for both camera views. This gives you quality control over the audio and ensures it’s perfectly synced when you edit. That’s why I left all the crazy B-cam fades in these examples: to show off the synchronization between each cam and the voice tracks.
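If you’re producing a lot of these voice tracks, the ElevenLabs step can be scripted too. Here’s a minimal sketch using ElevenLabs’ Python SDK (the elevenlabs package); the API key, voice ID, and model ID are placeholders to swap for your own:

```python
# A minimal sketch of scripting the voiceover step with ElevenLabs'
# Python SDK. The API key, voice ID, and model ID are placeholders.
from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key="YOUR_ELEVENLABS_API_KEY")

audio = client.text_to_speech.convert(
    voice_id="YOUR_VOICE_ID",
    model_id="eleven_multilingual_v2",
    text="Welcome! Today we're looking at AI video avatars...",
)

# convert() streams the audio back in chunks; write them out as one MP3
# that can be dropped into both camera angles in the HeyGen project.
with open("voiceover.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)
```

Generating the file once and reusing it for both camera views is exactly what keeps the two angles frame-accurate to each other in the edit.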
Some stock avatar models are more believable than others, as you’ll see in these examples. You have to choose what’s right for your productions, of course, as some are pretty stiff in their delivery and some are overly expressive and have a harder time aligning with the voice content, such as the last model in this video.
HeyGen Avatar IV
HeyGen has developed its AI Image to Video capabilities into something quite remarkable and useful.
Using only one of their stock images, I tried a quick test with an audio script I created in ElevenLabs and imported into the tool. I was quite blown away by the results!
Selecting the stock image from HeyGen’s library options:
Uploading the recorded AI voice audio from ElevenLabs:
Here’s the resulting video:
I can totally see this as a great way to personalize service and sales messaging without needing to record video for the productions or to clone yourself for a Video Avatar. Other than the exaggerated mouth movements (including in mine at the top of the article), I think this comes off as even more realistic than the Video Avatars.
I’m still waiting for more options such as expression and energy cues, but I’m sure that’s only a matter of time.
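A side note for developers: HeyGen also offers an API, so this image-plus-audio step could in principle be automated. The sketch below is modeled on the general shape of HeyGen’s v2 video-generation endpoint; whether Avatar IV itself is exposed this way, and the exact field names, are assumptions you’d want to verify against HeyGen’s current API docs.

```python
# A hypothetical sketch of driving HeyGen from code instead of the web UI,
# modeled on the general shape of HeyGen's v2 video-generation endpoint.
# Whether Avatar IV is exposed this way, and the exact field names, are
# assumptions to verify against HeyGen's current API documentation.
import requests

API_KEY = "YOUR_HEYGEN_API_KEY"  # placeholder

payload = {
    "video_inputs": [{
        "character": {
            "type": "talking_photo",              # a photo-driven avatar
            "talking_photo_id": "YOUR_PHOTO_ID",  # placeholder: uploaded image
        },
        "voice": {
            "type": "audio",
            "audio_url": "https://example.com/voiceover.mp3",  # ElevenLabs file
        },
    }],
    "dimension": {"width": 1280, "height": 720},
}

resp = requests.post(
    "https://api.heygen.com/v2/video/generate",
    headers={"X-Api-Key": API_KEY, "Content-Type": "application/json"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # should include a video ID you can poll for the render
```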
For my next Avatar IV tests, I first created four different images in Midjourney of experienced businesswomen of various ethnicities.
Using the same AI-generated script text, I input it into ElevenLabs and found four different voices that I felt matched each AI-generated image of a businesswoman.
I then imported the images and the audio files from ElevenLabs into HeyGen’s new Image to Video tool, Avatar IV. As you can tell, each performer is reading the same script. The details around the hair and backgrounds are what really strike me, along with the body motion and breathing. But as I mentioned in my opening video, the mouth movements are still a bit over-exaggerated.
This is really amazing technology: going from a completely AI-generated subject, to an AI script, to an AI-generated voice track, to an AI-generated video built from these elements, all within several minutes.
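Scripted, the voice side of that pairing is just a loop: the same script run through four ElevenLabs voice IDs (placeholders below), producing one audio file per businesswoman to feed into Avatar IV.

```python
# Sketch: one script, four ElevenLabs voices, four audio files for Avatar IV.
# The voice IDs are placeholders for the four voices picked in ElevenLabs.
from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key="YOUR_ELEVENLABS_API_KEY")

script = "The same AI-generated script text, used for all four avatars."
voices = {
    "businesswoman_1": "VOICE_ID_1",
    "businesswoman_2": "VOICE_ID_2",
    "businesswoman_3": "VOICE_ID_3",
    "businesswoman_4": "VOICE_ID_4",
}

for name, voice_id in voices.items():
    audio = client.text_to_speech.convert(
        voice_id=voice_id,
        model_id="eleven_multilingual_v2",
        text=script,
    )
    with open(f"{name}.mp3", "wb") as f:
        for chunk in audio:
            f.write(chunk)
```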
Image to Video Tools
There are several AI Image to Video tools emerging and competing for a niche market segment, and some are quickly surpassing many AI Video to Video tools in their results. I’ve even seen some remarkable results from Text to Video tools (see the “car show” video toward the end of this article).
In this test, I used Midjourney to generate a starting image of a small group of people against a green screen. Sadly, this was the best result I could get from various attempts, but it was something I could clean up in Photoshop.
I took the image into Photoshop and first did a color correction pass on the subjects to bring out enough detail to generate a matte.
Using Photoshop’s Extract Object tool, I was able to select the background and even out the color to a solid green that would work for extracting the animated subjects, if all went as expected.
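For those who’d rather script this cleanup than do it in Photoshop, the same basic idea (select the band of green hues, flatten it to one key-friendly green) can be sketched with OpenCV in Python. This is my own rough alternative, not the tool I used, and the HSV thresholds are guesses you’d tune per image:

```python
# A scripted alternative to the Photoshop cleanup, sketched with OpenCV:
# select the band of green hues and flatten it to one key-friendly green.
# Not the tool used in the article; the HSV thresholds are guesses to tune.
import cv2
import numpy as np

img = cv2.imread("group_greenscreen.png")  # the Midjourney source image
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Pixels within a band of green hues (OpenCV's hue range is 0-179).
lower = np.array([40, 60, 60])
upper = np.array([85, 255, 255])
bg_mask = cv2.inRange(hsv, lower, upper)

# Replace the uneven background with one flat, solid green.
flat_green = np.zeros_like(img)
flat_green[:] = (0, 255, 0)  # BGR pure green
cleaned = np.where(bg_mask[..., None] > 0, flat_green, img)

cv2.imwrite("group_flat_green.png", cleaned)
# The inverted mask also doubles as a rough subject matte for compositing.
cv2.imwrite("subject_matte.png", cv2.bitwise_not(bg_mask))
```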
The final cleaned image I used as a source for all the following examples:
A couple of variations on the text prompts I attempted with the source image in each AI tool:
“camera slowly moves around in an arc around the left side of the group that’s cheering excitedly!”
“camera slowly tracks in an arc around the left side of the group cheering excitedly!”
As you can see in a few screenshots here, each tool handled the same prompt in its own way.
Runway Gen4
OA Veo2
OA Wan2
Adobe Firefly
Using the same simple text prompts and the same source image file, I tested several of the top “Image to Video” tools and got less than desirable results.
Some of these results are just hilarious! People just standing there, crazy color changes, additional people showing up from off camera, and the spinning… OH MY GOD THE SPINNING!! 😛
Most tools were just frustrating, though. Using simple prompts for camera moves just doesn’t seem to work yet. Most shots did some kind of zoom or truck/pan off, while others just had people jumping or flapping their wings. No matter how I tweaked the prompt, it usually just got worse. This would have been a really frustrating process had I needed it to work for a project, but of course, I’d attempt more sophisticated prompting in such a case. This was really just a simple comparison test between the available tools.
Test examples from Sora, Kling, Krea, Runway Gen4, Hailuo, OpenArt, Vidu Q1, VEO 3, Vidfly and Adobe Firefly.
Note that what I was looking for was the camera move – to slowly track an arc around the small group of people. I wasn’t as interested in what the people were doing, although the results are humorous in any case. The absurdity of some of the video clips aside, there was one clear winner in this test, even though it tracked in the wrong direction – I’ll take it.
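One more note: several of these tools also expose developer APIs, which would make a comparison test like this scriptable and repeatable. As a rough sketch, here’s what running the same prompt and source image through Runway’s API might look like, assuming Runway’s runwayml Python SDK; the model name, ratio value, and polling details are assumptions to check against Runway’s current docs:

```python
# A rough sketch of scripting the same test through Runway's developer API,
# assuming the `runwayml` Python SDK. The model name, ratio value, and
# polling details are assumptions to check against Runway's current docs.
import time

from runwayml import RunwayML

client = RunwayML()  # reads the API key from RUNWAYML_API_SECRET

task = client.image_to_video.create(
    model="gen4_turbo",  # assumed model id
    prompt_image="https://example.com/group_flat_green.png",  # placeholder URL
    prompt_text=(
        "camera slowly tracks in an arc around the left side "
        "of the group cheering excitedly!"
    ),
    ratio="1280:720",
)

# Generation is asynchronous: poll the task until it finishes.
while True:
    result = client.tasks.retrieve(task.id)
    if result.status in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(10)

if result.status == "SUCCEEDED":
    print("Video output:", result.output)
```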
So this just leads to one of my next articles in the series: Prompting Techniques. We’ll be learning together just what does and doesn’t work across the tools.
Speaking of prompting, this amazing AI video of a non-existent car show, created entirely from text prompting (both the video and audio clips) in Veo3 by artist László Gaál, has been making waves in the industry this week. It shows just how close we’re getting to complete Text to Video productions.
Description from YouTube post: “Before you ask: yes, everything is AI here. The video and sound both coming from a single text prompt per clip using #Veo3 by Google DeepMind and then these clips are edited together. Whoever is cooking the model, let him cook! Congrats to Matthieu Lorrain and the team for the Google I/O live stream and the new Veo site!”
I’ll be exploring more with prompt engineering and digging into Text to Video tools and workflows in my future articles. They really go hand in hand for the best results.
As we explore these tools further, I’ll continue to highlight some of the best productions I find, and I always welcome your input and insight on the technology and its impact on the film & video industry.
Stay tuned…
