The Future of AI in Video: a look forward

Iain Anderson

10 months ago

A digital brain, created by Stable Diffusion from the prompt "a digital brain imagining a future film"

Since late last year there’s been an explosion of interest in the AI space, with new tools creating images, correcting audio, and writing software for us. If you believe the most breathless comments on YouTube, we’ll all be out of a job pretty soon, but I’m confident that’s not quite how it’s going to happen. The impact of AI will be quite variable across society (say, in detecting cancer) but in many ways, the video industry doesn’t move as quickly as you might think. Because AI’s influence on wider society is less predictable, let’s narrow the scope to a razor focus on video production, and start with a quick look at what’s possible right now.

Where is the tech today?

Text-based editing, in the latest releases of Premiere Pro and DaVinci Resolve, has made waves, but it’s actually been around for years, since Intelligent Assistance released Lumberjack Builder in 2018. Yes, it’s more appealing now that AI has made transcription both higher quality and free, but rough-cutting from a transcript hasn’t been shown to be a revolution yet. It will be useful but it doesn’t make an editor obsolete.
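For anyone curious what rough-cutting from a transcript involves under the hood, here’s a minimal sketch. It assumes a transcript in which every word carries a start and end time. The `Word` type, the `keep_ranges` function and the sample timings are all hypothetical, made up for illustration; this is not how Premiere Pro, DaVinci Resolve or Lumberjack Builder actually implement the feature.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the clip
    end: float


def keep_ranges(words, kept_indices):
    """Turn the words an editor kept into (start, end) ranges of source video.

    Runs of neighbouring kept words are merged into a single range, so
    deleting a word in the transcript becomes a cut in the timeline.
    """
    ranges = []
    previous = None
    for i in sorted(set(kept_indices)):
        word = words[i]
        if previous is not None and i == previous + 1:
            # Extend the current range to cover this neighbouring word.
            ranges[-1] = (ranges[-1][0], word.end)
        else:
            # A gap in the kept words starts a new range.
            ranges.append((word.start, word.end))
        previous = i
    return ranges


# Hypothetical transcript: the editor deletes the "um" (index 1).
transcript = [
    Word("So", 0.0, 0.3),
    Word("um", 0.4, 0.9),
    Word("welcome", 1.0, 1.5),
    Word("back", 1.6, 2.0),
]

print(keep_ranges(transcript, [0, 2, 3]))  # [(0.0, 0.3), (1.0, 2.0)]
```

A real tool would also have to deal with handles, audio crossfades and transcription errors, which is part of why an editor is still needed.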

The main areas of recent excitement have been around image generation (DALL·E, Midjourney, Stable Diffusion) and text generation (ChatGPT, an LLM, or large language model). Many new instances of the technology that these tools pioneered are coming soon, and a number of articles on this site have gone into much more detail than I’m going to here. But we now have tools to clean up audio more quickly, to synthesize usable voiceovers, and to remove objects or people from the shots they were captured in. Image generation is more useful for video-adjacent tasks like creating video thumbnails, but video generation is possible too.

In recent weeks, Runway has augmented their AI-powered video utilities with their Gen-1 generative tools, and this new video-to-video transformer will be compelling for some creators. It allows you to transform your own videos based on the look of a still, but the level of realism still isn’t anywhere near what would be required for a normal, professional job.

It is, however, very good at transforming existing video into something very stylized: if your dream is to turn your video into claymation, or anime, or a moving painting, this is the tool for you.

Overall, in a context where perfect realism isn’t required, AI can succeed — especially if you’ve trained your own model to create exactly the style of work you need. Special effects are going to become easier to create too, and the farther from realism you want, the better.

How I used AI to achieve this lightning effect ⚡️:

- Roto player from BG, export w/ black BG
- Upload footage to Runway (AI)
- Use prompts to alter video to produce neon-line effect
- Add turbulent displace and deep glow effects
- Set blending mode to add https://t.co/1WaD1fWYYZ pic.twitter.com/IFRcgKzpzk

— Connor Henkle (@cjh_fx) April 28, 2023

Concept art is certainly something that AI can do a decent job of, and far quicker than a human can. AI-generated music isn’t as good as what a talented human can do, but it’ll do in a pinch. ChatGPT’s writing isn’t inspired, but it can spark fresh ideas or massage existing ones effectively. There’s a common theme here: AI is better at remixing than pure creativity, and is more suited to performing menial, non-creative tasks like summarizing or interpreting a client’s emailed change list (yay for Marker Toolbox!).

Here’s a great example of an AI remix to create still images, with a little animation added on:

We made a #StarWars trailer in the style of #wesanderson hope you guys enjoy it! pic.twitter.com/DP5rBxmTOI

— Curious Refuge (@CuriousRefuge) April 29, 2023

At the moment, I’d grade the creative output from most AI video tools as a B- on average: competent, but imagination and flair still come from humans. There’s plenty of poor AI content out there, and because progress is not linear, it remains to be seen whether that output can be improved to reliable, repeatable A-class results. It’s got to look and sound real to be good enough, and full reality simulation is just out of reach today.

What is coming soon?

Runway has just introduced their Gen-2 update, which adds text-to-video synthesis, and it will of course improve. Quality is still not “real world” quality, and I don’t know that it ever will be, but it’s another step up for the pre-viz process and for creators who don’t need things to look “real”. If you need a temp clip of “dude surfing at sunset” then you can get one quickly, but it’s not photorealistic, and might never be. It’s still compelling, though, and Runway isn’t alone. Adobe’s prominent entrance into the AI space has made some waves, though their new Firefly tech is still in beta.

Here’s Adobe’s demo of Firefly’s uses for design and photo work:

While of course Adobe have used AI techniques for Content-Aware Fill and more for a long time now, modern image generation techniques promise to do that job and a whole lot more. It makes sense for Adobe to stay on top of the best “inpainting” methods, and it also makes a lot of sense to harness ChatGPT’s power to allow human-written instructions to drive software features. Runway does this too, but adding it to software that people use already will be a big win.

That key trick, letting ChatGPT (and other LLMs) control our software for us, is where I think a lot of the potential for AI is hiding, across all industries. Imagine a super-powered Siri that knows how all your software works, and can do things you ask for in regular, human-style sentences. The vast majority of people today don’t know their software as well as an expert does, and if an AI can make complex tasks more accessible, that’s a huge win.

The danger here is that the addition of AI won’t make the whole program more accessible, but will instead enable specific gimmicks. Flashy tricks certainly inspire headlines, but then the features are overused, and then they’re of little use. While I understand the need for headlines, professionals need more than a cherry-picked set of demo files that work well — new tools have to work well on real world footage.

With that in mind, here’s Adobe’s future-looking demo of what they envision coming later this year for video:

That’s worth a breakdown:

Music generation — useful, even if it’s not as good as human-made music.
Sound effect placement — a great way to introduce new editors to the power of sound effects, but I worry that we’ll start to hear the same default sounds used too often.
Text-based overall color correction — this could be powerful, if it’s controllable, but again I would expect the same few looks to be overused in the short term.
Text-based face correction — terrific if it does a better job of automatic tracking, but again, it’ll need to be controllable.
Transcription — this is good today, and should get better once more modern tech (such as OpenAI’s Whisper) is integrated.
3D text styles — this is a complex effect, but feels more like a gimmick than the other features here.
Finding and placing B-roll automatically — OK, this is where my attention was piqued, but I really want to know more about the process here. I’d really like to see automatic keywording of clips, but it’s not easy to use keywords to organize footage in Premiere today. Does this AI feature just insert the first clip that matches the transcript, or does it cleverly tag all the other potential clips so that an editor can pick the best one? (This is something I’ve been wanting built into Final Cut Pro for some time.) We don’t need tools to make poor-quality work more quickly; we need tools to make it easier to make better work in the same amount of time.
Script-to-storyboard-to-animatic — probably the most useful thing here. I can see this being spectacularly useful in all kinds of contexts. Today, I can talk to a client and collaboratively create a script with them, but then, if there’s no budget for a pre-viz, it’s up to their imagination to see the final product. A very rough version of the entire final video, that I can show them on the spot, would absolutely improve the film-making process. Combine this with the existing voice-synthesis tech and you’ve got an instant preview of a film just by writing a script, and that’s revolutionary. It’ll also make better films.
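To make the B-roll question above concrete, here’s a rough sketch of the behavior I’d prefer: rank every clip that could match a line of the transcript, rather than dropping in the first hit. It assumes clips have already been keyworded, and the function name, clip names and keywords are all invented for illustration. I have no idea how Adobe’s feature actually works.

```python
def broll_candidates(sentence, clips):
    """Return every clip whose keywords appear in the sentence, best match first.

    `clips` maps a clip name to a list of keywords (assumed to exist already).
    """
    spoken = {word.strip(".,!?").lower() for word in sentence.split()}
    scored = []
    for name, keywords in clips.items():
        overlap = spoken & {keyword.lower() for keyword in keywords}
        if overlap:
            scored.append((len(overlap), name))
    # Most shared keywords first; ties are broken alphabetically by clip name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


# Hypothetical keyworded library and one line from a transcript.
clips = {
    "surf_01.mov": ["surfing", "sunset", "beach"],
    "city_02.mov": ["traffic", "night"],
    "beach_03.mov": ["beach", "waves"],
}
sentence = "We spent the evening surfing at the beach."

print(broll_candidates(sentence, clips))  # ['surf_01.mov', 'beach_03.mov']
```

The point is the return value: a ranked list that an editor chooses from, not a single clip chosen for them.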

The most interesting thing about AI isn’t just new techniques like image generation, at least, not on their own. But if AI can integrate those new techniques into our existing workflows, and also harness what our existing programs can do, that’ll be a much bigger leap forward. AI has the potential to make complex tasks far simpler, and it may well shift the skillsets required to do some jobs.

What will the impact be?

New AI-based plug-ins and apps will make some jobs entirely routine, such as replacing one actor’s face with another. It seems pretty certain that anyone’s voice will be able to be synthesised too, as it’s already pretty good. Keying will be easier. Background replacement will be easier — just look at what Photoshop’s recently added in beta. Software will be more accessible. Animation and wild special effects will be easier and cheaper to create. Fewer people will learn the depths of their apps if software can find well-hidden features when they’re needed. All of that will raise client expectations, as technology already has, and we’ll produce better work.

Automatic animatics? Yes please, and wow?

And yes, some people will absolutely lose their jobs, because the lure of automatically generated animatics is far too strong. If your job revolves around producing “temp work” intended to be replaced later, get ready to find a new gig — that’s exactly the kind of thing that AI’s going to be perfect for. Recently I read a sad note from a 3D artist who used to spend 1-2 weeks creating a 3D model for a mobile game, and now spends 1-2 days massaging the output from a generative AI instead. If “near enough is good enough” in your line of work, be ready for anything.

Despite the breakthroughs, there isn’t going to be a “make a movie” button any time soon. AI creation is best at quite limited tasks — like Elai, a new “make a video of a robotic talking head with text next to it” service — and the more you ask, the less likely that it’ll do the job well. If your job is entirely non-creative, or can be reduced to a curation of AI-powered output, you’re at risk. There’s plenty of time to step sideways into a new area, though.

What will AI still not be able to do?

While AI can perform all kinds of useful tricks, and some of those tricks can be performed very well, it will remain limited. As Tesla have discovered in their quest for a self-driving car, progress slows down the further along the path you go.

Image segmentation is another good example. This tech allows people to be separated from their backgrounds without a greenscreen, and it’s getting better in every iteration. You can see a basic version of this tech in every Zoom call where the background is replaced or blurred, but so far, it’s never been great. The Keyper plug-in, trained to find people, is good, but not quite good enough to rely on all the time. Yes, this tech will get better, but will it be good enough for professionals to throw away their greenscreens?

Keyper — an AI-trained people finder that’s pretty good but not a total greenscreen replacement

It’s far, far easier to create an app that produces pretty good output, most of the time, than it is to produce great output, all of the time. I suspect a lot of the funding for advanced AI-based generative tools will dry up once the mobile phone apps have been made, wildly overused like all the previous filters have been, and fallen from favor.

Actors will still act. Writers will still write. And post-production professionals will still edit, fix audio and create visual effects, just with more help than before, and with a higher standard of output expected.

The bottom line is that a professional will still need to be able to recognize a problem before asking an AI to fix it. An AI that can fix obscure technical issues is of no use if you can’t spot the issues and phrase things correctly. Identifying a problem can be as big a job as fixing it, and you need a base understanding of a task to ask the right questions. For pro-quality work, humans will remain the glue between a client’s imagination and the finished product.

Conclusion

Keep your eyes open, don’t be afraid to embrace some new workflows, and if you see a wave of change coming, outrun it by getting great, or jump sideways to avoid it. Progress will not be linear, nor will it be evenly distributed, so the best thing you can do is to keep an open mind.

Despite the inevitable changes, AI will bring some revolutionary improvements, and if we use it as a tool to assist us, it’ll make it easier to make great work. Standards are rising ever higher, so enjoy the ride up.