Text-based video editing was all the rage at NAB 2023. Two of the biggest nonlinear editing applications added text-based editing to their toolset while the plethora of online, cloud-based, and AI-assisted transcription tools continued to increase. However, I was quite surprised when several conversations with producers and editors revealed that many of them have not used cloud-generated transcripts of any kind, even though we’ve had very affordable cloud-based transcription tools for years.
I was wondering about it so I created a poll on Twitter.
Who has been using text-based editing in their video editing workflow at some point in the last couple of years? (As in make transcriptions, build edit with the text, use it or take that back to the NLE)
— Scott Simmons (@editblog) April 27, 2023
There are enough text-based video editing options these days that picking the right one for a specific workflow can take a little work. Different options might be worth considering depending on your needs, the stakeholders involved, or your final delivery. But I will say this: living in a world of affordable transcription and text-based editing is way better than life before this was an option.
As we go through this, I will link back to several reviews I’ve done over the last few years, as I’ve been on a quest for the best text-based editing for a long time.
Defining “text-based editing” vs just transcriptions
Before we begin, let’s examine the difference between “text-based editing” and transcriptions.
Transcribing an interview or dialogue means having a cloud-based system automatically generate a written transcript from audio or video uploaded to an online service. You can usually interact with a transcription via a web browser, where you can watch, listen, read the text, and correct any transcription errors. Often these services will interact with your NLE via a workflow extension or panel.
There once was a day, before these online services, when transcriptions had to be done by humans who listened to the dialogue and transcribed it as they went along. This may have been an online service or a transcriptionist located within your market, where you perhaps ended up with a printed three-ring binder full of timecode-based transcriptions. Having humans do the transcription took much longer and was much more expensive than having “the cloud” do it.
Text-based editing takes those transcriptions and arranges the bits and pieces of the interviews and dialogue into a coherent story. This could be by copying and pasting, by highlighting, tagging and dragging sections of text, or by cutting in a more traditional three-point editing style. Instead of marking IN and OUT points on a video clip, you mark your IN and OUT points on a text transcript.
You can only have text-based editing after first doing some transcribing.
How we got here
You can’t mention text-based editing without crediting the original text-based video editing tool, prEdit. Intelligent Assistance designed prEdit for Final Cut Pro Classic back in 2010. This was long before affordable cloud-based transcriptions, so it was slightly ahead of its time.
prEdit morphed into the Lumberjack Builder NLE tool. Builder NLE is a standalone Mac application (and part of a whole suite of logging tools) that transcribes and allows for text-based editing right on the desktop (and did so long before anything else mentioned here).
The bullet point list for Lumberjack Builder NLE:
- Transcripts in minutes in 16 supported languages for free; additional languages are supported at 25c per minute.
- Keyword paragraphs and identify People
- Powerful (and unique) Keyword Manager
- Comprehensive search tools to find exactly the quote you need for this Story beat
- Drag and drop paragraphs to make a story. Trim, edit and re-order until it’s perfect.
- Send to Premiere Pro CC, Resolve or Final Cut Pro X for finishing.
I reviewed Builder back in 2019, and it was quite a revelation to have the director build stories himself, and then I could import his edits back into Adobe Premiere Pro. Builder was originally Final Cut Pro X only but Premiere support was added. As lovely as it was to work this way, I think the friction of downloading, installing and learning a new application is too much for many producers and directors. You can get over that hurdle if you can convince them of the potential time and money savings.
Text-based video editing natively, built-in within the NLE
It might be hyperbole to say that text-based editing within the NLE is one of the most significant advancements in digital editing (at least as long as I have been editing), but as a creative/craft/offline editor, I believe that to be the case.
I don’t want to decouple the in-application, localized transcription feature from the text-based editing feature, as they are both equally important. Localized transcription into your editing application has been a great productivity tool. However, taking transcriptions one step further by letting the editor create an edit from the text is a revelation.
As of this writing, the implementation of text-based editing is relatively simple as it is early in the feature’s life. You don’t really need a ton of specific features to make this work. You take a transcription somewhere in the NLE, mark an IN to OUT point by scanning, reading and playing (including JKL scrubbing) the text, then use an insert or overwrite edit command to make an edit.
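To make that mark-and-insert idea concrete, here is a minimal sketch of what happens under the hood, assuming a transcript stored as words with start and end times. The `Word` class, `mark_range` and `insert_edit` are hypothetical names I made up for illustration; this is not how Premiere or Resolve actually store their transcripts.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with its position in the source clip (seconds)."""
    text: str
    start: float
    end: float


def mark_range(words, first, last):
    """Turn a selection of words (inclusive indexes) into IN and OUT times."""
    if not 0 <= first <= last < len(words):
        raise ValueError("word range is outside the transcript")
    return words[first].start, words[last].end


def insert_edit(timeline, clip_name, in_point, out_point):
    """Append the marked range to a timeline (a plain list of edits here)."""
    timeline.append((clip_name, in_point, out_point))
    return timeline


# Hypothetical transcript data, invented for this example.
transcript = [
    Word("We", 0.0, 0.2),
    Word("started", 0.2, 0.7),
    Word("the", 0.7, 0.8),
    Word("company", 0.8, 1.4),
    Word("in", 1.4, 1.5),
    Word("2010", 1.5, 2.3),
]

# Select "started the company" in the text, then cut it into the timeline.
in_point, out_point = mark_range(transcript, 1, 3)
timeline = insert_edit([], "interview_A", in_point, out_point)
```

The point is that the text selection is only a friendlier way of choosing timecode: the edit itself is still an ordinary IN to OUT range on the source clip.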
Adobe Premiere Pro
Adobe’s implementation of text-based editing currently resides in the Premiere beta. In Premiere, transcription has been built-in for a while now. And believe it or not, Premiere had a much more rudimentary transcription method way back in CS5. The editing right from the Text Transcript panel is the best part and the new part.
First, transcribe a source clip in the bin. Then load that clip into the Source panel monitor and that transcription will appear in the Transcript tab of the Text panel.
Load up a timeline to edit into and you can use your usual editing tools to play back, mark IN to OUT points and perform edits right from the text in the Transcript panel.
Notice in the GIF above that the edit commands in the Transcript panel support custom keyboard shortcuts, as you can see with mine. Premiere’s text-based editing also supports multicam. Transcribe a source clip and then create a Multicam Source Sequence with that clip, and you can use the transcript with the multicam clip.
DaVinci Resolve
The latest Resolve beta adds a Transcribe Audio option at the bottom of the source clip submenu in a bin. Once a clip is transcribed, then, like Premiere Pro, you can edit right from a new transcription window.
One feature I really like in the Resolve Transcription window is the ability to highlight text and then apply a colored marker over that whole range.
Avid Media Composer
In all this recent talk of text-based editing, I’ve seen several mentions of Avid Media Composer’s ScriptSync. But ScriptSync isn’t really text-based editing in the same way as the other NLEs.
First, Avid can’t transcribe clips to text, so to use ScriptSync you must first import a transcript (or script) into Media Composer. You then use ScriptSync to automatically line the imported script.
And ScriptSync doesn’t work like text-based editing. It’s not exactly an easy task as you have to select specific parts of the imported text file and manually associate clips with the text in a script window. Only then can you automatically use ScriptSync to associate the text with the clip.
And it works in text blocks, as you can’t mark IN to OUT points directly in the script window. You can quickly go to a specific part of a clip (and mark an IN point) by double-clicking on a text block in the script window but that’s about it. So it is not text-based editing as it works in Premiere and Resolve.
Final Cut Pro
While Final Cut Pro doesn’t have quite the same deeply integrated options as Premiere and Resolve, there are several ways to work with transcripts directly in Final Cut Pro.
The above-mentioned Lumberjack Builder is the best known and has the best integration. SpeedScriber has also long had a transcription to FCP workflow, but you’re not editing with text in a true “text-based editing” way.
— Ulti.Media (@ulti_media) March 22, 2023
And then there’s stuff like that above coming to Final Cut Pro. I don’t know what happened to Scribeomatic.
Text-based editing in the cloud
The other option for text-based editing comes courtesy of “the cloud.” Several online services offer both cloud-based transcriptions and the ability to do some type of text-based editing in order to build a radio edit.*
What is a radio edit? That is a string out of talking heads, interviews or dialog of any kind that, while matching your script, is just a ton of jump cuts. Despite the YouTube jump-cut aesthetic, you’d never use this in a final program. It’s called a “radio edit” as you are really meant to just listen to it rather than watch it.
There are many online transcription services these days, and I have not used all of them. I’m guessing some might offer text-based editing that I am unaware of, but I think the advantages you get from an online service vs in-NLE text-based editing would be similar.
Those advantages are mainly a collaboration option with other stakeholders in the project. A director could begin to review interview footage to tag and categorize by subject matter. A producer could section off soundbites that legal might not let into a final cut. A writer can begin to assemble bites from across the entire shoot into a working script/edit.
By doing that work in the cloud, the editor can be working on other things in the program. Many a project lacks the time or budget for the editor to go through all of the interviews from scratch, so someone has to be responsible for building that first paper cut.*
What is a paper cut? That’s a term from the days when transcriptions happened on paper, with humans doing the transcriptions and noting timecode. A writer would go through a three-ring binder of transcripts and build an edit on paper. Later it became a lot of copy/paste when human-based transcripts moved from paper to the word processor.
What are some of those cloud-based services that allow for text-based editing and integration with an NLE?
Reduct.video offers one of the most comprehensive sets of tools for producers, writers, directors and editors to work in the cloud. I wrote a review of Reduct and have used it on several projects.
The Reduct tools are many, and it can take some time to really understand all the tools at your disposal (hence some friction like that mentioned with Builder above), but once the production embraces it, it can be beneficial.
Reduct found a way to support multicam clips in Adobe Premiere Pro via the Reduct panel and some clever hack engineering, so for a while, it was the only text-based editing option in the cloud to support multicam. That was huge as most interviews shoot multicam these days. As far as I know, it still is.
Transcriptive.com and the Transcriptive Premiere Pro panel
Digital Anarchy’s Transcriptive was one of the first Premiere integrations that made in-NLE transcriptions useful. I wrote a review of Transcriptive back in 2017 and it has continued to be updated since then. Rough Cutter added a new level to the “text-based editing” thing.
Descript is currently the Swiss Army Knife of all things video and transcription. It is very feature-rich, constantly evolving and even features its own video editor. PVC did a review of Descript back in 2018.
I have yet to use Descript with NLEs directly, but there is a workflow to export a Descript timeline back to Adobe Premiere Pro and Final Cut Pro. (And unofficially to Resolve).
I’ve always thought Simon Says has one of the best interfaces to work with. It’s clean and simple and powerful at the same time. Almost all of the NLEs have some kind of integration with Simon Says. Simon Says Assemble is their version of text-based editing.
Sonix mentions integrations with NLEs and from their demo, it looks like it could be useful as it will let someone strike out transcription text that isn’t needed via a web browser interface and then conform that transcription back to an NLE with those strike-out sections missing, leaving a gap in the timeline.
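As a rough illustration of that strike-out idea, here is a small sketch that takes a word-timed transcript, drops the struck-out words and returns the time ranges left to conform, with the removed stretches simply absent. The `kept_segments` function and the sample data are my own assumptions for the example; I have not seen how Sonix actually builds its export.

```python
def kept_segments(words, struck):
    """Return (start, end) ranges for the words that were not struck out.

    words:  list of (text, start_seconds, end_seconds) tuples
    struck: set of word indexes the reviewer struck out
    Adjacent kept words are merged into one range; struck words leave a gap.
    """
    segments = []
    current = None
    for index, (_, start, end) in enumerate(words):
        if index in struck:
            # Close the range in progress; the struck word becomes a gap.
            if current is not None:
                segments.append(tuple(current))
                current = None
            continue
        if current is None:
            current = [start, end]
        else:
            current[1] = end
    if current is not None:
        segments.append(tuple(current))
    return segments


# Hypothetical transcript, invented for this example.
words = [
    ("So", 0.0, 0.3),
    ("um", 0.3, 0.6),
    ("we", 0.6, 0.8),
    ("shipped", 0.8, 1.5),
    ("it", 1.5, 1.7),
]

# Strike out the "um" at index 1.
segments = kept_segments(words, struck={1})
# segments is [(0.0, 0.3), (0.6, 1.7)]; nothing covers 0.3 to 0.6.
```

Because the kept ranges hold on to their original times, placing them back on a timeline at those times leaves a hole where the struck text used to be, which is the gap described above.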
Other transcription tools like Happy Scribe mention NLE and post-production integration on their website but from what I can tell, most of them look like support for subtitles and captioning rather than actual text-based editing.
If I am missing any services that offer text-based editing, then please let me know in the comments below.
Where do we go from here?
I hope the online services will continue to refine their transcription > text-based editing > back to the NLE workflow. I know a lot of that involves the engineering around things like XML turnovers, panel and extension design, as well as customer education and support. We can only assume that the investment some of them have made is worth it.
As for where our video editing applications go, what Adobe and Blackmagic have done in their betas is a great start. The text-based editing implementations are simple to use and make perfect sense as a first generation of the workflow. Both Adobe and Blackmagic have cloud services, so it would be great to see some way for other stakeholders to be involved in the text-based editing workflow outside of the NLE. To have that option built into the tools should be a much smoother workflow than relying on a third-party panel or XML interchanges.
I suspect that third parties will deliver quite well on their promise to implement some kind of text-based editing workflow for Final Cut Pro beyond what is available now. The third-party ecosystem always comes through for Final Cut Pro.
As for Avid and Media Composer? It’s kinda shocking that one of the most used editing tools in the interview-heavy world of documentary editing wasn’t at the forefront of in-application transcription and text-based editing. While ScriptSync and PhraseFind certainly seemed ahead of their time back when they were introduced, they currently seem a step behind text-based editing. Will we ever get anything more in Media Composer?
And what does the world of AI hold for text-based editing? Will AI be able to listen to an interview, absorb some kind of intent for the story and cough up a rough edit? I would not be shocked … what do you think?
I used to pull selects from a transcript but do actual assembly in the NLE because I wanted to see the emotion. New text-based editing tools really are best of both worlds because you can edit in text while immediately seeing the tone and delivery.
— Daniel McCullum (@dmccullum) April 27, 2023
So what is the current state of text-based editing? I would say it is strong. It’s a relatively new concept to many editors and producers, and never have there been so many options to achieve an edit of video by “editing” text from transcripts.
While editors everywhere will rejoice at the prospect of not having to log hours upon hours of interviews themselves, being able to take generated transcripts, read them against the video and then generate an NLE timeline right from working with text is a great tool to have in our arsenal. Next, we have to convince directors, producers and other stakeholders to jump in where they can and use some of this new technology to produce better stories more efficiently. Finally, we just have to make sure AI doesn’t do all the editing for us.