Audio Splits, Stems & Elemental Tracks by Woody Woodhall

Additional Mixes for The Longevity of Your Project

January 20, 2015

A post sound mixer’s main task is to create an awe-inspiring mix that enhances and elevates the story being told. But a post mixer’s responsibilities go far beyond creating a great soundtrack. Using sophisticated digital tools, the mixer must meet all of the technical specifications and requirements for the project. These are detailed in the delivery documentation from the network or distribution company. They include the number of channels, the target levels and peaks, and how the final mix will be split into distinct elements for later use. Here’s a hypothetical example of the verbiage from a typical delivery doc covering the sort of metering, levels and peaks required for a program –

[Image: hypothetical delivery document excerpt showing metering, level and peak requirements]

For television in the US, Congress passed the CALM Act, which ties broadcast loudness to the ATSC A/85 document and its very specific limits on various audio measurements. You can read my more in-depth article regarding that here. Not all networks implement the CALM Act the same way, so adherence to their specific documentation is essential. Besides covering the technical aspects of the mixing, the delivery doc will also specify how to split out all of the audio elements that make up the full mix for later use.
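As a rough illustration of how a mixer might sanity-check measurements against a delivery doc, here is a minimal Python sketch. The class and function names are hypothetical, and the default numbers (a -24 LKFS target, a 2 LU tolerance and a -2 dBTP true peak ceiling) are assumptions chosen only for illustration. The measured values would come from a loudness meter, and the actual limits must always be taken from the delivery doc for the specific program.

```python
from dataclasses import dataclass


@dataclass
class DeliverySpec:
    # Illustrative defaults only; replace with the numbers in the delivery doc.
    target_lkfs: float = -24.0         # assumed program loudness target
    tolerance_lu: float = 2.0          # assumed allowed deviation from target
    max_true_peak_dbtp: float = -2.0   # assumed true peak ceiling


def check_mix(spec, measured_lkfs, measured_true_peak_dbtp):
    """Return a list of problems; an empty list means the mix is within spec."""
    problems = []
    if abs(measured_lkfs - spec.target_lkfs) > spec.tolerance_lu:
        problems.append(
            f"loudness {measured_lkfs} LKFS is outside "
            f"{spec.target_lkfs} +/- {spec.tolerance_lu}"
        )
    if measured_true_peak_dbtp > spec.max_true_peak_dbtp:
        problems.append(
            f"true peak {measured_true_peak_dbtp} dBTP exceeds "
            f"{spec.max_true_peak_dbtp} dBTP"
        )
    return problems


spec = DeliverySpec()
print(check_mix(spec, -23.5, -3.1))   # measurements inside the assumed limits
print(check_mix(spec, -20.0, -0.5))   # too loud, and peaks too high
```

A check like this only compares numbers that a meter has already produced; it does not measure loudness itself.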

Stem outputs, also interchangeably called split tracks or just stems, are a separation of the final mix into discrete audio elements. Generally, a full mix is stemmed, or split out, into various combinations of its dialog, music and effects (DME) elements. I will go into more detail in a moment about specific types of stems or splits.

A quick side note regarding nomenclature for delivery – it’s all over the map. There are mix minus, DME, Effects and M & E splits and each one has audio elements and characteristics that are specific to a particular network or distribution company. Sometimes a mix minus is the final mix minus the narration track and sometimes it will exclude the music instead. Sometimes one particular stem is called the DME and other times it is called the MDE. There are as many variations on what to deliver and what it’s called as there are individual shows. This article can be a guide to understanding and anticipating audio stem delivery, but the naming conventions and actual media per stem can only be verified by the delivery documentation for each specific program.

To facilitate this in the sound mix, the final edited audio from the picture editor’s timeline must be separated and organized in a digital audio workstation (DAW). The sound editor will edit the audio to tracks that are routed to very specific places in a mix console to create the required split tracks. Dialog will only be placed on dialog tracks, music on music tracks, effects on effects tracks and any other combination necessary to separate out and send each of the elements of the final mix to create the splits. The variations on this can go deep. Here’s an example of the edited audio of a simple mix for a one-hour television special in Pro Tools –

[Image: Pro Tools session showing the edited audio for a one-hour television special]
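The track-to-stem routing described above can be thought of as a simple lookup table. The following Python sketch shows the idea; the track names and stem names are hypothetical, and a real session would follow whatever layout the delivery doc demands.

```python
from collections import defaultdict

# Hypothetical track layout: each edited track is routed to exactly one stem.
TRACK_ROUTING = {
    "DIA 1": "dialog",
    "DIA 2": "dialog",
    "NARR": "narration",
    "PFX 1": "effects",   # production effects pulled from location audio
    "SFX 1": "effects",
    "MX 1": "music",
}


def group_tracks_by_stem(routing):
    """Invert the routing table so each stem lists the tracks feeding it."""
    stems = defaultdict(list)
    for track, stem in routing.items():
        stems[stem].append(track)
    return dict(stems)


for stem, tracks in group_tracks_by_stem(TRACK_ROUTING).items():
    print(stem, "<-", ", ".join(tracks))
```

Keeping every track on exactly one stem is what makes it possible to pull the elements apart again later.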

Each of these tracks is routed in a mix console to deliver the final mix stem as well as the individual stems required. For broadcast audio deliveries in the US, the final full mix stem will have any (deemed) foul language beeped. There are very few hard and fast standards as to what language each individual broadcaster considers acceptable. Depending on the broadcaster, the range of acceptable language in the US varies widely. Many things that may have been beeped years ago are no longer censored. George Carlin’s Seven Dirty Words has become far fewer than that. An NSFW link specifying those words can be found here. Other countries may not have the same restrictions on language as the US. Typically, an uncensored, dialog only split will be requested. This way the mix can be easily recreated from the split elements to make a new mix that does not contain the censor beeps for those particular territories or countries.

Sometimes there is a requirement that two dialog stems are to be created and delivered, one censored with a beep or roomtone and one that contains the foul language. These distinctions are all determined by the distribution company or the television network that is accepting the final mastered audio tracks. They each have their own internal processes for how they re-version their shows and the splits must take their particular (and sometimes – peculiar) workflow into account.

One note of caution is that these split specifications for audio on a delivery doc can be difficult to decipher, can be confusing or can frankly be contradictory. Unfortunately for us, that is our problem, and the discussions and understanding of the audio elements must be started early to be sure to fulfill the requirements prior to delivery. It is always best to get things right from the start.

If the split tracks have been improperly routed or created you might be looking at having to retrieve the master tapes back from the network or distribution company and then re-lay the new, corrected stems back to tape again. Digital deliveries are becoming more frequent for cable and network television but the process will remain the same – create new stems and recreate the digital masters with the new tracks. Things happen on every level, with every project, but be sure to read the docs thoroughly and minimize these types of problems as much as possible.

Understanding that each distribution requires separate sets of stems, what are some possible combinations that might be required? The screenshot below shows the following splits – a surround (5.1) full mix, a stereo full mix, a surround MDE mix, a stereo MDE mix, a stereo music only mix, a stereo effects only mix, a mono dialog only mix and a mono narrator only mix.

[Image: screenshot of a session showing a set of full mix and split tracks]

Using the Dialog, Music and Effects categories as a guide, here are some of the more typical sets of stems that will be required.

The full mix – the complete, final mixed soundtrack. This can be required in mono, stereo or various surround formats.

The following tracks may be required to be split from the full mix. These splits can be required in mono, stereo or various surround formats.

Dialog only – this will be most, if not all, of the in-sync picture dialog, and also any additional dialog that drives the story forward. As mentioned, this track can be censored or not depending on the delivery specifications.
Narration only – this track is a separation of the narration from the dialog tracks.
Music only – pretty self-explanatory.
Effects only – this obviously contains all of the sound effect elements of the mix. But sound effects are not only sounds pulled from a commercial sound effects library. Sound effects can also be from the location recordings themselves. There are elements of the location recordings that do not contain main dialog and a sound editor will edit these recordings on to what are usually called – production effects or PFX tracks. The PFX are routed to the effects stems. PFX can be, for instance, hand slaps, footsteps, door closes and basically anything that, if it was not there, would require a sound effect to be added to the edit to fulfill that moment in the edit and the soundtrack.

Foley is another huge element of the sound effects stem. Foley is defined as the recreation of human sounds. Since microphones on sets tend to be pointing at the mouth, many other sounds that we are used to hearing are not well recorded or even recorded at all. A talented Foley artist will watch the screen and then sync their movements to picture and recreate footsteps, clothing sounds, zippers – just about anything that a human does in the typical movements of life. When a Foley pass is not being done on the show, pulling PFX from the location audio is an essential part of building the effects split.

It’s also important to note that spoken dialog can be considered a PFX, in which case it should be routed to one of the effects stems. Oftentimes, in the course of a program, there is additional dialog underneath a scene as well as the dialog that is driving the scene. It will often be required to split those dialog tracks in the sound edit. The story driving dialog is routed to the dialog split and the other dialog is sent instead to an effects split. These requirements are as varied as there are networks and distributors, so clarification will be required to be sure that these have been properly separated and routed to the correct splits. Additional standard split tracks include –

M & E – Music and effects – here all of the musical elements and sound effect elements have been combined into a single split, but containing no story important dialog or narration.
DME – Dialog, Music, Effects – this split is typically the full final mix with the narration removed. It makes it a simple change to swap out the narration track while maintaining the mastered mix.
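To make the relationship between these splits concrete, here is a small Python sketch that treats each stem as a list of sample values and builds the combined splits by summing them. The sample values and the function name are invented for illustration. It also assumes nothing is processing the full mix bus, such as a limiter, in which case the summed stems may not exactly match the printed full mix.

```python
def sum_stems(*stems):
    """Sum equal-length stems sample by sample."""
    lengths = {len(stem) for stem in stems}
    if len(lengths) != 1:
        raise ValueError("all stems must have the same number of samples")
    return [sum(samples) for samples in zip(*stems)]


# Hypothetical four-sample stems, all timed to the same locked picture.
dialog = [0.10, 0.20, 0.00, 0.00]
narration = [0.00, 0.00, 0.30, 0.30]
music = [0.05, 0.05, 0.05, 0.05]
effects = [0.00, 0.10, 0.00, 0.10]

m_and_e = sum_stems(music, effects)        # no story dialog or narration
dme = sum_stems(dialog, music, effects)    # full mix minus the narration
full_mix = sum_stems(dme, narration)       # adding narration back restores the mix

print(m_and_e)
print(dme)
print(full_mix)
```

Swapping in a new narration recording is then just a matter of summing the DME with the new track instead of the old one.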

These are some of the standard splits that will be required, but even here there are variations that come into play. Sometimes you are required to deliver the split out elements with the mixing applied, and sometimes the request will be for unmixed or “undipped stems.” In that case the network or distributor wants the same elements split out, but without the mix moves that go with them. In practical terms that means, for instance, that the music never “dips” to accommodate the dialog or narration tracks. In the international post production workflow, this is the best solution for some companies to repurpose the show in another language. They will have all of the edited audio elements, timed exactly to the locked picture edit, but will have the freedom to create new dialog tracks and then create a simple remix, “dipping” the undipped elements to match the new dialog.
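The difference between a dipped and an undipped stem can be sketched in a few lines of Python. This is a deliberately crude model with hypothetical names and values: the music is turned down by a fixed amount wherever the dialog is above a threshold. A real mixer rides the level with smooth fades rather than switching the gain abruptly, and the -12 dB figure is arbitrary.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear gain factor."""
    return 10 ** (db / 20)


def dip_music(music, dialog, dip_db=-12.0, threshold=0.01):
    """Lower the music by dip_db wherever the dialog exceeds the threshold."""
    gain = db_to_gain(dip_db)
    return [
        m * gain if abs(d) > threshold else m
        for m, d in zip(music, dialog)
    ]


undipped_music = [0.5, 0.5, 0.5, 0.5]
new_dialog = [0.0, 0.2, 0.2, 0.0]   # e.g. a re-voiced foreign language track

dipped_music = dip_music(undipped_music, new_dialog)
print(dipped_music)
```

Because the undipped stem keeps the music at full level throughout, the dips can be placed wherever the new dialog happens to fall.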

Sometimes there will be a requirement to deliver both dipped and undipped stems. There is simply no way to guess what a particular requirement will be. However, it will be laid out in a delivery doc specific to the production. The main ones that I’ve discussed are a great starting point.

Often, post audio edits and mixes are needed for programs that do not have a delivery scheme in place. Indie films, documentaries, pilots or web productions may be completed before any sales are made. In those instances I may make a range of split track suggestions to producers that relate to the type of project it is and the deliverables that are typically associated with that sort of program. At a minimum, I would always recommend creating a set of the Dialog, Music and Effects stems to go along with the final, full mix.

Why would a filmmaker need such a variety of audio elements if the film has not yet been sold? There is a wide range of reasons that a film will be repurposed and why the split tracks would be needed. I’ll list a few here.

Picture re-edit – whether recutting for time, for censorship or for anything else that is demanded by a sale, it’s much easier to recut the show with the stems to smooth out all of the new transitions and lessen the need for another expensive and time-consuming audio post pass on the material.
Movie trailer – it’s imperative to have a dialog only track to create a trailer or teaser for a show. It will be difficult, if not impossible, to create a seamless edit of the video and audio if various music tracks and sound effects are attached to the clips. If the trailer is cut with the dialog only, it’s a simple task later, for instance, to add a single music cue underneath to match the new picture edit.
International re-language – international sales can make up a sizable chunk of additional income on a project. As discussed earlier, having correct stems will facilitate easy re-voicing for international markets.
Music – music rights can be acquired worldwide and in perpetuity, but they can also last for only a brief time window or be available in only a certain range of countries and territories. In this case the filmmaker can use the final mastered dialog and effects splits and then add new music and create a new soundtrack in a fairly straightforward manner.
Celebrity voiceover – let’s say that a filmmaker made a wonderful documentary and the strength of the final program has inspired and enticed a celebrity to re-voice the show. Adding their voice to the program will add substantial market value as well as possible mainstream acceptance. By recording the new narration track and then mixing it with the mastered DME stem, creating a new soundtrack can be achieved with relative ease.

Producers will need a range of options with their project’s audio to be able to fulfill all of their possible markets as well as create marketing media from the source material. Once the audio has been mastered into an amazing soundtrack, being able to easily reach in and use the elements at will significantly lessens the burden of repurposing the program. Always insist on some sort of set of audio splits to accompany your final mixes. Projects can languish on a shelf for a considerable amount of time before a sale happens. When the sale opportunity arrives you should always have a number of options to recreate or repurpose the show as needed to make the sale.