Over the years, I have seen a lot of folklore and bad math employed to determine how to work with non-square pixels, resulting in a plethora of incorrect working practices. Therefore, in this article I'm going to spend a lot of time laying out the historical and mathematical basis for where these numbers came from. Hopefully this will provide you with a solid foundation on which you can build a new set of working practices.
PARs in CS4 = FUD
With the release of their Creative Suite 4 (aka CS4), Adobe introduced across their product line a "new" set of pixel aspect ratios for standard definition non-square-pixel media and compositions. This has caused widespread fear, uncertainty, and doubt, as many users suspect their old workflows have now flown out the window.
Relax. For one, these pixel aspect ratios are not new - they've been known for roughly a decade. They are also more correct than the ones everyone has been using to date. Finally, the changes required in workflows are nowhere near as drastic - nor affect as many users - as you might think. They just require a couple of decisions.
(If you're more of an artist than an engineer, you can skip ahead a couple of pages - although you'll miss out on some wonderfully geeky party conversation points, along with a deeper understanding of how to navigate the non-square-pixel minefield.)
From Analog to Digital
In the beginning, video and videotape were both analog: a series of voltages and blips that described not just the picture, but also all of the timing information necessary to determine where each field - and each horizontal line in that field - started and stopped. This timing information includes "blanking intervals" that cover the time during which the electron beam painting the picture on an old CRT resets itself from the right edge of the screen back to the left, as well as from the bottom of the screen back up to the top. If you're interested in the geeky details, National Instruments has a good description; the illustration below is a sample from their site:
An analog video signal contains picture information (the squiggly lines) plus timing information (the pulses). Image from the National Instruments Developer Zone.
The ITU-R BT.601 specification defines how this analog signal should be encoded digitally, and became the stone tablet for how standard definition digital video would work from then forward. The first digital video decks - D1 (component video) and D2 (composite video) - followed this spec, making it easy for them to emulate an analog video deck. This is why standard definition digital video is often known as "601" or "D1" format video.
The Origins of Madness
To understand the logic behind the 601 specification, it is important to know two things about product design engineers:
- Digital circuits are easier to design if they work like gears meshing together. It's hard to design a circuit to, say, divide by 29.97; believe it or not, it's easier to design a circuit that multiplies by 30,000 and then divides by 1001 (30,000/1001 happens to be the true frame rate of NTSC video - not 29.97). The take-away from this: Engineers prefer to define things as ratios between whole numbers, rather than as decimal values.
- Whenever you can use common parts across multiple devices, you can save money. This is a concern even with professional tape decks carrying five-digit price tags. So if you can find a way to use the same part in more than one device, you go for it, even if it means standing on one foot and rubbing your stomach for no apparent reason.
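To see why the whole-number ratio matters, here is a minimal Python sketch (my own illustration, not anything from the 601 specification) that compares the exact NTSC frame rate of 30,000/1001 against the rounded 29.97 shorthand. The names NTSC_FPS and ROUNDED_FPS are made up for this example.

```python
from fractions import Fraction

# Exact NTSC frame rate, expressed as a ratio between whole numbers.
NTSC_FPS = Fraction(30_000, 1001)

# The familiar decimal shorthand, stored exactly as 2997/100.
ROUNDED_FPS = Fraction("29.97")

# Over 1001 seconds the exact rate lands on a whole number of frames...
exact_frames = NTSC_FPS * 1001

# ...while the rounded rate comes up slightly short.
rounded_frames = ROUNDED_FPS * 1001

print("exact:  ", exact_frames)
print("rounded:", float(rounded_frames))
print("shortfall in frames:", float(exact_frames - rounded_frames))
```

The shortfall is tiny, but it is not zero, which is why the ratio is the real definition and 29.97 is only a convenient label.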
These two principles heavily shaped the 601 specification. To wit, it was decided to try to use as many common numbers (and therefore, parts) between NTSC and PAL as possible. For example, both formats employ a master clock of 13.5 MHz; everything that is needed is derived from this number. Also, both NTSC and PAL "D1" formats ended up with 720 pixels (samples of the master clock) per horizontal line. So how did we get from that starting point to that ending point?
A frame of NTSC video - when you look at the entire analog signal of image plus blanking intervals - contains 525 lines (not 480 or 486), running at a frame rate of 30,000/1001 frames per second. Doing the math, a 13.5 MHz master clock divided by 525 lines divided by (30,000/1001) = 858 digital samples per line of video.
Similarly, PAL video contains 625 lines, running at a frame rate of 25 fps. 13.5 MHz divided by 625 divided by 25 = 864 samples per line of video.
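If you want to check the arithmetic yourself, the two divisions above can be sketched in a few lines of Python. This is just my illustration of the math; the function name samples_per_line is hypothetical, and exact fractions are used so that no rounding sneaks in.

```python
from fractions import Fraction

MASTER_CLOCK_HZ = 13_500_000  # the shared 13.5 MHz master clock


def samples_per_line(total_lines, frame_rate):
    """Master clock ticks available for each line, blanking included."""
    return Fraction(MASTER_CLOCK_HZ) / total_lines / frame_rate


# NTSC: 525 total lines at 30,000/1001 frames per second.
ntsc = samples_per_line(525, Fraction(30_000, 1001))

# PAL: 625 total lines at 25 frames per second.
pal = samples_per_line(625, Fraction(25))

print("NTSC samples per line:", ntsc)
print("PAL samples per line:", pal)
```

Both results come out as whole numbers, which is exactly the kind of gear-meshing the engineers were after.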
But doesn't D1 NTSC and PAL video have 720 samples per line, not 858 or 864? Yes - but "720" is just the image. The rest is the blanking interval and timing information between each line: namely, 138 samples in NTSC (720 + 138 = 858), and 144 samples in PAL (720 + 144 = 864). The following figure - which is straight from the 601 specification - lays it out:
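As a quick sanity check on those numbers, this short Python sketch splits each line into its image and blanking portions, and converts the total into time using the 13.5 MHz clock. The helper names are mine, and the per-line durations are simply derived from the sample counts above rather than quoted from the spec.

```python
MASTER_CLOCK_HZ = 13_500_000  # 13.5 MHz master clock
ACTIVE_SAMPLES = 720          # image samples per line in both formats

# Total samples per line, blanking included.
TOTAL_SAMPLES = {"NTSC": 858, "PAL": 864}


def blanking_samples(total):
    """Samples per line that carry blanking and timing, not image."""
    return total - ACTIVE_SAMPLES


def microseconds(samples):
    """Time taken by a number of samples at the master clock rate."""
    return samples / MASTER_CLOCK_HZ * 1_000_000


for name, total in TOTAL_SAMPLES.items():
    blank = blanking_samples(total)
    print(f"{name}: {ACTIVE_SAMPLES} image + {blank} blanking = {total} samples, "
          f"{microseconds(total):.2f} microseconds per line")
```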
Another way of visualizing the above diagram is like this:
How a video frame is stored on analog and 601-compliant digital videotape. The two squished images are the two fields of a frame (each consisting of alternating lines of the final frame); the black area is the blanking interval during which the electron beam in old-fashioned CRTs and television sets is traveling from right to left and from the bottom back to the top.