The 2013 Hollywood Post Alliance Tech Retreat started today with a presentation from Charles Poynton on "High[er] Frame Rates". Charles couldn't attend in person, as his Canadian passport expired unnoticed, so we enjoyed him via a live webinar instead (which had all the "cone of silence" problems most webinars do!).
Charles, an imaging scientist, discussed perceptual and display-technology issues that need to be considered as a basis for understanding high frame rate (HFR) presentation. As is usually the case with my Tech Retreat coverage, what follows is a rough transcription / condensation of his remarks.
High frame rates? It's Peter Jackson's fault; "he had the balls to release a film at 48fps".
Figuring out and measuring motion perception is a tricky thing; harder than figuring out spatial resolution. Multiple pathways in the brain decode motion; "it's super-complicated". It's so complicated we shouldn't really rush into HFR imaging before we understand it better.
A slit-scan image, shown as an example of an odd motion artifact
"Is perception discrete or continuous?" This discussion has been ongoing for at least a decade. We think it takes 20-50 msec to process visual data, but we don't know if this is "frame by frame" or continuous. We know 48 fps is different, but we're not sure exactly why. It may be a little like MPEG: the only way to seriously evaluate images is to look at 'em; we can't really do it algorithmically. "And that may be all we get."
How do we see things in motion? A huge amount of the performance of HFR video is due to eye tracking (Charles discussed eye-tracking research and drew diagrams; see figure).
One of the papers Charles discussed, with "telestrator" sketches atop it.
Charles laid out the Science / Craft / Art division of color grading, then suggested replacing the "science" controls on a display with a 4-position switch for temporal rendering: Black-Frame Insertion, Interlace, etc., as on the Sony BVM-series LCDs.
Monitor controls are science, the proc amp is craft, and the grading console is art.
The "science" controls for motion: interlace settings, dark-frame insertion, etc.
But these controls affect the picture the "craft" and "art" folks are watching, though the picture being fed to the display is unchanged. "Can we be surprised that the production material looks different from the material in distribution, if we don't know how those display controls are set?" Today, the reference display is missing that standardized temporal rendering: we've finally standardized gamma, but not time rendering. Ideally we'd include metadata on temporal processing during mastering and the final display could match that, but until then, there's a danger of temporal mis-rendering. Cinema production, post, and display are all done at a professional, controlled level; HD may or may not be as well controlled (a good example of pro HD display is Mark's Met Opera broadcasts, shown in theaters). The consumer display is entirely uncontrolled.
So, Hobbit shot at 48fps, shown at 48fps. How will this translate to Blu-ray? Met Opera 'casts at 60 images/sec, shown the same way.
Sampling theory reviewed in 1D, extended to 2D (as on an image plane). Can't really sample the world with square pixels; need a weighted (Gaussian) sampling function. Digitization = sampling + quantization. A camera will typically use 12-14 bits; getting it down to 10 bits is complicated.
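To make "digitization = sampling + quantization" concrete, here is a minimal 1D sketch in Python. It is my own illustration, not anything Charles showed: the function names, the tap radius, and the sigma value are all hypothetical, and the quantizer is a plain linear scale-and-round. Real cameras do something more involved when getting from 12-14 bits down to 10, which is exactly the "complicated" part.

```python
import math

def gaussian_weights(radius, sigma):
    """Normalized Gaussian weights for taps at offsets -radius..+radius."""
    raw = [math.exp(-(k * k) / (2.0 * sigma * sigma))
           for k in range(-radius, radius + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def weighted_sample(signal, step, weights):
    """Take one weighted sample every `step` points of a finely spaced signal.

    Each output is a weighted average of its neighborhood, rather than a
    flat "square pixel" average. Indices past the ends are clamped.
    """
    radius = len(weights) // 2
    last = len(signal) - 1
    samples = []
    for center in range(0, len(signal), step):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = min(max(center + k - radius, 0), last)
            acc += w * signal[idx]
        samples.append(acc)
    return samples

def quantize(value, bits):
    """Map a value in [0.0, 1.0] to an integer code of the given bit depth.

    Assumption: simple linear quantization, for illustration only.
    """
    max_code = (1 << bits) - 1
    clamped = min(max(value, 0.0), 1.0)
    return round(clamped * max_code)

if __name__ == "__main__":
    # A hypothetical fine-grained ramp standing in for the continuous scene.
    scene = [i / 63 for i in range(64)]
    taps = gaussian_weights(radius=4, sigma=2.0)
    samples = weighted_sample(scene, step=8, weights=taps)
    print([quantize(s, 12) for s in samples])
    print([quantize(s, 10) for s in samples])
```

The two printed lists show the same samples at 12-bit and 10-bit code ranges; the sampling step decides where and how the scene is measured, and the quantization step decides how finely each measurement is recorded.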
As Charles said: it's complicated!
Flash rate: need ~48 Hz in the cinema, ~60 Hz in the living room, ~85 Hz in the office (to fuse flicker). Depends on the duty cycle of light emission (Joyce Farrell): if short duty cycle, flicker is more noticeable; 50% less so, 100% not at all. CCFL backlights don't flicker [not necessarily so, and PWM-controlled LED backlights also may flicker; depends on driver frequency in both cases], so LCDs don't flicker.
It's better to use N bits to grayscale-modulate a pixel than to use the same # of bits to control N binary subpixels. Alvy Ray Smith: "A Pixel is not a Square! A Pixel is not a Square! A Pixel is not a Square!" A pixel is a point, not an area (2D sampling); how does that translate to 3D (X, Y, Time)? "Smooth pursuit" eye movement causes simple extension of 2D sampling theory and reconstruction to temporal sampling to fail--as the eye follows moving subjects of interest. The fovea is about 1 degree wide; that's where fine detail is seen. Outside that area, detail falls off, but the eye can move to another area to see detail.
An experiment involved eye-tracking a subject reading text on a 24x80 character screen. A degree away from the gaze point, software changed all the characters to Xes, and the subject couldn't detect the change! But of course we can't fit all our image observers with eye-trackers, so we need to maintain detail through the scene.
Saccades (rapid eye jumps from fixation to fixation) at 4-8 Hz. Dwell on gaze points around 100 msec. Microsaccades / tremor (beyond the scope of today's talk). The key for motion imaging is that light hitting the fovea is integrated along the smooth pursuit track.
Question: A bird flies past a tree. If in frame 1 it's entirely to one side of the tree, and in frame 2 it's entirely to the other side, did it fly in front of or behind the tree?
CMOS sensors: 3 transistors per photosite gives rolling shutter. A global shutter requires an extra transistor. Thus 33% increase in complexity to get a global shutter.
A 1908 photo taken with a focal-plane shutter (same artifact as a rolling shutter).
Backwards-turning wagon wheel due to insufficient temporal sampling. Poynton's 1996 paper on "Motion Portrayal, Eye Tracking, and Emerging Display Technology". Triggered by looking at early DLP chips and how they made images. Charles showed a diagram of 24fps film being interlaced-scanned with 3:2 pulldown; this leads to judder (due to some frames being "50% longer" than others), which is better than spatial disturbances within the frame (if 3:2 pulldown weren't used, and the frame change happened without regard to the video scanning). Another example: dot-matrix LED signs are typically row-sequentially illuminated; if the text on the sign is moved laterally (crawled), you'll see a slant in the text as your eye tracks the text moving across the sign.
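Two of the temporal-sampling effects above can be checked with a little arithmetic. The Python sketch below is my own illustration, not from Charles's slides: the function names are hypothetical, the field rate is taken as a nominal 60 fields/sec (59.94 in practice), and the wheel is assumed to have identical, evenly spaced spokes.

```python
from fractions import Fraction

FIELD_RATE = Fraction(60)  # assumption: nominal fields/sec (59.94 in practice)

def pulldown_fields(num_frames):
    """Fields occupied by each film frame under 3:2 pulldown: 3, 2, 3, 2, ..."""
    return [3 if i % 2 == 0 else 2 for i in range(num_frames)]

def display_durations(num_frames):
    """On-screen time of each film frame, in seconds, as exact fractions."""
    return [Fraction(n) / FIELD_RATE for n in pulldown_fields(num_frames)]

def apparent_step_degrees(rev_per_sec, fps, spokes):
    """Apparent wheel rotation per frame, in degrees.

    With identical spokes the wheel looks the same every 360/spokes degrees,
    so the true per-frame rotation is wrapped into the range
    [-period/2, +period/2). A negative result reads as backwards rotation.
    """
    period = Fraction(360, spokes)
    true_step = Fraction(rev_per_sec) * 360 / Fraction(fps)
    half = period / 2
    return (true_step + half) % period - half

if __name__ == "__main__":
    durations = display_durations(4)
    print(pulldown_fields(4))           # field counts per film frame
    print(durations[0] / durations[1])  # long frame relative to short frame
    print(sum(display_durations(24)))   # 24 film frames, total seconds
    # Hypothetical wheel: 12 spokes, 1.9 rev/sec, shot at 24 fps.
    print(apparent_step_degrees(Fraction(19, 10), 24, 12))
```

The ratio of a three-field frame to a two-field frame is 3/2, which is the "50% longer" that produces judder, and 24 film frames fill exactly 60 fields. In the wheel example the true rotation is 28.5 degrees per frame, but with spokes every 30 degrees the nearest match is 1.5 degrees backwards.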
Next: How display technologies affect motion presentation...