After Effects & Performance. Part 11: The rise of the GPGPU by Chris Zwar

How the humble graphics card became the focus of performance

April 13, 2020

The GPU has come a long way from the days when computers could only display monochrome text. Over the past decade, the GPU has been the main focus of software performance improvements – not just for 3D games, but for other graphics software and even everyday apps. Because 3D apps have seen incredible progress in rendering speed and sophistication over the past ten years, it’s understandable that After Effects users are wondering if the same benefits are coming to AE. The short answer is… it’s complicated, hence the in-depth look at where the GPU came from, and where it’s going.

It’s there on the box: the words “Graphics Processing Unit”.

Part 10 was an epic overview of how the GPU came to be – from a time when computers plugged into your home TV, to a bunch of guys who left Silicon Graphics to bring 3D graphics to home computers, before being bought out by a little company called nVidia. If you missed part 10 then you missed a great story, and here we’re picking up where we left off: it’s 1999, and nVidia have launched their latest graphics card, the GeForce 256.

When nVidia launched the GeForce 256 in 1999, and introduced the term “GPU” to the tech world, it wasn’t immediately obvious that it was anything special. Looking back, reviews from the time suggest that if anything, the GeForce 256 was slightly underwhelming – especially considering its price. But despite the muted reception it received at launch, the GeForce 256 represents a significant part of modern computing history, and not just because the marketing department lifted the term “GPU” from an old Sony document and used it as an advertising slogan.

In Part 10, we looked at a historical overview of various graphics milestones up to 1999. Part 10 concluded with the launch of the GeForce 256, which touted hardware accelerated transform and lighting (T&L). As noted in part 10, this new feature wasn’t that useful when the GeForce 256 was first launched, because no existing games supported hardware T&L, hence the relatively lukewarm reviews.

But this new feature was the first step in a completely new direction for graphics cards, and set in motion the technical developments that have led to today’s monster GPUs. To see the significance of hardware T&L we need to look at why this was an important new development.

Fixed Function Hardware

Once Quake was released in 1996, 3D acceleration on graphics cards took off. At this stage the term “GPU” wasn’t in use, and video cards were still just video cards. Some were only 2D, the Voodoo cards were only 3D, but most 2D video cards began to include some form of 3D acceleration. While the Voodoo cards weren’t the first PC cards to accelerate 3D, their superior performance with Quake, and dedication to 3D only, initially saw them become clear market leaders.

These early, first generation 3D graphics cards worked by taking common, well-known algorithms used in 3D rendering and designing custom chips that were dedicated to those algorithms. As detailed in part 5 – if a mathematical function can be performed on a custom chip, instead of using a general purpose CPU, then the result can be much faster – orders of magnitude faster. The various companies designing 3D graphics cards faced the challenge of making their products as powerful as possible with the limited resources available to them, including hardware restrictions such as the amount of RAM on the board and even the number of transistors in the chips themselves.

The exact capabilities of each video card varied from company to company, but the features were locked once the card was manufactured. This era is often referred to as “fixed function hardware”, because the functionality of each graphics card was literally burned into the chips, and fixed forever. As a rough analogy, these earlier graphics cards were like a desktop calculator with buttons for each function. The calculator is fine as long as there’s a button for the function you want, such as add, subtract and so on. If you want a function that it doesn’t have, such as square root, then you’re out of luck. You either calculate the square root using combinations of the tools you do have, which is much slower, or you just look around for another, presumably more expensive calculator that has it built in.
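To make the calculator analogy a bit more concrete, here is a minimal sketch in Python of working out a square root when there’s no square root button. It uses Heron’s method, which only needs addition, multiplication and division. The function name and the number of steps are assumptions chosen for illustration; the point is simply that one missing function turns into a whole loop of simpler ones.

```python
# Illustrative only: emulating a missing "square root button" with basic arithmetic.
# sqrt_by_hand and the default of 20 steps are made up for this sketch.

def sqrt_by_hand(value, steps=20):
    """Approximate the square root of a positive number using Heron's method."""
    guess = max(value, 1.0)  # any positive starting guess will do
    for _ in range(steps):
        # Each pass uses only the "buttons" we do have: add, multiply, divide.
        guess = 0.5 * (guess + value / guess)
    return guess

print(sqrt_by_hand(2.0))
print(sqrt_by_hand(9.0))
```

A built-in square root is a single operation, while this version repeats three operations twenty times. That is the same trade-off a game faced when it had to fall back to the CPU for something the 3D card couldn’t do.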

3D graphics acceleration initially worked along these same lines. The 3D chips were capable of specific functions, and games worked by using the functions available to them. If a game needed a specific function that the 3D card didn’t have, then it would use the CPU instead – which was slower. The more functions that a 3D card could perform without using the CPU, the more efficient the 3D rendering, and the higher the resulting frame rate.

From the release of Quake, the software used to create 3D images and the video cards used to accelerate the rendering advanced in conjunction with each other. The goal was always the same: higher frame rates in the games people were actually playing, such as Quake.

But not all 3D is the same.

Metaballs bigger than King Kong

These days, 3D computer animation is so prevalent that we don’t think too much about how it works. Nearly all modern computer games are based on 3D graphics, and most digital visual fx in movies include 3D modeling and rendering. While the history of 3D graphics is well documented, there are a couple of points worth considering so when we get back to After Effects we can see where it fits in.

Firstly it’s important to note that there isn’t one single “best” way to do 3D graphics. The history of computer graphics is a story of many different approaches to the same problems, some of which are just historical footnotes while others have become fundamental tools in modern workflows.

A good example is to look at how many different approaches there are to 3D modeling. In order for a computer to render 3D images, we need to find a way to represent objects in a way that a computer can understand them. Most modern solutions to this problem use polygons – but even here some software uses triangles, while others use quads. This might seem like a minor difference, but it’s a useful example that there’s not one single, universal approach to 3D graphics.

The most famous 3D object in history, the Utah Teapot, was originally modeled as a set of 32 Bezier curves (centre) defined by 306 vertices (left). But modern 3D software models objects using polygons, usually triangles (right) but sometimes with quads. There’s no single, universal “best” approach.

In addition to representing 3D objects as a collection of polygons, there are other ways to do it as well. The iconic “Utah teapot” was originally created in 1975 as a set of Bezier curves, and even modern 3D software needs to convert the original data to polygons before it can be rendered. Objects can be modeled using curves, including splines, or even built from a combination of pre-defined primitive shapes. The 1990s saw the emergence of “metaballs”, which I’ve never seen anyone use but they’re good to keep in mind because they sound a bit rude.

While researchers developing 3D graphics for film and television were mainly concerned with how objects looked, engineers needed to model how 3D objects behaved, and so they came up with completely different approaches again.

Just as there were many different approaches to modeling 3D objects, there were many different methods being developed to render 3D scenes. One particular algorithm that can be used is called “radiosity”, which has always fascinated me because it was developed before computers were advanced enough to use it. Engineers in the 1950s, working in the field of thermal dynamics, had come up with a set of algorithms that modeled heat transfer between objects. The researchers were aware that light behaved in exactly the same way as heat (light can be a form of heat), and that the same formula could be used to model how coloured diffuse light interacted between objects. However computers in the 1950s weren’t powerful enough for 3D rendering, and even then there weren’t full-colour frame buffers until the 1970s, so you couldn’t have easily seen results anyway. But by 1984 the technology had caught up, and the original thermal dynamics algorithms were adapted for computer graphics and called “radiosity rendering”. Because radiosity only models diffuse light, the resulting images are often very atmospheric and moody – another reason I’ve always loved radiosity renders. These days, radiosity is one of several algorithms used to render “global illumination”, which has become the more common term.

Radiosity rendering produced beautiful looking diffuse renders, but it didn’t render things like reflections, refractions, specular highlights and other lighting characteristics. These were handled by a different mathematical approach that is generally called “ray-tracing”. As ray-tracing algorithms improved, the combination of ray-tracing with global illumination was able to produce physically accurate, photorealistic 3D renders.

There are many algorithms that render 3D images. The radiosity algorithm (left) models the way diffuse light interacts between objects. Ray-tracing (centre) calculates reflections, refractions and specular light sources. Combined with a global illumination algorithm like radiosity, the result can be a physically accurate photorealistic render. But other approaches exist too. Contrary to popular belief, Pixar’s early version of their “Renderman” algorithm (right) did not use ray-tracing; it was a scanline renderer called “Reyes”. Computer games use an entirely different approach to 3D rendering again.

Even here, there’s a market for different approaches and different commercial products. Ray-tracing is very slow to render, so other solutions were developed as well. Early 3D experiments by Pixar – before they were even called Pixar – used a “scanline” rendering algorithm called “Reyes”, which evolved into the “Renderman” engine. Other commercially available 3D renderers used today include Vray and Arnold, with room for more niche products such as Corona. Disney Animation – a separate entity to Pixar – developed their own renderer called “Hyperion”, while more recently Framestore developed their own, in-house renderer called “freak”.

These alternatives exist because they’re different – there’s no one, perfect 3D product. Pixar’s Renderman, for example, was a pioneering 3D rendering solution developed in the 1980s, but contrary to popular belief it didn’t actually use ray-tracing algorithms until version 11.

What about After Effects?

In Part 2 of this series, we looked at what After Effects actually does. Shuffling 2D bitmap images around is totally different to rendering 3D.

The point of all this is to emphasise that there has never been one, single approach to 3D graphics. On a broad level, the fields of CAD and engineering have always had completely different priorities to those of animation and visual fx. On a closer level, there are several different 3D rendering algorithms available as commercial products, with enough differences between them to cater to specific markets.

As we looked at early on in Part 2, After Effects is not a 3D package, and it works in a fundamentally different way to 3D renderers. Because GPUs have dominated graphics developments over the past 10 years, it was important to establish exactly what After Effects is doing – and how it differs from 3D animation packages. That’s why an entire article was devoted to explaining the difference.

However there’s no escaping the fact that GPUs have been at the forefront of recent graphics development, so we want to track the advances over time, to see where After Effects can fit in.

For many years it simply didn’t. After Effects was 2D, and 3D was 3D. 2D is simple – you’re just copying blocks of memory around.

But the point – so far – is that 3D is not simple, and that not all 3D is the same.

Game on

In Part 10, when we looked at the origin of the GPU, we saw how different the world of high-end supercomputers was to the low-cost desktop market. The modern GPU was originally designed to accelerate games on home computers, catering to a new market that had almost entirely been created by id Software.

The first major releases from id, Wolfenstein and Doom, were not true 3D games – objects and scenes were not created from 3D geometry, but rather clever skewing of 2D textures. But Quake was built on a groundbreaking new gaming engine that did use 3D geometry. The success of the original PlayStation, and the crude 3D graphics used in smash hits like Crash Bandicoot, proved that the general public were more concerned with the speed and playability of games, and not the technical quality of the image.

The problem with rendering high quality 3D images was speed. The traditional rendering techniques that had been developed were very slow, even if the technology existed for photorealistic rendering. By the time Quake was released in 1996, Hollywood had seen great success with 3D visual fx in films such as The Abyss, Terminator 2, Jurassic Park and Toy Story. However the computing power needed to produce these landmark visual fx films was just as notable. A 1994 Wired article on Jurassic Park includes the line “The rooms hold $15 million worth of networked SGI CPUs; that’s nearly 100 computers”. A SIGGRAPH talk from 1994 notes that the T-rex scene took about 6 hours to render a single frame. Toy Story, released about 6 months before Quake, had scenes which took up to 15 hours to render a single frame.

It’s obvious that taking 15 hours to render a single frame is pretty useless for games, where the goal is to get frame rates as high as possible, with the minimum acceptable frame rate for modern games generally considered to be around 30 fps. But 30 frames per second means that each frame has to render in 1/30 of a second, or about 0.033 seconds, and that’s a pretty big step down from 15 hours.
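Just how big a step down? The arithmetic is simple enough to sketch in a few lines of Python. The two input figures come from the paragraphs above; the variable names are just for illustration.

```python
# Rough arithmetic only, using the figures quoted in the text.
OFFLINE_HOURS_PER_FRAME = 15   # the slowest Toy Story frames
TARGET_FPS = 30                # minimum acceptable frame rate for games

offline_seconds = OFFLINE_HOURS_PER_FRAME * 60 * 60   # seconds per offline frame
frame_budget = 1 / TARGET_FPS                          # seconds per game frame
speedup_needed = offline_seconds / frame_budget        # how many times faster

print(f"Offline render:  {offline_seconds:,} s per frame")
print(f"Game budget:     {frame_budget:.3f} s per frame")
print(f"Speed-up needed: about {speedup_needed:,.0f}x")
```

Fifteen hours is 54,000 seconds, so a game has to produce each frame roughly 1.6 million times faster than those Toy Story scenes. No amount of tweaking the same algorithm closes a gap that size.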

The solution, if it isn’t obvious, is that 3D games are rendered in a completely different way to commercial, photorealistic 3D renderers. It’s not as simple as saying that a gaming render engine like Unreal or Unity is faster than a photo-real renderer like Vray or Arnold. Instead, the entire approach to creating 3D images is different, with the emphasis being on speed rather than image quality.

Beginning with Quake, the underlying approach that 3D gaming engines used was established, often referred to as Z-buffering. A number of articles have appeared over the years that dissect exactly what happens when a single frame of a 3D game is rendered, offering a detailed insight into the steps involved. Adrian Courrèges has a great breakdown of the rendering process behind GTA (on the Playstation), and that inspired Kostas Anagnostou to write a similar analysis of the Unreal engine.
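As a rough idea of what a Z-buffer does, here is a minimal sketch in Python. It is not how any real engine is written: the “fragments” are hypothetical (x, y, depth, colour) tuples standing in for the pixels a polygon would cover, and a smaller depth is assumed to mean closer to the camera.

```python
# Minimal z-buffer sketch (illustrative only).
# Each fragment is a made-up (x, y, depth, colour) tuple; smaller depth = nearer.

FAR = float("inf")

def render(fragments, width=4, height=3, background="."):
    depth = [[FAR] * width for _ in range(height)]        # nearest depth seen so far
    colour = [[background] * width for _ in range(height)]
    for x, y, z, c in fragments:
        # Depth test: keep the fragment only if it is nearer than what is stored.
        if z < depth[y][x]:
            depth[y][x] = z
            colour[y][x] = c
    return ["".join(row) for row in colour]

fragments = [
    (1, 1, 5.0, "A"),  # far polygon drawn first
    (1, 1, 2.0, "B"),  # nearer polygon on the same pixel replaces it
    (2, 1, 3.0, "A"),
    (2, 1, 9.0, "B"),  # farther than what is already stored, so discarded
]
image = render(fragments)
print("\n".join(image))
```

The order the polygons arrive in doesn’t matter, because each pixel only ever keeps its nearest fragment. It’s a simple comparison repeated for every pixel, which is exactly the kind of job that suits dedicated hardware.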

But it’s enough to know that the algorithms used by real-time gaming engines are fundamentally different to those used by 3D packages such as VRay, Arnold and so on. In addition to using highly optimized algorithms, the video cards need all of the information about the scene to be loaded into memory first, where it can be accessed directly by the chips on the graphics card. This immediately presents games developers with their first technical hurdle – all of the assets required to render the scene must be able to fit into the amount of RAM that the average GPU has.

Regardless of exactly how they work, one thing is clear: rendering 3D games is all about rendering fast.

Freedom shades

Many of the functions that early “fixed function” 3D cards provided were established mathematical algorithms. For example the first stage in rendering a 3D image involves taking the separate 3D objects in a scene and combining their geometry into one “world”. Then, all vertices are sorted by their distance from the camera, and anything not visible is culled from the scene. This is a technically straightforward process and there’s no need for any creative input. In other words, if you need to measure how far apart two objects are, then there’s only one answer and it doesn’t involve an art director.

Jumping forwards past several more equally technical processes, we get to a stage called “shading”. This is where the pixels on the screen get their final colour, which includes processing textures and lighting. The shading process, unlike the purely technical process of sorting out where geometry is in 3D space, does have the potential for creative variations to influence the output. However for the first few years after Quake, the “fixed function hardware” video cards either didn’t accelerate shading at all, or had a single shading algorithm baked into the chips.

The GeForce 256 was the first video card with a notable new feature – it opened up the shading process to games developers. The shading process could now include short pieces of software code that would be executed by the graphics card. Initially, these snippets of code were limited to about 10 instructions and were solely focused on the textures and lighting. However the fact that software – no matter how tiny – was being run on a video card signaled a huge change in how video cards were being utilized.

nVidia marketed this new feature by calling the GeForce 256 a “GPU” – a graphics processing unit. Graphics cards were no longer like a desktop calculator, limited to the functions / buttons it came with when you bought it. The video card could now be programmed directly by games developers to suit their specific demands.
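The idea of a short program that runs once for every pixel can be sketched in Python. This is only an illustration: the function names are made up, and the lighting formula used here (a simple Lambert diffuse term, where brightness depends on the angle between the surface and the light) is an assumption chosen because it is short, not something taken from any particular card or game.

```python
# Illustrative only: a "shader" here is just a small function run once per pixel.
# The Lambert diffuse term and all names are assumptions made for this sketch.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def shade(texel, normal, light_dir):
    """Return the lit colour for one pixel: texture colour scaled by diffuse light."""
    intensity = max(0.0, dot(normal, light_dir))  # surfaces facing away get no light
    return tuple(round(channel * intensity) for channel in texel)

light_dir = (0.0, 0.0, 1.0)  # unit vector pointing from the surface towards the light
pixels = [
    ((200, 100, 50), (0.0, 0.0, 1.0)),   # facing the light directly
    ((200, 100, 50), (0.0, 0.6, 0.8)),   # tilted away: receives 80% of the light
    ((200, 100, 50), (0.0, 0.0, -1.0)),  # facing away: unlit
]
# The card runs the same small program for every pixel; here it is a plain loop.
shaded = [shade(texel, normal, light_dir) for texel, normal in pixels]
print(shaded)
```

Swap the body of `shade` for a different formula and every pixel on screen changes its look, without touching the hardware. That is the creative freedom that fixed function cards couldn’t offer.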

The GeForce 256 signaled the end of fixed function hardware.

Add another GP to your GPU

As mentioned before, this new feature didn’t really make an impact at launch, because initially there weren’t any games written to take advantage of it. But – as is still the case today – every new graphics card that was released was more powerful and sophisticated than the previous model, and games developers quickly adopted new techniques to push the boundaries of what was possible. Hardware accelerated transform and lighting took off. The market for 3D games boomed.

The ongoing development and progress of GPUs was still driven by games, and in the few years following the demise of 3dfx (bought by nVidia), the market settled down to the two major GPU developers we have today: Ati and nVidia.

For the next few years, Ati & nVidia continued their race against each other to create faster and more powerful GPUs, primarily driven by the market for 3D games. However as the capabilities of each new GPU increased, they also attracted interest from the scientific community.

A number of researchers saw the potential to harness the relatively cheap and powerful abilities of the latest GPUs for other types of processing. The same types of algorithms that were used to render textures and lighting were also applicable in completely unrelated fields. Thanks to the market for 3D games, GPUs had become commodity hardware and this was reflected in their relatively low cost. For certain types of maths algorithms – especially including vectors and matrices – GPUs were faster than CPUs. If the trend continued (it was very obvious that it would) then desktop GPUs would soon have the potential to outperform high-performance CPU clusters for a much lower price.
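What makes vector and matrix maths such a good fit is that the same small calculation is repeated over lots of values, and none of them depend on each other. Here is a minimal sketch in plain Python, running on the CPU, of that pattern. The name `kernel` and the choice of calculation (multiply by a constant and add) are assumptions for illustration only.

```python
# Illustrative only: plain Python running on the CPU, not GPU code.
# "kernel" is a hypothetical name for the small function applied to each element.

def kernel(i, a, x, y):
    """Compute one output element; it depends only on index i, never on its neighbours."""
    return a * x[i] + y[i]

a = 2.0
x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]

# A CPU walks through the indices one at a time. Because no element depends on
# another, a GPU could hand each index to its own thread and work on them together.
result = [kernel(i, a, x, y) for i in range(len(x))]
print(result)
```

With four values the difference is irrelevant, but with millions of values (or millions of pixels) doing them side by side is where the GPU’s advantage comes from.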

Early researchers experimented with GPUs by re-writing non-graphics, scientific algorithms as though they were 3D objects that needed to be textured and lit. These early experiments could be considered something of a very clever hack, but the results were promising enough that by 2003 a few languages were being developed to make programming GPUs for non-graphics tasks easier.

2006, again

In Part 6 of this series, we saw how 2006 was a hugely significant year for CPUs.

In Part 5 and Part 6, when looking at the history of the CPU, we saw that 2006 was a hugely significant year. With the release of the new Core range of CPUs, superseding the Pentium 4, Intel had launched the era of multi-core CPUs, and desktop multiprocessing.

But 2006 didn’t stop there. It wasn’t only CPUs that changed direction forever. The same year, nVidia launched their latest GPU – the GeForce 8 – and with it a brand new language specifically designed for programming GPUs, called CUDA. The GPU was no longer just a Graphics Processing Unit. With the full development weight of nVidia behind it, CUDA was the advent of a new language that opened the doors of the GPU to all software developers. The GPU was now like another computer – a more specialized type of CPU. Instead of being limited to rendering 3D graphics, GPUs could now be used to calculate other, more generalized tasks.

The GPU was no longer just a graphics processing unit; it was now a General Purpose GPU. While nVidia marketed the GeForce 8 as a “GPGPU”, the term hasn’t been quite as successful as the plain old GPU moniker, although it’s still used.

GPGPUs were still specialized chips that were designed primarily to render 3D graphics. They weren’t designed in the same way as general purpose CPUs, and they were limited to certain types of applications. It wasn’t as though GPUs were suddenly going to replace CPUs. But for the specific range of tasks that they were suited for, the performance benefits could be staggering.

In the same way that the change from the Pentium 4 to the Core Duo is still “in progress” today, the changes made by nVidia with the launch of CUDA and the GeForce 8 are also still being felt today.

CUDA been a contender

The launch of CUDA remains a significant historical milestone, but it also presented a problem. CUDA had been developed by nVidia, and it remained a proprietary nVidia technology.

Despite the historical significance of the GeForce 8 and CUDA, in 2006 nVidia actually had a smaller market share than their rival, Ati. Since nVidia had bought out 3dfx and a few other manufacturers had faded away, nVidia and Ati basically had the 3D gaming market to themselves. In 2006 Ati was bought by AMD, but despite the branding change the rivalry with nVidia has continued through to this day.

The launch of CUDA enabled everyday software developers to harness the processing power of GPUs – but only nVidia GPUs. While rival Ati (now owned by AMD) could have responded with their own proprietary GPU language, it wasn’t in anyone’s best interest to start another platform war. Instead, a coalition of major tech companies – instigated by Apple and including IBM, Intel and AMD – developed a new, open standard for programming GPUs. To emphasise the fact it was a non-proprietary, open platform it was named Open CL – clearly aligning itself with the successful Open GL graphics language. nVidia were also on board, acknowledging the benefits of an open, industry-wide standard that would work on GPUs from any manufacturer. Separately, Microsoft developed their own solution for Windows, which was launched under the “Direct” moniker, called “DirectCompute”.

Open CL was launched in December 2008, with demonstrations from both nVidia and AMD. But Open CL presented its own problems too, mostly to do with the wide range of companies involved, and the relative slowness of dealing with standards committees. nVidia had a few years’ head start with CUDA, and because it was their own technology they could do what they wanted with it without bothering with committees and politics. While nVidia supported Open CL – and continue to support it to this day – they continued their own development of CUDA as well. As CUDA is nVidia’s own product, it’s understandable that it receives more attention and tighter integration with nVidia hardware than Open CL. Consequently, if you have an nVidia GPU and software that performs the same task using either CUDA or Open CL, then you’ll generally get the best performance using CUDA. (This doesn’t mean that CUDA performs better than Open CL on an AMD card, because CUDA doesn’t run on AMD cards at all.)

Ever since CUDA was launched in 2006 and Open CL was launched at the end of 2008, software developers have been adding support and incorporating GPU acceleration into their apps. But thanks to the head start which nVidia enjoyed with CUDA, we’re still in a situation today where some apps support CUDA, others support Open CL or DirectCompute, or some combination of all of them.

Open CL had received very strong support from Apple (who initiated its creation) and the early adoption of Open CL and subsequent development push owes a lot to the integration of Open CL with OS X. However in 2018 Apple made it clear that it was deprecating support for both Open GL and Open CL in favour of its own, proprietary replacement called Metal. It’s too early to tell what the long-term impact of this will be, as Apple desktops still have a very low market share, but in the short term it will probably just be a pain in the proverbial for the users.

Since CUDA and Open CL provided a standardized way for all software developers to harness the power of GPUs, they’ve been the main avenue for any programmer looking to improve performance. While the original focus of GPUs had always been on real-time 3D graphics for games, they could now be utilized for non-real time, photorealistic rendering as well. New rendering engines such as Redshift and Octane were 100% GPU based, while existing products including Vray and Arnold introduced GPU accelerated versions.

While GPUs were originally designed to render 3D games as fast as possible, their evolution into the “GPGPU” opened the door for conventional 3D renders to benefit as well. Over the past decade, improving the performance of software has been all about adding GPU acceleration, and with each new generation of GPU from AMD and nVidia the performance boost has become more and more significant.

In the next few articles we’ll return our focus to After Effects and see how the development of multi-core CPUs and GPGPUs has influenced Adobe’s software.

This is part 11 in a long-running series on After Effects and Performance. Have you read the others? They’re really good and really long too:

Part 1: In search of perfection

Part 2: What After Effects actually does

Part 3: It’s numbers, all the way down

Part 4: Bottlenecks & Busses

Part 5: Introducing the CPU

Part 6: Begun, the core wars have…

Part 7: Introducing AErender

Part 8: Multiprocessing (kinda, sorta)

Part 9: Cold hard cache

Part 10: The birth of the GPU

Part 11: The rise of the GPGPU