The CPU (Central Processing Unit) versus GPU (Graphics Processing Unit) debate is a familiar one. It has been the subject of many professional articles and consumer forum threads, all trying to work out which system architecture delivers the best graphics rendering performance, particularly for 3D and video. Meanwhile, the major graphics processor vendors have been working on new solutions that open the way to many more applications, notably massively parallel computing, which promises high-quality results with shorter rendering times.

It may not have been predictable at the time, but the 1999 launch of the Nvidia GeForce 256, marketed by the firm as the “world’s first GPU”, marked the tipping point into the GPU era. ATI (later acquired by AMD) followed closely with its Radeon line in 2000. A race for speed and performance ensued, aiming to exploit the potential of the GPU architecture and its parallel design, which scales from a single chip to multiple chips working in parallel, and on to multiple small GPU units on the same chip.


As a reference point for how quickly GPU power has grown: in 2009 a GPU delivered about 1 TFLOPS of computing power (against roughly 100 GFLOPS for a consumer CPU), while seven years later, in 2016, the latest GPU generation offers 11 TFLOPS.
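To put those two figures in perspective, the short calculation below derives the compound annual growth they imply. It is only a sketch: it assumes growth at a constant rate between the two data points quoted above, which real product generations do not follow exactly.

```python
import math

# Figures quoted in the text: peak single-GPU throughput in TFLOPS.
tflops_2009 = 1.0
tflops_2016 = 11.0
years = 2016 - 2009

# Compound annual growth factor implied by an 11x increase over 7 years
# (assumes a constant growth rate, which is a simplification).
annual_factor = (tflops_2016 / tflops_2009) ** (1 / years)

# Time needed to double throughput at that constant rate.
doubling_years = math.log(2) / math.log(annual_factor)

print(f"annual growth: {(annual_factor - 1) * 100:.0f}%")   # annual growth: 41%
print(f"doubling time: {doubling_years:.1f} years")         # doubling time: 2.0 years
```

Under that assumption, GPU throughput grew by roughly 41% per year, doubling about every two years.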

This search for higher performance, sustained by growing consumer demand (for video games especially), drew on a range of technical improvements, from better algorithms to manufacturing advances that allow several GPU processors to be placed on a single chip. These advances increased the capacity and resources of graphics cards and turned the GPU into a circuit dedicated to highly parallel computing. Modern GPUs comprise thousands of cores that handle large amounts of data simultaneously in many streams, performing relatively simple operations on each. They were initially optimized for computer graphics, which requires massively data-parallel operations. 3D computer graphics, and 3D rendering in particular, apply many of the same algorithms to every pixel with no interdependency between pixels. This is commonly referred to as an embarrassingly parallel workload, and it is particularly well handled by GPUs, which have been heavily optimized for this kind of processing.
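The idea of an embarrassingly parallel workload can be illustrated with a minimal sketch. The `shade` function and the tiny image size are hypothetical, and Python threads stand in for GPU cores purely to show that each pixel can be computed independently; this is not how a GPU is actually programmed, and it will not run faster than the sequential loop.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 8, 4  # tiny hypothetical image


def shade(index):
    """Compute one pixel's brightness from its own coordinates alone."""
    x, y = index % WIDTH, index // WIDTH
    # Diagonal gradient in the range 0..255; no other pixel is read.
    return (x + y) * 255 // (WIDTH + HEIGHT - 2)


pixels = range(WIDTH * HEIGHT)

# Sequential reference: one pixel after another.
sequential = [shade(i) for i in pixels]

# Parallel version: the same function mapped over a pool of workers.
# Because pixels do not depend on each other, no coordination is needed.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(shade, pixels))

print(parallel == sequential)
```

Since no pixel depends on any other, the work can be divided among any number of workers in any order and still produce the same image, which is exactly the property GPUs exploit.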

Further speedup can be gained by using multiple GPUs in the same system, which adds another level of parallelism on top of the parallel computation each GPU already performs. Whether this is needed depends on the workload (a real-time rendering requirement, for instance) and on the degree of realism demanded. Photorealistic quality is now a common standard, and rendering is computationally expensive because of the variety of complex physical phenomena being simulated. Real-time rendering, another prevalent requirement, involves billions of pixels per second, with each pixel requiring hundreds of operations or more.
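A back-of-the-envelope calculation shows how quickly these numbers add up. The resolution, frame rate, samples per pixel and operations per sample below are all assumed values chosen for illustration, not measurements of any particular renderer.

```python
# Hypothetical workload: 4K frames at 60 frames per second.
width, height, fps = 3840, 2160, 60
samples_per_pixel = 4   # assumed, e.g. for anti-aliasing
ops_per_sample = 200    # assumed; the text says "hundreds of operations"

samples_per_second = width * height * fps * samples_per_pixel
ops_per_second = samples_per_second * ops_per_sample

print(f"{samples_per_second / 1e9:.2f} billion pixel samples per second")  # 1.99
print(f"{ops_per_second / 1e12:.2f} trillion operations per second")       # 0.40
```

Even with these modest assumptions the renderer must sustain about two billion pixel samples and several hundred billion operations every second, which is why the workload is spread across thousands of cores and, where needed, several GPUs.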

Several comparative studies of CPU and GPU performance have highlighted the cost and energy implications (and consequently the heat generated): matching the computing power of a single GPU would take 10 to 15 CPUs. At the same time, growing GPU production has driven retail prices down drastically, from specialized workstations costing thousands of euros to graphics cards costing only a few hundred euros, within reach of the average user.

In addition, from a simple piece of hardware originally designed to accelerate a computer’s display, the GPU has become far more versatile and able to handle a variety of tasks, including 3D functions. Shaders written for the GPU were introduced and sped up graphics considerably. More recently, multi-level caches were added to GPUs to avoid trips to the much slower main system memory, which accelerates processing further: the latest Nvidia Pascal GPU (2016) has a 4096 KB last-level cache, whereas the brand’s 2008 GPUs had no L2 cache.

As a result, even a single GPU paired with a CPU offers advantages that multiple CPUs on their own do not, because each chip is specialized for its own kind of work. It also became obvious to manufacturers that this power was of interest for research and development, and effort went into making the GPU programmable, moving its purpose towards mainstream, general-purpose computing.

This move has been supported by platforms that allow specially written programs to execute across heterogeneous hardware, CPUs and GPUs in particular. Several solutions emerged, such as Nvidia CUDA (Compute Unified Device Architecture) and Khronos OpenCL, giving rise to GPGPU (general-purpose computing on graphics processing units). CUDA was introduced in 2007 and OpenCL in 2008. This fairly new technology changed the traditional roles of CPUs and GPUs by enabling bidirectional processing. In traditional graphics computing, data essentially flowed one way, from the CPU to the GPU and then to a display device, and the GPU was not used to store results and pass them back. GPGPU allows information to be transferred in both directions, from CPU to GPU and from GPU to CPU, which greatly improves efficiency in a wide variety of tasks, particularly those traditionally handled solely by the CPU.
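The round trip that GPGPU makes possible can be sketched as three steps: copy data to the device, run a computation there, and copy the result back. The `ToyDevice` class below is a hypothetical stand-in written for this illustration; it is not part of CUDA or OpenCL and runs entirely on the CPU. In CUDA, the corresponding steps would be a `cudaMemcpy` from host to device, a kernel launch, and a `cudaMemcpy` from device to host.

```python
class ToyDevice:
    """Hypothetical stand-in for a GPU with its own memory (illustration only)."""

    def __init__(self):
        self.memory = {}

    def upload(self, name, host_data):
        # Host -> device copy.
        self.memory[name] = list(host_data)

    def launch(self, kernel, src, dst):
        # Apply the kernel to each element independently. A real GPU would
        # run these in parallel across many threads; here it is a plain loop.
        self.memory[dst] = [kernel(value) for value in self.memory[src]]

    def download(self, name):
        # Device -> host copy: the return path that makes the flow bidirectional.
        return list(self.memory[name])


host_input = [1.0, 2.0, 3.0, 4.0]

device = ToyDevice()
device.upload("in", host_input)                  # CPU -> GPU
device.launch(lambda v: v * v, "in", "out")      # compute on the device
host_output = device.download("out")             # GPU -> CPU

print(host_output)  # [1.0, 4.0, 9.0, 16.0]
```

The last step is what distinguishes general-purpose use from pure display: the result returns to the CPU, which can carry on working with it.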

The value of such bidirectional technology for 3D rendering is now clear. Newly developed applications such as UNICORN make efficient use of both CPU and GPU resources, distributing rendering loads across processors and scaling with the available hardware to deliver maximum speed. This makes such applications powerful tools for real-time rendering. With this technology and hardware architecture, 3D graphics appears to be moving away from the CPU versus GPU debate and towards flexible applications that combine the computing power of both processors.
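One simple way to distribute a rendering load across processors is to give each one a share of the image proportional to its speed. The sketch below is an assumption made for illustration and does not describe how UNICORN or any other product actually schedules work; the `split_rows` helper and the relative speeds are hypothetical, with the 1:10 ratio echoing the CPU and GPU figures quoted earlier for 2009.

```python
def split_rows(total_rows, throughputs):
    """Assign contiguous row ranges proportional to each worker's throughput."""
    total = sum(throughputs.values())
    ranges, start, cumulative = {}, 0, 0
    for name, speed in throughputs.items():
        cumulative += speed
        # Rounding the cumulative share keeps ranges contiguous and
        # guarantees the last range ends exactly at total_rows.
        end = round(total_rows * cumulative / total)
        ranges[name] = (start, end)  # half-open range [start, end)
        start = end
    return ranges


# Assumed relative speeds: the GPU is taken to be ten times faster than the CPU.
plan = split_rows(1080, {"cpu": 1, "gpu": 10})
print(plan)  # {'cpu': (0, 98), 'gpu': (98, 1080)}
```

With these numbers the CPU renders 98 of the 1080 rows and the GPU the remaining 982, so both should finish at about the same time instead of one sitting idle.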