Why you should use native APIs for developing rendering engines

At a party over drinks, a fellow developer asked me why we shouldn’t use WebGL 2.0, since it works in browsers & one can create desktop apps using frameworks such as Electron, which embeds Chromium. My friend’s suggestion was that this would let you write code in JS (or TypeScript) once & have it work on the web as well as in your desktop app; basically, code once & don’t worry at all about feature parity across different platforms.

It does sound too good to be true, and it is! Using WebGL 2.0 limits what can be achieved with today’s GPUs as well as the capabilities that modern graphics/compute APIs provide. For starters, WebGL 2.0 is based upon the OpenGL ES 3.0 specification, which was released in 2012. The API itself is about 10 years old at the time of writing, and compared to modern OpenGL 4.6 it is limited in the features it supports.

The idea of this article is to compare WebGL 2.0 against modern OpenGL (4.6), showing features that aren’t supported by WebGL 2.0 and how they can be used.

Features not supported by WebGL 2.0 (aka OpenGL ES 3.0)

  • Tessellation shaders
  • Compute Shaders
  • Indirect draw
  • MultiDraw (an extension exists but has limited support)
  • Conditional rendering
  • Atomics (such as atomicAdd, atomicExchange, atomicCompSwap, etc.)
  • Bindless rendering
  • Shader Storage Buffer Object
  • Memory barriers (WebGL 2.0 does expose fence sync objects, but not glMemoryBarrier)

Next, we will look in a bit more detail at these features & how they help in building performant graphics pipelines/applications.

Tessellation shaders

Tessellation shaders are an additional stage in the graphics pipeline, executed after the vertex shader stage. As the name suggests, they subdivide geometry based upon a variety of parameters. Like vertex/fragment shaders they are executed on the GPU, and they enable a number of useful techniques; some examples are listed below, with a small code sketch after the list.

  • Adaptive subdivision of surfaces (driven by size & curvature).
  • Create coarser models on the CPU side but display finer ones, effectively a form of geometry compression.
  • Select the level of detail (LOD) at runtime based upon parameters such as distance from the eye, screen-space coverage, etc.
  • Simulate ocean waves using a displacement map applied to a coarse ocean mesh.
  • Tessellation shaders are frequently used with displacement/bump maps.
  • Pixar’s OpenSubdiv uses tessellation shaders (at least conceptually); you won’t be able to use it without going native.
  • Culling triangles outside the frustum, or back-face culling, can be done during this stage as well.
  • Phong tessellation for smoothed silhouettes.
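
As a minimal sketch, here is how a tessellated draw is issued in OpenGL 4.x. It assumes a GL context with a loader already set up, and that the hypothetical `program` links vertex, tessellation control, tessellation evaluation & fragment shaders; `vao` and `vertexCount` are placeholders.

```cpp
// Issue a tessellated draw: the input primitive type is GL_PATCHES,
// and the tessellator subdivides each patch using the levels written
// by the tessellation control shader.
glUseProgram(program);
glPatchParameteri(GL_PATCH_VERTICES, 3);  // 3 control points per patch
glBindVertexArray(vao);
glDrawArrays(GL_PATCHES, 0, vertexCount);
```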

References

https://ogldev.org/www/tutorial30/tutorial30.html

https://www.youtube.com/watch?v=63ufydgBcIk

https://erkaman.github.io/posts/tess_opt.html

Compute shaders

Compute shaders, as the name suggests, are a separate shader stage that can be used entirely to run compute tasks on the GPU; think of them as CUDA kernels. Any data-heavy task can take advantage of compute shaders for computing arbitrary information. Below are a few ideas where compute shaders are essential, with a minimal dispatch sketch after the list.

  • Linked-list-based order-independent transparency (OIT)
  • Boid simulations
  • Noise texture generation
  • GPU-based particle effects
  • Depth fog effects
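
A minimal dispatch sketch, assuming a GL 4.3+ context and a hypothetical `computeProgram` that links a compute shader with a 16x16 local workgroup writing a 512x512 noise texture via image store:

```cpp
// Launch enough 16x16 workgroups to cover a 512x512 image.
glUseProgram(computeProgram);
glDispatchCompute(512 / 16, 512 / 16, 1);

// Make the image writes visible before the texture is sampled.
glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
```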

Indirect draw

The idea behind this is to avoid round trips between the CPU & GPU. Instead of issuing draw commands with explicit arguments from the CPU, we issue a draw command that points to a GPU buffer holding those arguments. That buffer could be filled by a compute shader or by CUDA/OpenCL processes. The CPU simply triggers a draw call referencing the buffer; the GPU reads from it & decides how many elements to draw, how many instances, etc. This means you can issue a single draw call to render a complete scene on the GPU.
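
A minimal sketch of the CPU side in OpenGL 4.x (`vao` and `vertexCount` are placeholders). Here the CPU fills the argument buffer once; in a GPU-driven pipeline a compute shader would write into it instead:

```cpp
// Layout of the draw arguments, as defined by the OpenGL spec.
struct DrawArraysIndirectCommand {
    GLuint count;          // vertices per draw
    GLuint instanceCount;  // instances per draw
    GLuint first;          // first vertex
    GLuint baseInstance;   // first instance
};

GLuint indirectBuffer;
glGenBuffers(1, &indirectBuffer);
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirectBuffer);
DrawArraysIndirectCommand cmd = { vertexCount, 1, 0, 0 };
glBufferData(GL_DRAW_INDIRECT_BUFFER, sizeof(cmd), &cmd, GL_DYNAMIC_DRAW);

// The CPU only triggers the draw; the GPU reads the arguments from
// the buffer bound to GL_DRAW_INDIRECT_BUFFER (offset 0 here).
glBindVertexArray(vao);
glDrawArraysIndirect(GL_TRIANGLES, nullptr);
```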

Below are some of the ideas where indirect draw is used.

  • Culling with a compute shader & then using indirect draw to render the survivors.
  • GPU rasterized particles.

References

https://cpp-rendering.io/indirect-rendering/

https://docs.microsoft.com/en-us/windows/win32/direct3d12/indirect-drawing-and-gpu-culling-

https://www.saschawillems.de/blog/2016/08/06/new-vulkan-example-indirect-drawing/

MultiDraw (WebGL 2.0 supports an extension for this)

MultiDraw lets developers draw many different geometries with a single API call, giving much more power to leverage GPU capabilities. This effectively moves a CPU loop into the driver/GPU, reducing CPU load along with the number of draw calls required every frame. A minimal sketch follows.
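
A minimal sketch using core OpenGL’s glMultiDrawArrays; the offsets & counts are hypothetical sub-ranges of one vertex buffer. (In WebGL 2.0 the closest equivalent is the WEBGL_multi_draw extension.)

```cpp
#include <vector>

// Three sub-meshes stored back-to-back in the same VAO; one call
// replaces three separate glDrawArrays calls.
std::vector<GLint>   firsts = { 0, 300, 900 };
std::vector<GLsizei> counts = { 300, 600, 450 };

glBindVertexArray(vao);
glMultiDrawArrays(GL_TRIANGLES, firsts.data(), counts.data(),
                  static_cast<GLsizei>(firsts.size()));
```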

Conditional rendering

Conditional rendering is typically used to support occlusion culling. The basic principle is that a user can draw a bounding box for a complex object and only proceed to draw the actual object if the bounds of that object pass some fragment/depth/stencil/scissor tests. If the bounds don’t produce any samples, then the actual object must be occluded and can be culled without having to go through the entire rasterization pipeline.

The important piece is that the CPU shouldn’t wait for the result of an occlusion query to know whether to issue a draw call or not. This is where conditional rendering (called predication in DirectX) comes in: it allows the GPU to turn a draw call into a no-op when the occlusion query found no samples. The occlusion query simply populates the data required to decide whether to draw or not, as the sketch below shows.
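
A minimal sketch of the pattern in OpenGL (the query object and the `bboxVao`/`objectVao` vertex arrays are assumed to already exist; in practice you would also disable color & depth writes while drawing the bounding box):

```cpp
// Pass 1: rasterize the cheap bounding box into an occlusion query.
glBeginQuery(GL_ANY_SAMPLES_PASSED, query);
glBindVertexArray(bboxVao);
glDrawArrays(GL_TRIANGLES, 0, 36);  // 12 triangles of a box
glEndQuery(GL_ANY_SAMPLES_PASSED);

// Pass 2: the GPU itself skips this draw if no samples passed;
// the CPU never reads the query result back.
glBeginConditionalRender(query, GL_QUERY_WAIT);
glBindVertexArray(objectVao);
glDrawArrays(GL_TRIANGLES, 0, objectVertexCount);
glEndConditionalRender();
```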

References

https://github.com/castle-engine/castle-engine/wiki/Occlusion-Query

https://www.mbsoftworks.sk/tutorials/opengl3/27-occlusion-query/

https://paroj.github.io/gltut/Optimize%20Core.html

https://developer.download.nvidia.com/books/HTML/gpugems/gpugems_ch29.html

Atomics

Compute shaders provide the ability for multiple threads to access the same memory locations; without atomic operations, that memory couldn’t be updated safely. Atomics are used in many cases, such as building the per-pixel linked lists used for OIT, histogram generation, and GPU-side counters. A small sketch follows.
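
As a small sketch, here is a hypothetical compute shader, embedded as a C++ string, that uses atomicAdd on an SSBO-backed counter. Without the atomic, concurrent increments from different invocations would race:

```cpp
// Counts how many input values exceed a threshold.
const char* countShaderSrc = R"(
#version 430
layout(local_size_x = 64) in;
layout(std430, binding = 0) buffer Values { float values[]; };
layout(std430, binding = 1) buffer Result { uint  counter; };
void main() {
    uint i = gl_GlobalInvocationID.x;
    if (i < uint(values.length()) && values[i] > 0.5)
        atomicAdd(counter, 1u);  // safe concurrent increment
}
)";
```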

Bindless rendering

In a traditional binding-based pipeline, before making draw calls a user needs to bind all the resources the shader will need, such as the camera’s properties, material properties, textures, etc. (https://developer.nvidia.com/vulkan-shader-resource-binding). Binding is expensive, since the GPU/driver needs to establish a connection between the shader and each piece of data it will access.

Bindless doesn’t truly mean you don’t have to bind anything at all; instead, you create a big array that contains all resources (textures, buffers, etc.) at the start of the frame, and shaders simply index into this big array to access them. This helps with performance since you don’t have to rebind many descriptors for every draw. A minimal sketch follows.
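
A minimal sketch of the idea using the ARB_bindless_texture extension (assuming it is available; `texture`, `handleBuffer` & `handles` are placeholders):

```cpp
// Get a 64-bit handle for the texture once and make it resident,
// instead of rebinding it to a texture unit for every draw.
GLuint64 handle = glGetTextureHandleARB(texture);
glMakeTextureHandleResidentARB(handle);
handles.push_back(handle);

// Upload all handles for the frame into one buffer; shaders then
// pick textures by index (e.g. a per-draw material ID).
glBindBuffer(GL_SHADER_STORAGE_BUFFER, handleBuffer);
glBufferData(GL_SHADER_STORAGE_BUFFER,
             handles.size() * sizeof(GLuint64),
             handles.data(), GL_DYNAMIC_DRAW);
```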

Bindless design is one of the core building blocks of GPU-driven rendering architectures; the reference below covers it in depth.

References

https://vkguide.dev/docs/gpudriven/gpu_driven_engines/#bindless-design

Shader Storage Buffer Object

These are shader-writable buffers similar to uniform buffers, but much larger and writable from within shaders. They are helpful for implementing techniques such as OIT; the references below do a wonderful job of explaining them, and a minimal creation sketch follows.
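
A minimal creation sketch for a GL 4.3+ context; `Particle` & `particleCount` are hypothetical:

```cpp
struct Particle { float position[4]; float velocity[4]; };
const GLsizei particleCount = 10000;

// Allocate a shader-writable buffer and expose it at binding point 0.
GLuint ssbo;
glGenBuffers(1, &ssbo);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
glBufferData(GL_SHADER_STORAGE_BUFFER, sizeof(Particle) * particleCount,
             nullptr, GL_DYNAMIC_COPY);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssbo);

// Matching GLSL declaration (std430 layout):
//   layout(std430, binding = 0) buffer Particles { Particle particles[]; };
```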

References

https://www.khronos.org/opengl/wiki/Shader_Storage_Buffer_Object

https://www.geeks3d.com/20140704/tutorial-introduction-to-opengl-4-3-shader-storage-buffers-objects-ssbo-demo/
