Unreal’s Nanite – A brief overview

What is Nanite

One of the hard problems in computer graphics, especially for games, is rendering an effectively unlimited number of polygons/triangles efficiently in real time. Artists have long dreamt of a world where they don’t have to worry about polycounts. Nanite is a UE5 technology that tries to achieve this by implementing very efficient yet dynamic LOD (level of detail) algorithms. This lets artists focus on creating or importing film-quality meshes without worrying about FPS; Unreal manages the level of detail for you with little to no loss of quality.

Nanite Architecture

Before we dive into the Nanite architecture and how it works, we need to understand a bit about how Unreal renders a scene and, before that, a bit about retained mode vs. immediate mode.

Retained mode roughly means preparing the scene’s draw data in advance, which is also where the newer APIs (Metal, Vulkan) are headed. You have to front-load everything, including every state of the pipeline, and at runtime you simply dispatch it to the GPU as fast as possible. Before 4.22, UE used an immediate-mode approach, in which draws are set up and issued as the scene is traversed each frame.

Since UE 4.22, the renderer has moved towards DrawCommands, which follow a more data-driven (stateless) design: these draw commands don’t carry any context (e.g. they have no state information about where they came from). This helped UE move to retained mode, i.e. UE can figure out the full pipeline state object, shader bindings, etc. at load time. It also helps UE parallelize draw commands.
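To make the retained-mode idea a bit more concrete, here is a minimal sketch of what a stateless, cached draw command could look like. This is not UE’s actual implementation: the struct, its fields, and the helper functions are all hypothetical names invented for illustration. The point is only that once each command carries its fully resolved state, the list can be built once, sorted, and submitted without consulting the scene again.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a cached draw command. Everything
// needed to issue the draw is resolved up front, so the command carries no
// reference back to the object that produced it.
struct DrawCommand {
    std::uint32_t pipelineStateId;   // index of a prebuilt pipeline state object
    std::uint32_t shaderBindingsId;  // index of prebuilt shader bindings
    std::uint32_t firstIndex;        // first index in the index buffer
    std::uint32_t indexCount;        // number of indices to draw
};

// Sort by pipeline state so that draws sharing a state end up adjacent.
// stable_sort keeps the original relative order within each state.
inline void SortByPipelineState(std::vector<DrawCommand>& commands) {
    std::stable_sort(commands.begin(), commands.end(),
                     [](const DrawCommand& a, const DrawCommand& b) {
                         return a.pipelineStateId < b.pipelineStateId;
                     });
}

// Count how many pipeline state switches submitting the list in its
// current order would require (the first draw always counts as one).
inline std::size_t CountStateChanges(const std::vector<DrawCommand>& commands) {
    std::size_t changes = 0;
    for (std::size_t i = 0; i < commands.size(); ++i) {
        assert(commands[i].indexCount > 0);  // an empty draw is a build-time bug
        if (i == 0 || commands[i].pipelineStateId != commands[i - 1].pipelineStateId) {
            ++changes;
        }
    }
    return changes;
}
```

Because the commands are plain data with no back-pointers, chunks of the list could also be built or processed on separate threads, which is the parallelization benefit mentioned above.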

Let’s dive a bit deeper into the Nanite architecture (this also covers some aspects of UE’s rendering architecture). The full diagram is available as the attached file (nanite_uml.txt), which you can open in PlantUML.

Basically, a Nanite pass produces cluster ID, triangle ID, and depth data.
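One way to picture that output is as a single packed value per pixel. The sketch below is a CPU-side illustration under stated assumptions, not Nanite’s actual code: the bit layout (32 bits of depth, 25 bits of cluster ID, 7 bits of triangle ID), the function names, and the convention that a larger depth value means “closer” are all assumptions chosen for the example. Putting depth in the high bits means a single atomic max resolves visibility and carries the winning IDs with it.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Illustrative bit layout (an assumption for this sketch):
// [63..32] depth, [31..7] cluster ID, [6..0] triangle ID.
constexpr std::uint32_t kTriangleBits = 7;
constexpr std::uint32_t kClusterBits = 25;

inline std::uint64_t PackVisibility(std::uint32_t depth,
                                    std::uint32_t clusterId,
                                    std::uint32_t triangleId) {
    assert(clusterId < (1u << kClusterBits));
    assert(triangleId < (1u << kTriangleBits));
    return (static_cast<std::uint64_t>(depth) << 32) |
           (static_cast<std::uint64_t>(clusterId) << kTriangleBits) |
           triangleId;
}

inline std::uint32_t UnpackDepth(std::uint64_t v) {
    return static_cast<std::uint32_t>(v >> 32);
}

inline std::uint32_t UnpackClusterId(std::uint64_t v) {
    return static_cast<std::uint32_t>(v >> kTriangleBits) & ((1u << kClusterBits) - 1u);
}

inline std::uint32_t UnpackTriangleId(std::uint64_t v) {
    return static_cast<std::uint32_t>(v) & ((1u << kTriangleBits) - 1u);
}

// Atomic max implemented with a compare-exchange loop. Since depth occupies
// the high bits, the candidate with the larger depth value wins
// (assumed here to mean "closer to the camera").
inline void WritePixel(std::atomic<std::uint64_t>& pixel, std::uint64_t candidate) {
    std::uint64_t current = pixel.load(std::memory_order_relaxed);
    while (current < candidate &&
           !pixel.compare_exchange_weak(current, candidate, std::memory_order_relaxed)) {
        // On failure, `current` is refreshed with the latest value and the
        // loop condition is re-evaluated.
    }
}
```

A later pass can then read the pixel back, unpack the cluster and triangle IDs, and fetch whatever material and vertex data it needs for shading.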

Ideas to explore in the next blog

1.) It would be fun to understand how the GPU rasterizer and culling work.

2.) Another aspect of Nanite that we didn’t discuss is how Nanite partitions the mesh. This is implemented in Nanite’s graph partitioner (Engine\Source\Developer\NaniteBuilder\Private\GraphPartitioner.cpp), which is roughly based upon METIS (http://glaros.dtc.umn.edu/gkhome/metis/metis/overview).


1.) Nanite involves a lot of pre-processing before rendering, such as data for all levels of clipping granularity, data for rasterization, and data for the base pass and the lighting stage.

2.) Nanite uses a cluster- and page-based LOD representation. Because these data structures are compute-intensive to build, Nanite has to build them in advance. This helps rendering but means Nanite only supports static meshes.

3.) Nanite is very much a custom, GPU-driven software rasterizer. It seems some of the ideas came from this presentation: (https://www.advances.realtimerendering.com/s2015/aaltonenhaar_siggraph2015_combined_final_footer_220dpi.pdf)

4.) Nanite maintains its data such that it can be discarded per cluster, per page, or per triangle.
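The last point, discarding work at page, cluster, or triangle granularity, can be sketched as a nested loop that rejects as early and as coarsely as possible. Everything here is a hypothetical simplification written for this post: the types, the `resident` flag, the single-plane test, and the backface check are assumptions, and the real system runs on the GPU with far more elaborate tests.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

inline float Dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Hypothetical, minimal data layout: pages hold clusters, clusters hold triangles.
struct Triangle { Vec3 normal; };  // only what the backface test needs
struct Cluster { Vec3 center; float radius; std::vector<Triangle> triangles; };
struct Page { bool resident; std::vector<Cluster> clusters; };

// A point p is on the visible side when Dot(normal, p) + d >= 0.
struct Plane { Vec3 normal; float d; };

// True when the bounding sphere lies entirely on the hidden side of the plane.
inline bool SphereOutside(const Plane& plane, const Vec3& center, float radius) {
    return Dot(plane.normal, center) + plane.d < -radius;
}

// Count triangles that survive all three levels of rejection.
inline std::size_t CountVisibleTriangles(const std::vector<Page>& pages,
                                         const Plane& plane,
                                         const Vec3& viewDir) {
    std::size_t visible = 0;
    for (const Page& page : pages) {
        if (!page.resident) continue;  // page level: data not loaded, skip everything in it
        for (const Cluster& cluster : page.clusters) {
            assert(cluster.radius >= 0.0f);
            if (SphereOutside(plane, cluster.center, cluster.radius)) continue;  // cluster level
            for (const Triangle& tri : cluster.triangles) {
                if (Dot(tri.normal, viewDir) >= 0.0f) continue;  // triangle level: faces away
                ++visible;
            }
        }
    }
    return visible;
}
```

The design idea is that the coarse tests are cheap and eliminate many triangles at once, so the per-triangle test only runs on what is left.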








Optimizing the Graphics Pipeline with Compute

