Since early in production, we decided to make use of Open Image Denoise (OIDN), which is integrated in Cycles, to get noise-free frames with a relatively low sample count.
One big challenge that we faced at the end of production was a relatively high level of noise in certain shots that had already been rendered in final quality. To make things worse, trying to get rid of it by tripling the sample count did little to improve the situation; the noise was still there. So we had to come up with a creative solution that would allow us to make our renders noise-free with relatively little effort, while also re-rendering as little as possible.
To come up with a working idea, we first had to understand exactly what was causing the noise we were facing.
The issue was not that individual frames had leftover noise or artifacts. OIDN does a very good job at interpreting the individual noisy render passes to generate a clean image, even with low sample counts. But where it all falls apart is in animation, which is a well-known issue with the current implementation of the denoiser.
With a low sample count there is simply not enough information in the pixels for the denoiser to make an accurate guess at what the theoretical noise-free render would actually look like. So it takes some freedom in guessing to fill in the gaps. That is all fine when you look at a single frame, but as multiple images are rendered, there can be differences in how that missing information is filled in.
For that reason, it was important from the beginning to render with a constant seed. As long as the noise pattern stays the same, the denoiser will create predictable results and interpret the frames the same way, resulting in stable noise-free frames.
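As a minimal sketch (assuming the Cycles render engine and the Python API), locking the seed down could look like this:

```python
import bpy

scene = bpy.context.scene

# Keep the sampling pattern identical on every frame, so the denoiser
# sees the same noise structure throughout the sequence.
scene.cycles.seed = 0                   # any fixed value works
scene.cycles.use_animated_seed = False  # don't change the seed per frame
```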
But that is unfortunately only true as long as the camera stays static, since the noise pattern is constant in screen space. As soon as the camera pans, rotates, zooms or moves at all, the noise pattern stays stuck to the screen while the image moves underneath it, destroying the illusion of a truly clean moving image.
One obvious thing that we tried was to simply increase the sample count. While that naturally does help, the increased quality was hardly worth the increased render time: even rendering with 6x the samples (and therefore 6x the render time), the video was still not noise-free. We did end up doing this selectively in a couple of places by border rendering the noisy parts, but we couldn't afford to do this everywhere.
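A rough sketch of what border rendering a noisy region with a higher sample count can look like via Python (the border coordinates and the sample factor below are placeholders):

```python
import bpy

scene = bpy.context.scene

# Limit rendering to the noisy region; coordinates are normalized (0-1)
# screen space and purely illustrative.
scene.render.use_border = True
scene.render.use_crop_to_border = False  # keep the full frame size for compositing
scene.render.border_min_x = 0.55
scene.render.border_max_x = 0.90
scene.render.border_min_y = 0.20
scene.render.border_max_y = 0.60

# Raise the sample count for this partial re-render only.
scene.cycles.samples *= 6
```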
Another method that we tried, instead of increasing the sample count, was to increase the resolution of the frames and reduce the samples proportionally. By doubling the pixel count while cutting the sample count in half, we could render at the same cost per frame with an advantage for the denoiser. Rendering with more pixels means getting additional information, for example about the surface normals that the denoiser takes into account, at a much higher level of detail. This information is basically free, as it needs only a single sample. That way, details that would usually be smaller than a single pixel can be treated much more reliably by the denoising algorithm. The image is then scaled back down to its original resolution, and hopefully the up-scaling behind the scenes has helped to improve things.
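In practice this trade-off comes down to two render settings; as a sketch (the exact scale factor is an assumption, the point is only that the pixel count goes up while the samples go down by the same factor):

```python
import bpy

scene = bpy.context.scene

scale = 2  # render at 2x per axis, i.e. 4x the pixel count

# Increase the resolution...
scene.render.resolution_percentage = 100 * scale

# ...and reduce the samples by the factor the pixel count grew by,
# keeping the cost per frame roughly constant.
scene.cycles.samples = max(1, scene.cycles.samples // (scale * scale))
```

The oversized result is then scaled back down to the target resolution, ideally by an integer factor so the downscaling filter stays clean.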
This method did produce some solid, measurable improvements in areas that can capitalize on the additional information (areas with small-scale detail), but it was still not stable over time in the areas that were especially problematic, with a lot of transparency or volume passes. We started looking into this too late in the rendering process to take advantage of it, but it did not help with the most jarring noise issues anyway.
In general, the idea is to use the noise-free nature of a single frame to stabilize an entire sequence. To pull that off, something that Andy started doing quite early on was to build a proxy mesh that loosely follows the shape of the noisy background in 3D. A single rendered frame could then be projected onto it and rendered out for the entire sequence.
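The post doesn't spell out how the projection itself was set up; one possible way to sketch it in Blender is a UV Project modifier driven by a camera placed at the viewpoint of the still frame (the object names below are placeholders):

```python
import bpy

# Placeholder names: the hand-built proxy mesh and a camera fixed at the
# position the still frame was rendered from.
proxy = bpy.data.objects["BG_Proxy"]
still_cam = bpy.data.objects["Camera_Still"]

# Project the still frame onto the proxy through that camera.
# The proxy mesh needs a UV map for the modifier to write into,
# and the projected image itself is assigned in the proxy's material.
mod = proxy.modifiers.new(name="Projection", type='UV_PROJECT')
mod.projectors[0].object = still_cam
mod.aspect_x = 16.0  # match the render aspect ratio
mod.aspect_y = 9.0
```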
This method gives a very nice result, but it is quite cumbersome to set up and removes all the parallax in the noise-free areas that is not accounted for in the proxy mesh. Its viability also depends heavily on the layout of the shot and the camera movement.
Finally, let’s talk about the method that we ended up using to save a couple of important, especially noisy shots from jarringly jittering around.
It’s based on the same idea of projecting a still frame onto a proxy mesh, only this time the proxy mesh is generated fully automatically.
That can be done with a technique that is comparable to a simplified version of photogrammetry. Because the image that this projection is based on is a 3D rendering, we basically have all of the information we need to recreate the position of each pixel in 3D space.
In theory we could just use the actual geometry that was used to render the shot and project the render back onto it. But dealing with things like volumes, transparency passes and instancing can make it difficult to use only the geometry that is actually needed, and it would be hard to isolate only the geometry that is actually visible from the angle the image is projected from.
But if we just output the depth pass of the render, together with the information about the camera, we already have everything we need for an exact reconstruction of the rendered geometry that is seen from the camera. By taking into account only the camera's position, rotation, focal length and aspect ratio, we can use the z-depth to calculate the 3D position of every single pixel of the image. The mesh can be dynamically created using geometry nodes.
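In production this reconstruction was built with geometry nodes, but the underlying math fits in a few lines. A minimal Python sketch (the function name and arguments are made up for illustration; it assumes a horizontally fitted sensor, no lens shift, and that the depth value is the distance measured along the camera's view axis — if the depth pass stores the ray length instead, it needs to be converted first):

```python
import bpy
from mathutils import Vector

def pixel_to_world(cam_obj, u, v, depth, res_x, res_y):
    """Reconstruct the world-space position behind one pixel.

    u, v  -- normalized pixel coordinates in [0, 1] (0, 0 = bottom left)
    depth -- value of the depth pass for that pixel, in scene units
    """
    cam = cam_obj.data
    sensor_w = cam.sensor_width          # mm, assumes horizontal sensor fit
    sensor_h = sensor_w * res_y / res_x  # derived from the aspect ratio

    # Offset on the sensor plane relative to its center, in mm.
    x = (u - 0.5) * sensor_w
    y = (v - 0.5) * sensor_h

    # Similar triangles: scale the sensor offset out to the measured depth.
    # Blender cameras look down their local -Z axis.
    p_cam = Vector((x * depth / cam.lens,
                    y * depth / cam.lens,
                    -depth))
    return cam_obj.matrix_world @ p_cam

# Example: the point behind the center pixel of a 1920x1080 frame at 4 m depth.
cam = bpy.context.scene.camera
point = pixel_to_world(cam, 0.5, 0.5, 4.0, 1920, 1080)
```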
That creates a highly detailed 3D representation of the background in the shot with fully baked lighting information. And this can then be rendered out again from the moving shot camera and composited together with the original shot, leaving an almost perfectly noise-free result.
For the compositing it's important to note that any lighting that wasn't baked into the projected representation, like the moving shadows and occlusions of characters, needs to be masked back in, losing the denoising effect for that area.
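A minimal sketch of that compositing step, assuming the two renders and a character/shadow mask are available as images (the node layout and inputs are placeholders; in practice these could just as well be render layers):

```python
import bpy

scene = bpy.context.scene
scene.use_nodes = True
tree = scene.node_tree
tree.nodes.clear()

# Placeholder inputs: the original noisy shot, the re-rendered projection,
# and a mask covering characters and moving shadows that were not baked in.
# Each image node still needs its image datablock assigned, e.g.
# original.image = bpy.data.images.load("//original.exr").
original = tree.nodes.new('CompositorNodeImage')
projected = tree.nodes.new('CompositorNodeImage')
mask = tree.nodes.new('CompositorNodeImage')

mix = tree.nodes.new('CompositorNodeMixRGB')
out = tree.nodes.new('CompositorNodeComposite')

# Where the mask is white, fall back to the original render (keeping the
# un-baked lighting); everywhere else use the clean projected background.
tree.links.new(mask.outputs['Image'], mix.inputs[0])       # mix factor
tree.links.new(projected.outputs['Image'], mix.inputs[1])
tree.links.new(original.outputs['Image'], mix.inputs[2])
tree.links.new(mix.outputs['Image'], out.inputs['Image'])
```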
You can take a look at an example file showing this method here.
It’s important to point out that in our case the conditions were particularly favorable for using this method. There are a couple of things that could simply be improved in the future, but here I want to mention some inherent shortcomings.
For this method specifically, I think there are multiple ways it could be developed further to improve the usability and the quality of the results.
I think that, in general, this idea of retaining some of the inherent 3D information of a rendered image for the compositing step is something that can be utilized much more in Blender to allow for a more dynamic compositing workflow, be it for a denoising application like this one, depth-based masking, camera-independent composition, etc.
The flexibility of geometry nodes, which already makes this workflow possible, also demonstrates how the different areas of Blender can propel a workflow through their synergy, and compositing is in no way an exception.
@Simon Thommes Thank you for this wonderful post. I am attempting to recreate the Z-depth projection as shown above; however, I am struggling to figure out an approach to rendering occluded geometry to reveal hidden geometry in the scene. Do you have a scene file that uses this proof of concept?
@Jaren Oh, you mean the final video in the post? I don't have a scene using that, but you should be able to recreate it by adding the nodegroup that is shown in the video to the end of every shader nodetree. Then render the image with every threshold value that you want to use. I'm not sure if that results in the correct z-depth.
I've returned to this post a few times, as ray tracing (and the necessary denoising) are fascinating to me. When you tried the supersampling-then-downscaling approach, what algorithm did you use for downscaling? Thank you!
@Alex Fulcrum I've tried bilinear and bicubic, I believe. Right now I don't remember which one I used in the shown example, I should have mentioned it here. But it shouldn't matter too much, as the ratio between the resolutions is always carefully chosen to be an integer.
If I understand correctly... it is a bake on a depth map?
Thanks Simon. What a great post! The level of detail provided is quite informative. Would using NVIDIA's denoiser, as opposed to OIDN, provide any benefit? Or do both denoisers lack temporal stability?
@Jacob Picart Thank you, I appreciate that! As far as I know they both don't do temporal denoising yet. It will be interesting to see how well it will perform once there is a compatible implementation, also in comparison to this method!
@Simon Thommes Hi Simon, nice explanation and breakdown of your process. Did you try the custom implementation by Pidgeon Laboratory that implements temporal denoising? https://youtu.be/Dws3XZ0hTgw
I also wanted to mention that the video can't play properly on my side using Firefox. I don't know if it's a bug with the website or my browser.
@softyoda Thank you, no I haven't seen that before. Looking at the link you sent, it seems to me like this method does not get rid of the flickering artifacts that were our main concern though. Which video does not play for you? For me every video in this article plays fine, also on Firefox.
@Simon Thommes Hi, the video works now (same browser, maybe it was just a bug). The video at the end doesn't showcase temporal denoising. The add-on basically does a similar thing to what is shown here: https://www.youtube.com/watch?v=851cEK0Taro They basically use the last frame with optical flow to avoid flickering artifacts.