Since early in production, we decided to make use of Open Image Denoise (OIDN), which is integrated in Cycles, to get noise-free frames with a relatively low sample count.
One big challenge that we faced at the end of production was a relatively high level of noise in certain shots that had already been rendered in final quality. To make things worse, trying to get rid of it by tripling the sample count did little to improve the situation; the noise was still there. So we had to come up with a creative solution that would allow us to make our renders noise-free with relatively little effort, while also re-rendering as little as possible.
To come up with a working idea, we first had to understand exactly what was causing the noise we were facing.
The issue was not that individual frames had leftover noise or artifacts. OIDN does a very good job at interpreting the individual noisy render passes to generate a clean image, even with low sample counts. But where it all falls apart is in animation, which is a well-known issue with the current implementation of the denoiser.
With a low sample count there is simply not enough information in the pixels for the denoiser to make an accurate guess at what the theoretical noise-free render would actually look like. So it takes some freedom in guessing to fill in the gaps. That is all fine when you look at a single frame, but as multiple images are rendered, there can be differences in how that missing information is filled in.
For that reason, it was important from the beginning to render with a constant seed. As long as the noise pattern stays the same, the denoiser will create predictable results and interpret the frames the same way, resulting in stable noise-free frames.
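As a minimal sketch (assuming the Cycles render engine and the Python API), locking the seed down could look like this:

```python
import bpy

scene = bpy.context.scene

# Keep the sampling pattern identical on every frame, so the denoiser
# sees the same noise structure throughout the sequence.
scene.cycles.seed = 0                   # any fixed value works
scene.cycles.use_animated_seed = False  # don't change the seed per frame
```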
But that is unfortunately only true as long as the camera stays static, since the noise pattern is constant in screen space. As soon as the camera pans, rotates, zooms or moves at all, the noise pattern stays stuck to the screen while the image moves underneath it, destroying the illusion of a truly clean moving image.
One obvious thing that we tried was to simply increase the sample count. While that naturally does help, the increased quality was hardly worth the increased render time: even rendering with 6x the samples (and therefore 6x the render time), the video was still not noise-free. We did end up doing this selectively in a couple of places by border rendering the noisy parts, but we couldn't afford to do this everywhere.
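A rough sketch of what border rendering a noisy region with a higher sample count can look like via Python (the border coordinates and the sample factor below are placeholders):

```python
import bpy

scene = bpy.context.scene

# Limit rendering to the noisy region; coordinates are normalized (0-1)
# screen space and purely illustrative.
scene.render.use_border = True
scene.render.use_crop_to_border = False  # keep the full frame size for compositing
scene.render.border_min_x = 0.55
scene.render.border_max_x = 0.90
scene.render.border_min_y = 0.20
scene.render.border_max_y = 0.60

# Raise the sample count for this partial re-render only.
scene.cycles.samples *= 6
```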
Another method that we tried, instead of increasing the sample count, was to increase the resolution of the frames and reduce the samples proportionally. By doubling the pixel count while cutting the sample count in half, we could render at the same cost per frame with an advantage for the denoiser. Rendering with more pixels means getting additional information, for example about the surface normals that the denoiser takes into account, at a much higher level of detail. This information is basically free, as it needs only a single sample. That way, details that would usually be smaller than a single pixel can be treated much more reliably by the denoising algorithm. The image is then scaled back down to its original resolution, and hopefully the up-scaling behind the scenes has helped to improve things.
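In practice this trade-off comes down to two render settings; as a sketch (the exact scale factor is an assumption, the point is only that the pixel count goes up while the samples go down by the same factor):

```python
import bpy

scene = bpy.context.scene

scale = 2  # render at 2x per axis, i.e. 4x the pixel count

# Increase the resolution...
scene.render.resolution_percentage = 100 * scale

# ...and reduce the samples by the factor the pixel count grew by,
# keeping the cost per frame roughly constant.
scene.cycles.samples = max(1, scene.cycles.samples // (scale * scale))
```

The oversized result is then scaled back down to the target resolution, ideally by an integer factor so the downscaling filter stays clean.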
This method did produce some solid, measurable improvements in areas that can capitalize on the additional information (areas with small-scale detail), but it was still not stable over time in the areas that were especially problematic, with a lot of transparency or volume passes. We started looking into this too late in the rendering process to take advantage of it, but it did not help with the most jarring noise issues anyway.
In general, the idea is to use the noise-free nature of a single frame to stabilize an entire sequence. To pull that off, something that Andy started doing quite early on was to build a proxy mesh that loosely follows the shape of the noisy background in 3D. A single rendered frame could then be projected onto it and rendered out for the entire sequence.
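The post doesn't spell out how the projection itself was set up; one possible way to sketch it in Blender is a UV Project modifier driven by a camera placed at the viewpoint of the still frame (the object names below are placeholders):

```python
import bpy

# Placeholder names: the hand-built proxy mesh and a camera fixed at the
# position the still frame was rendered from.
proxy = bpy.data.objects["BG_Proxy"]
still_cam = bpy.data.objects["Camera_Still"]

# Project the still frame onto the proxy through that camera.
# The proxy mesh needs a UV map for the modifier to write into,
# and the projected image itself is assigned in the proxy's material.
mod = proxy.modifiers.new(name="Projection", type='UV_PROJECT')
mod.projectors[0].object = still_cam
mod.aspect_x = 16.0  # match the render aspect ratio
mod.aspect_y = 9.0
```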
This method gives a very nice result, but it is quite cumbersome to set up and removes all the parallax in the noise-free areas that is not accounted for in the proxy mesh. Its viability also depends heavily on the layout of the shot and the camera movement.
Finally, let’s talk about the method that we ended up using to save a couple of important, especially noisy shots from jarringly jittering around.
It’s based on the same idea of projecting a still frame onto a proxy mesh, only this time the proxy mesh is generated fully automatically.
That can be done with a technique that is comparable to a simplified version of photogrammetry. Because the image that this projection is based on is a 3D rendering, we basically have all of the information we need to recreate the position of each pixel in 3D space.
In theory we could just use the actual geometry that was used to render the shot and project the render back onto it. But dealing with things like volumes, transparency passes and instancing can make it difficult to use only the geometry that is actually needed, and it would be hard to isolate only the geometry that is actually visible from the angle the image is projected from.
But if we just output the depth pass of the render, together with the information about the camera, we already have everything we need for an exact reconstruction of the rendered geometry that is seen from the camera. By taking into account only the camera's position, rotation, focal length and aspect ratio, we can use the z-depth to calculate the 3D position of every single pixel of the image. The mesh can be dynamically created using geometry nodes.
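In production this reconstruction was built with geometry nodes, but the underlying math fits in a few lines. A minimal Python sketch (the function name and arguments are made up for illustration; it assumes a horizontally fitted sensor, no lens shift, and that the depth value is the distance measured along the camera's view axis — if the depth pass stores the ray length instead, it needs to be converted first):

```python
import bpy
from mathutils import Vector

def pixel_to_world(cam_obj, u, v, depth, res_x, res_y):
    """Reconstruct the world-space position behind one pixel.

    u, v  -- normalized pixel coordinates in [0, 1] (0, 0 = bottom left)
    depth -- value of the depth pass for that pixel, in scene units
    """
    cam = cam_obj.data
    sensor_w = cam.sensor_width          # mm, assumes horizontal sensor fit
    sensor_h = sensor_w * res_y / res_x  # derived from the aspect ratio

    # Offset on the sensor plane relative to its center, in mm.
    x = (u - 0.5) * sensor_w
    y = (v - 0.5) * sensor_h

    # Similar triangles: scale the sensor offset out to the measured depth.
    # Blender cameras look down their local -Z axis.
    p_cam = Vector((x * depth / cam.lens,
                    y * depth / cam.lens,
                    -depth))
    return cam_obj.matrix_world @ p_cam

# Example: the point behind the center pixel of a 1920x1080 frame at 4 m depth.
cam = bpy.context.scene.camera
point = pixel_to_world(cam, 0.5, 0.5, 4.0, 1920, 1080)
```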
That creates a highly detailed 3D representation of the background in the shot with fully baked lighting information. And this can then be rendered out again from the moving shot camera and composited together with the original shot, leaving an almost perfectly noise-free result.
For the compositing it's important to note that any lighting that wasn't baked into the projected representation, like the moving shadows and occlusions of characters, needs to be masked back in, losing the denoising effect for that area.
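A minimal sketch of that compositing step, assuming the two renders and a character/shadow mask are available as images (the node layout and inputs are placeholders; in practice these could just as well be render layers):

```python
import bpy

scene = bpy.context.scene
scene.use_nodes = True
tree = scene.node_tree
tree.nodes.clear()

# Placeholder inputs: the original noisy shot, the re-rendered projection,
# and a mask covering characters and moving shadows that were not baked in.
# Each image node still needs its image datablock assigned, e.g.
# original.image = bpy.data.images.load("//original.exr").
original = tree.nodes.new('CompositorNodeImage')
projected = tree.nodes.new('CompositorNodeImage')
mask = tree.nodes.new('CompositorNodeImage')

mix = tree.nodes.new('CompositorNodeMixRGB')
out = tree.nodes.new('CompositorNodeComposite')

# Where the mask is white, fall back to the original render (keeping the
# un-baked lighting); everywhere else use the clean projected background.
tree.links.new(mask.outputs['Image'], mix.inputs[0])       # mix factor
tree.links.new(projected.outputs['Image'], mix.inputs[1])
tree.links.new(original.outputs['Image'], mix.inputs[2])
tree.links.new(mix.outputs['Image'], out.inputs['Image'])
```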
You can take a look at an example file showing this method here.
It’s important to point out that in our case the conditions were particularly favorable for using this method. There are a couple of things that could simply be improved in the future, but here I want to mention some inherent shortcomings.
For this method specifically, I think there are multiple ways it could be developed further to improve the usability and the quality of the results.
I think that, in general, this idea of retaining some of the inherent 3D information of a rendered image for the compositing step is something that can be utilized much more in Blender to allow for a more dynamic compositing workflow, be it for a denoising application like this one, depth-based masking, camera-independent composition, etc.
The flexibility of geometry nodes, which already makes this workflow possible, also demonstrates how the different areas of Blender can propel a workflow through their synergy, and compositing is in no way an exception.
@Simon Thommes Thank you for this wonderful post. I am attempting to recreate the Z-depth projection as shown above; however, I am struggling to figure out an approach to rendering occluded geometry to reveal hidden geometry in the scene. Do you have a scene file that uses this proof of concept?
@Jaren Oh, you mean the final video in the post? I don't have a scene using that, but you should be able to recreate it by adding the nodegroup that is shown in the video to the end of every shader nodetree. Then render the image with every threshold value that you want to use. I'm not sure if that results in the correct z-depth.
I've returned to this post a few times, as ray tracing (and the necessary denoising) are fascinating to me. When you tried the supersampling-then-downscaling approach, what algorithm did you use for downscaling? Thank you!
@Alex Fulcrum I've tried bilinear and bicubic, I believe. Right now I don't remember which one I used in the shown example, I should have mentioned it here. But it shouldn't matter too much, as the ratio between the resolutions is always carefully chosen to be an integer.
If I understand correctly... it is a bake on a depth map?
Thanks Simon. What a great post! The level of detail provided is quite informative. Would using NVIDIA's denoiser, as opposed to OIDN, provide any benefit? Or do both denoisers lack temporal stability?
@Jacob Picart Thank you, I appreciate that! As far as I know they both don't do temporal denoising yet. It will be interesting to see how well it will perform once there is a compatible implementation, also in comparison to this method!
@Simon Thommes Hi Simon, nice explanation and breakdown of your process. Did you try the custom implementation by Pidgeon Laboratory that implements temporal denoising? https://youtu.be/Dws3XZ0hTgw
I also wanted to mention that the video can't play properly on my side using Firefox. I don't know if it's a bug with the website or my browser.
@softyoda Thank you, no I haven't seen that before. Looking at the link you sent, it seems to me like this method does not get rid of the flickering artifacts that were our main concern though. Which video does not play for you? For me every video in this article plays fine, also on Firefox.
@Simon Thommes Hi, the video works now (same browser, maybe it was just a bug). The video at the end doesn't showcase temporal denoising. The add-on basically does a similar thing to what is shown here: https://www.youtube.com/watch?v=851cEK0Taro They basically use the last frame with optical flow to avoid flickering artifacts.