26th Nov 2021 | Sprite Fright
Early in the production we decided to use the Open Image Denoiser (OIDN) that is integrated in Cycles to get noise-free frames at a relatively low sample count.
One big challenge that we faced at the end of production was a relatively high level of noise in certain shots that had already been rendered in final quality. To make things worse, tripling the sample count did little to improve the situation: the noise was still there. So we had to come up with a creative solution that would make our renders noise-free with relatively little effort, while re-rendering as little as possible.
To come up with a working idea, we first had to understand exactly what was causing the noise we were facing.
The issue was not that individual frames had leftover noise or artifacts. OIDN does a very good job at interpreting the individual noisy render passes to generate a clean image, even with low sample counts. But where it all falls apart is in animation, which is a well-known issue with the current implementation of the denoiser.
With a low sample count there is simply not enough information in the pixels for the denoiser to make an accurate guess at what the theoretical noise-free render would actually look like. So it takes some freedom in guessing to fill in the gaps. That is all fine when you look at a single frame, but as multiple images are rendered, there can be differences in how that missing information is filled in.
For that reason, it was important from the beginning to render with a constant seed. As long as the noise pattern stays the same, the denoiser will create predictable results and interpret the frames the same way, resulting in stable noise-free frames.
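As a minimal sketch of that setup, the Cycles seed can be pinned via Blender's Python API so every frame shares the same noise pattern (the property names are from the `bpy` Cycles settings; the exact values here are illustrative):

```python
# Illustrative config fragment: pin the Cycles noise seed for all frames,
# so the denoiser sees a constant noise pattern across the sequence.
import bpy

scene = bpy.context.scene
scene.cycles.seed = 0                   # one fixed seed for the whole shot
scene.cycles.use_animated_seed = False  # do not re-randomize the seed per frame
```

The same toggle is available in the UI as the small clock icon next to the seed value in the sampling settings.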
But that is unfortunately only true as long as the camera stays static, because the noise pattern is constant in screen space. As soon as the camera pans, rotates, zooms or moves at all, the noise pattern travels through the image, stuck to the screen, and destroys the illusion of a truly clean moving image.
One obvious thing that we tried was to simply increase the sample count. While that naturally does help, the increased quality was hardly worth the increased render time: even rendering with 6× the samples (resulting in 6× the render time), the video was still not noise-free. We did end up doing this selectively in a couple of places by border-rendering noisy parts, but we couldn’t afford to do it everywhere.
Another method that we tried, instead of scaling up the sample count, was to scale up the resolution of the frames and reduce the samples proportionally. By doubling the pixel count while cutting the sample count in half we could render at the same cost per frame, with an advantage for the denoiser. Rendering with more pixels means getting additional information, such as the surface normals that the denoiser takes into account, at much higher detail. This information is basically free, as it needs only a single sample. That way, details that would usually be smaller than a single pixel can be treated much more reliably by the denoising algorithm. The image is then scaled back down to the original resolution, and hopefully the up-scaling behind the scenes helped to improve things.
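The trade-off above can be sketched with a quick back-of-the-envelope calculation, since the render cost scales roughly with pixels × samples (the resolution and sample figures here are assumed, not the production values):

```python
# Sketch of the resolution/sample trade-off: doubling the pixel count while
# halving the sample count keeps the per-frame cost roughly constant.
base_res = (1920, 1080)   # assumed base resolution
base_samples = 128        # assumed base sample count

base_pixels = base_res[0] * base_res[1]
base_cost = base_pixels * base_samples

# Render with 2x the pixels (about 141% scale per axis) at half the samples.
scaled_pixels = base_pixels * 2
scaled_samples = base_samples // 2
scaled_cost = scaled_pixels * scaled_samples

print(scaled_cost == base_cost)  # True: same cost, but denser auxiliary passes
```

The gain is that the single-sample auxiliary passes (normals, albedo) now carry twice the spatial detail for the denoiser at no extra cost.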
This method did produce some solid, measurable improvements in areas that can capitalize on the additional information (areas with small-scale detail), but it is still not stable over time in the especially problematic areas with a lot of transparency or volume passes. This is something we started looking into too late in the rendering process to take advantage of, but it did not help with the most jarring noise issues anyway.
In general, the idea is to use the noise-free nature of a single frame to stabilize an entire sequence. To pull that off, something that Andy started doing quite early on was to build a proxy mesh that loosely follows the shape of the noisy background in 3D. A single rendered frame could then be projected onto it and rendered out for the entire sequence.
This method gives a very nice result, but it is quite cumbersome to set up, and in the noise-free areas it gets rid of all the parallax that is not accounted for by the proxy mesh. Its viability also depends heavily on the layout of the shot and the camera movement.
Finally, let’s talk about the method that we ended up using to save a couple of important, especially noisy shots from jarringly jittering around.
It’s based on the same idea of projecting a still frame onto a proxy mesh, only this time the proxy mesh is generated fully automatically.
That can be done with a technique comparable to a simplified version of photogrammetry. Because the image that the projection is based on is a 3D rendering, we basically have all the information we need to recreate the position of each pixel in 3D space.
In theory we could just use the actual geometry that was used to render the shot and project the render back onto it. But dealing with things like volumes, transparency passes and instancing can make it difficult to isolate only the geometry that is actually needed, and it would be hard to filter out the geometry that was actually visible from the angle the image is projected from.
But if we just output the depth pass of the render together with the information about the camera, we already have everything we need to make an exact reconstruction of the rendered geometry as seen from the camera. By taking into account just the camera’s position, rotation, focal length and aspect ratio, we can use the z-depth to calculate the 3D position of every single pixel of the image. The mesh can be dynamically created using geometry nodes.
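The per-pixel reconstruction can be illustrated with a small standalone sketch (not the production geometry-nodes setup). It assumes a Blender-style camera looking down −Z with X right and Y up, a horizontal sensor fit, and depth measured along the view axis; note that Cycles’ Z pass may store ray length rather than planar depth, in which case the direction would need to be normalized first. The function name and parameters are hypothetical:

```python
# Illustrative sketch: unproject a pixel back to world space from its z-depth
# and the camera parameters (position, rotation, focal length, sensor size).
def pixel_to_world(u, v, depth, cam_pos, cam_rot, focal_mm=50.0,
                   sensor_mm=36.0, aspect=16 / 9):
    """u, v in [0, 1] across the image; cam_rot is a 3x3 row-major matrix."""
    # Offset on the sensor plane, in millimeters, centered on the optical axis.
    x_s = (u - 0.5) * sensor_mm
    y_s = (v - 0.5) * sensor_mm / aspect
    # Similar triangles: scale the sensor offset out to the pixel's depth.
    x_c = x_s * depth / focal_mm
    y_c = y_s * depth / focal_mm
    z_c = -depth  # the camera looks down -Z
    # Rotate from camera space into world space and add the camera position.
    cam = (x_c, y_c, z_c)
    return tuple(
        cam_pos[i] + sum(cam_rot[i][j] * cam[j] for j in range(3))
        for i in range(3)
    )

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# The center pixel at depth 5 sits 5 units straight in front of the camera.
print(pixel_to_world(0.5, 0.5, 5.0, (0, 0, 0), identity))  # (0.0, 0.0, -5.0)
```

Running this over every pixel of the depth pass yields a point per pixel; connecting neighboring points into a grid of faces gives the proxy mesh the still frame is projected onto.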
That creates a highly detailed 3D representation of the background in the shot with fully baked lighting information. And this can then be rendered out again from the moving shot camera and composited together with the original shot, leaving an almost perfectly noise-free result.
For the compositing it’s important to note that any lighting that wasn’t baked into the projected representation, like the moving shadows and occlusions of characters, needs to be masked back in, losing the denoising effect in those areas.
You can take a look at an example file showing this method here.
It’s important to point out that in our case the conditions were particularly favorable for this method. There are a couple of things that could simply be improved in the future, but here I want to mention some inherent shortcomings.
For this method specifically, I think there are multiple ways to further improve the usability and quality of the results.
I think in general this idea of retaining some of the inherent 3D information of a rendered image for the compositing step is something that can be utilized much more in Blender to allow for a more dynamic compositing workflow. Be it for a denoising application like this one, depth-based masking, camera-independent composition, etc.
The flexibility that geometry nodes showcase by already allowing this workflow also demonstrates how the different areas of Blender can propel a workflow together through their synergy, and compositing is in no way an exception.