Bluenoise Generator For Bigger Textures
Background
A while back I wrote a bluenoise generator using WebGL2, which works reasonably well and was only ever intended to be used for small-ish textures, say 64x64.
Try to generate bigger textures (say 1024x1024), though, and you'd be waiting for minutes.
Practically this is fine: you don't typically attempt to generate bluenoise at runtime, instead normally offsetting the texture read by some multiple of the golden ratio each frame. Even if you did, something like 64x64 is probably a reasonable enough size, assuming it tiles.
Theoretically, this isn't very satisfying.
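For reference, the per-frame golden-ratio offset mentioned above can be sketched in a few lines. This is one common variant of the trick (the constant and function name here are illustrative, not taken from the demo code):

```javascript
// Advance the noise by a multiple of the golden ratio each frame and
// wrap into [0, 1), so successive frames are decorrelated.
const GOLDEN_RATIO = 0.6180339887498949; // 1 / phi

function goldenOffset(frame) {
  const o = frame * GOLDEN_RATIO;
  return o - Math.floor(o); // wrapped into [0, 1)
}
```

The resulting offset can be added to the sampled noise value (or scaled to texels and added to the read coordinate) and wrapped.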
Why Is It Slow?
The reason the previous generator is slow is simply that there are as many iterations as there are pixels in the final image.
With each iteration having to:
- Process the entire image to find the smallest value.
- Update the energy values of the surrounding pixels.
Even though what happens within an iteration makes use of GPU parallelisation, the iterations themselves do not.
Meaning the number of dispatched commands becomes:
numDispatchesPerPass = (
1 // find void
+ numReductionPasses // ceil(log2(max(width, height)) / log2(8))
+ 1 // update energy + pixel value
)
numDispatchesTotal = width * height * numDispatchesPerPass
As a result, if you want to fill a 1024x1024 image, you would need to dispatch 6,291,456 (1024 * 1024 * 6) commands.
No wonder it takes so long.
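The arithmetic above can be checked with a couple of tiny helpers (the function names here are mine, not from the demo source):

```javascript
// Dispatch count for the naive generator: one "find void" pass, a tree
// of reduction passes (each 8x8 reduction level shrinks an axis by 8),
// and one energy/pixel update pass, repeated once per pixel.
function numDispatchesPerPass(width, height) {
  const numReductionPasses =
    Math.ceil(Math.log2(Math.max(width, height)) / Math.log2(8));
  return 1 /* find void */ + numReductionPasses + 1 /* update */;
}

function numDispatchesTotal(width, height) {
  return width * height * numDispatchesPerPass(width, height);
}
```

For 1024x1024 this gives 4 reduction passes, so 6 dispatches per pixel and 6,291,456 in total, matching the figure above.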
How Can We Speed It Up?
If you've read the previous write-up I did, you'll have noticed that quite a lot of it revolves around "how much updating can I get away with not doing?".
In case you haven't: if σ is around the 1.9 mark, then realistically we should only have to worry about the surrounding 8x8 pixels to get a reasonably artefact-free result.
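To see why the 8x8 neighbourhood is enough, here is a sketch of the Gaussian energy falloff a void-and-cluster style generator uses (the exact constants in the real shader may differ):

```javascript
// Gaussian energy contribution of a pixel at offset (dx, dy).
function energyWeight(dx, dy, sigma = 1.9) {
  const r2 = dx * dx + dy * dy;
  return Math.exp(-r2 / (2 * sigma * sigma));
}
```

With σ = 1.9 the weight 8 pixels out is already around 1e-4 of the peak, so clamping the update to a small neighbourhood throws away very little energy.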
Additionally, the final texture format you'd end up using is pretty likely to be either BC4, R8, or R16/R16f (if you're feeling fancy), making full precision somewhat redundant.
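As an aside, exporting to one of those low-precision formats amounts to a linear remap of each pixel's rank. A hypothetical helper (the name is mine) for the R8 case:

```javascript
// Quantise a pixel's rank (0 .. numRanks - 1) to an 8-bit value,
// as you would when writing the result out as R8.
function rankToR8(rank, numRanks) {
  return Math.round((rank / (numRanks - 1)) * 255);
}
```

With 256 or fewer distinct ranks this mapping is lossless, which is why full float precision buys little.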
You can probably see where this is heading: we're going to dice the texture up into tiles and update each tile per iteration.
Tiling
We can't simply evaluate every tile in parallel; we need to process them in such a way that they can take their neighbouring tiles into account.
This is easily done with four interleaved groups of tiles, offset in a 2x2 pattern (also the reason the number of tiles in each dimension has to be a multiple of 2; it could be made to work with odd counts, but honestly I couldn't be bothered):
[0, 0]
[1, 1]
[1, 0]
[0, 1]
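The four offsets above can be expanded into a pass schedule: tiles that share an (x mod 2, y mod 2) parity are never adjacent, so every tile in a group can be processed in parallel while still reading its neighbours. A sketch with hypothetical names (the real demo drives this from WebGPU dispatches):

```javascript
// Partition the tile grid into four non-adjacent groups by parity.
const OFFSETS = [[0, 0], [1, 1], [1, 0], [0, 1]];

function tileGroups(tilesX, tilesY) {
  return OFFSETS.map(([ox, oy]) => {
    const group = [];
    for (let ty = oy; ty < tilesY; ty += 2) {
      for (let tx = ox; tx < tilesX; tx += 2) {
        group.push([tx, ty]); // safe to process together
      }
    }
    return group;
  });
}
```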
As such, the number of iterations is the number of pixels within a tile, and two shaders execute per iteration (pick and update) for each of the four groups, meaning the total number of dispatches is 2 * 4 * tileSize * tileSize, which is constant regardless of the final resolution.
Tile Size | Dispatch Count | Unique Values
---|---|---
8 | 512 | 64
16 | 2048 | 256
32 | 8192 | 1024
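Both columns of the table follow directly from the tile size (helper names are mine):

```javascript
// 2 shaders (pick + update) x 4 tile groups x tileSize^2 iterations,
// independent of the final image resolution.
function tiledDispatchCount(tileSize) {
  return 2 * 4 * tileSize * tileSize;
}

// Every tile is ranked over the same tileSize^2 slots, so that is the
// number of distinct output values.
function uniqueValues(tileSize) {
  return tileSize * tileSize;
}
```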
A quick look at the generated bluenoise and its corresponding DFT suggests that things generally look fine (σ = 1.9).
However, it is worth mentioning that there are patterns which present themselves when you look a little deeper. This is the result of averaging 128 512x512 bluenoise textures (again with σ = 1.9), at different tile sizes:
tile size: 8
tile size: 16
tile size: 32
I'm not sure if this actually matters, but it seemed worth putting out there as an observation.
Here are all the relevant source code files:
- Update Energy (GLSL): update_energy.comp
- Pick Tile Cell (GLSL): pick.comp
- Helpers (GLSL): bn2common.glsli
- JS used for demo: void_and_cluster2_webgpu.js
Also (again) for funsies here is a shadertoy port: https://www.shadertoy.com/view/XcyXDh