I'm interested in the best way to dispatch compute shader work for hierarchical culling. This would apply to any kind of spatial hierarchy, such as a BVH, quadtree, octree or k-d tree, which stores a tree of nodes where each child node is spatially contained within its parent node.
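For GPU culling, a hierarchy like this usually has to live in a flat buffer. The following is a minimal sketch that assumes a complete octree stored tier by tier (an assumption; a sparse tree would need explicit child indices). It shows how parent/child relationships become index arithmetic. The function names are hypothetical.

```python
# Hypothetical implicit (pointerless) layout for a complete tree:
# all nodes of one tier are stored contiguously, tier after tier, so a
# node's parent and children are found by arithmetic instead of pointers.
BRANCHING = 8  # 8 for an octree, 4 for a quadtree, 2 for a k-d tree


def tier_offset(tier: int) -> int:
    """Index of the first node in `tier` (the root is tier 0)."""
    # Geometric series 1 + 8 + 64 + ... over all tiers before this one.
    return (BRANCHING ** tier - 1) // (BRANCHING - 1)


def tier_size(tier: int) -> int:
    """Maximum number of nodes in `tier`."""
    return BRANCHING ** tier


def parent(index: int) -> int:
    """Parent of a non-root node."""
    return (index - 1) // BRANCHING


def first_child(index: int) -> int:
    """Index of the first of a node's BRANCHING children."""
    return BRANCHING * index + 1


if __name__ == "__main__":
    print([tier_size(t) for t in range(5)])    # [1, 8, 64, 512, 4096]
    print([tier_offset(t) for t in range(5)])  # [0, 1, 9, 73, 585]
```

With this layout, a thread that knows its node index can find its parent without any extra storage, which matters for any of the per-node approaches below.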
I have a few ideas for how to do this, but I'm not particularly happy with any of them:
1) Dispatch one thread per node in the tree and early-return on nodes whose parent turned out not to be visible. However, this doesn't guarantee that a child node won't execute before its parent has computed its visibility, and I'm not sure how to synchronize that properly.
2) Dispatch just the root node and, if it is visible, append its children to a UAV. The same shader would then run again (over more threads each time) as long as nodes remain in the UAV. I'm not sure how to actually do this, though, or even whether it is possible with DX11 compute shaders.
3) Call Dispatch once per tier of nodes, sized for the maximum number of nodes in that tier (in an octree: 1, 8, 64, 512, 4096, etc.). This seems quite wasteful, but it would let each tier read the visibility results of the previous one, so individual nodes could early-out based on their parent's visibility (or because they don't actually exist).
4) Ditch hierarchical culling entirely and just dispatch one thread per individual object. This runs counter to all my years of culling experience, but it's certainly the most straightforward and parallelizable option.
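Because every child lies inside its parent, a common conservative test cannot accept a child whose parent it rejects. That test rejects a box lying entirely behind any one frustum plane. If the parent is entirely behind a plane, so is everything inside it. This suggests that the ordering problem in approach 1 affects only how much work is skipped, not correctness. It also suggests that, given the containment described above, approach 4 yields the same visible set as a hierarchical walk. The sketch below uses hypothetical `AABB`/`Plane` types in Python as a stand-in for the HLSL, and a one-plane "frustum" for brevity.

```python
from dataclasses import dataclass
from itertools import product

Vec3 = tuple[float, float, float]


@dataclass(frozen=True)
class Plane:
    normal: Vec3  # points toward the inside of the frustum
    d: float      # a point p is on the inner side when dot(normal, p) + d >= 0


@dataclass(frozen=True)
class AABB:
    lo: Vec3
    hi: Vec3


def outside_plane(box: AABB, plane: Plane) -> bool:
    # The corner farthest along the normal (the "p-vertex"); if even that
    # corner is behind the plane, the whole box is.
    p = [hi_i if n >= 0 else lo_i
         for n, lo_i, hi_i in zip(plane.normal, box.lo, box.hi)]
    return sum(n * c for n, c in zip(plane.normal, p)) + plane.d < 0


def visible(box: AABB, frustum: list[Plane]) -> bool:
    # Conservative: can accept some boxes near frustum corners that are
    # actually outside, but never rejects a box that intersects the frustum.
    return not any(outside_plane(box, pl) for pl in frustum)


def octants(box: AABB) -> list[AABB]:
    # Split a node into its 8 octree children, each contained in the parent.
    mid = tuple((lo_i + hi_i) / 2 for lo_i, hi_i in zip(box.lo, box.hi))
    children = []
    for bits in product((0, 1), repeat=3):
        lo = tuple(m if b else lo_i for b, lo_i, m in zip(bits, box.lo, mid))
        hi = tuple(hi_i if b else m for b, m, hi_i in zip(bits, mid, box.hi))
        children.append(AABB(lo, hi))
    return children


if __name__ == "__main__":
    root = AABB((0.0, 0.0, 0.0), (10.0, 10.0, 10.0))
    keep_right = [Plane((1.0, 0.0, 0.0), -6.0)]  # keeps x >= 6
    keep_left = [Plane((-1.0, 0.0, 0.0), -1.0)]  # keeps x <= -1
    for frustum in (keep_right, keep_left):
        kids = [visible(c, frustum) for c in octants(root)]
        # A child is never accepted when its parent is rejected.
        print(visible(root, frustum), sum(kids))  # "True 4", then "False 0"
```

The hierarchy then acts purely as a way to avoid testing nodes, not as a source of correctness. A flat dispatch therefore gives up only the skipped work, never accuracy.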
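As far as I know, a DX11 compute shader cannot launch more work by itself. Approach 2 can still keep its loop on the GPU, though. One common way, as I understand it, works in three steps:
- append surviving children to an `AppendStructuredBuffer`;
- copy its hidden counter into an indirect-arguments buffer with `ID3D11DeviceContext::CopyStructureCount`;
- issue the next pass with `DispatchIndirect`.

`DispatchIndirect` takes thread-group counts rather than element counts, so a tiny extra pass is typically needed to convert between them. The CPU still issues one dispatch per tier, so this is effectively approach 3 without the wasted threads. The sketch below simulates that loop on the CPU. The function name, the implicit tier-by-tier node layout and the visibility predicate are all hypothetical.

```python
from typing import Callable

BRANCHING = 8  # octree; children of node i are BRANCHING*i+1 .. BRANCHING*i+BRANCHING


def cull_worklist(depth: int,
                  is_visible: Callable[[int], bool]) -> tuple[list[int], list[int]]:
    """Simulated approach 2: one pass per tier, each pass processing only
    the nodes the previous pass appended.

    Returns (visible leaf indices, threads launched per pass).
    """
    current = [0]            # the root is the only node in the first pass
    threads_per_pass = []
    leaves = []
    for tier in range(depth + 1):
        threads_per_pass.append(len(current))  # stands in for the indirect dispatch size
        appended = []                          # stands in for the append buffer
        for node in current:                   # one GPU thread per node
            if not is_visible(node):
                continue
            if tier == depth:
                leaves.append(node)
            else:
                first = BRANCHING * node + 1
                appended.extend(range(first, first + BRANCHING))
        if not appended:
            break                              # later dispatches would be empty
        current = appended
    return leaves, threads_per_pass


def narrow(node: int) -> bool:
    # Made-up visibility: the root plus only the first child of each node,
    # roughly what a narrow frustum looking down one branch might produce.
    return node == 0 or (node - 1) % BRANCHING == 0


if __name__ == "__main__":
    leaves, passes = cull_worklist(3, narrow)
    full = [BRANCHING ** t for t in range(4)]  # approach 3 dispatch sizes
    print(leaves, passes, sum(passes), sum(full))  # [73] [1, 8, 8, 8] 25 585
```

In this made-up case, the worklist launches 25 threads where full per-tier dispatches would launch 585. The gap shrinks as more of the tree becomes visible.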
Am I thinking about this correctly at all? Any insight into a good way to do this would be very helpful!