I'm working on a heightmap erosion compute shader in Unity, where each point on the map is eroded separately. This works well for small maps, but my project requires 4096x4096 maps, which means 4096^2 = 16777216 points to simulate. With the default thread dimensions of [64,1,1], this creates 262144 thread groups, far more than the allowed limit of 65535.
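To spell out the arithmetic, here is a small Python sketch. It is not part of my Unity project, and the function and constant names are made up for illustration. It also works out the group counts for two larger layouts, under my assumption that the 65535 limit applies to each dispatch dimension separately and that a group can hold up to 1024 threads.

```python
MAX_GROUPS_PER_DIM = 65535  # the limit I'm hitting (assumed to be per dimension)

def group_count(num_items, threads_per_group):
    """Number of thread groups needed to cover num_items (integer ceiling)."""
    return (num_items + threads_per_group - 1) // threads_per_group

size = 4096
points = size * size  # 16777216 points to simulate

# Current layout: 1D dispatch with 64 threads per group.
groups_64 = group_count(points, 64)

# Assumed alternative: 1D dispatch with 1024 threads per group.
groups_1024 = group_count(points, 1024)

# Assumed alternative: 2D dispatch with 8x8 threads per group.
groups_2d = (group_count(size, 8), group_count(size, 8))

print(groups_64, groups_64 <= MAX_GROUPS_PER_DIM)
print(groups_1024, groups_1024 <= MAX_GROUPS_PER_DIM)
print(groups_2d, max(groups_2d) <= MAX_GROUPS_PER_DIM)
```

If those assumptions hold, either alternative would fit under the limit, but I don't know what they cost in performance.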
My questions are:
Can I simply raise the thread dimensions, and what do I have to consider in terms of performance when I do?
Alternatively, is it possible to run the shader multiple times, each time with a different range of heightmap coordinates?
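To make the second idea concrete, this is roughly how I imagine splitting the work, written as a Python sketch rather than actual Unity code. `plan_dispatches` is a hypothetical helper; the start index would have to be passed to the shader somehow (for example as an integer parameter set before each dispatch) and added to the thread id there. I haven't tested whether this is a sensible approach.

```python
def plan_dispatches(num_points, threads_per_group=64, max_groups=65535):
    """Split num_points into batches that each fit in one 1D dispatch.

    Returns a list of (start_index, point_count, group_count) tuples.
    """
    max_points = threads_per_group * max_groups  # most points one dispatch can cover
    plans = []
    start = 0
    while start < num_points:
        count = min(max_points, num_points - start)
        # Integer ceiling, so a partial last group is still dispatched.
        groups = (count + threads_per_group - 1) // threads_per_group
        plans.append((start, count, groups))
        start += count
    return plans

for start, count, groups in plan_dispatches(4096 * 4096):
    # Hypothetical: set the start offset on the shader, then dispatch `groups` groups.
    print(start, count, groups)
```

For my map size this comes out to five dispatches, because four full dispatches of 65535 groups leave 256 points over.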
This is my first time working with shaders. The tutorials I've seen online quickly go into too much depth on GPU hardware specifications, so I didn't pick up much from them.