I'm working on a particle sim and have run into a bottleneck: writing through a UAV to a RWStructuredBuffer of single floats is around 10 times too slow. From experimentation, bandwidth does not seem to be the limit; the access time of each write is what bogs it down. Append writing is out of the question, since the outgoing data needs to be in a specific order. This is on DX10/SM4 hardware, so here are a few questions: Is there any way at all to speed things up (other than writing larger chunks of data, which is not possible because the output from the shaders is non-consecutive)? If not, is DX11-grade hardware any quicker with UAVs?
1 Answer
First, if you haven't done so already, add GPU timestamp queries to your system so that you can profile your shader code. This post explains how:
http://mynameismjp.wordpress.com/2011/10/13/profiling-in-dx11-with-queries/
It targets DX11, but the same query features exist in DX10, so it should be simple to port over.
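As a rough illustration of the last step of that approach: once the begin and end timestamps and the disjoint query have been read back, the elapsed GPU time comes from dividing the tick difference by the reported frequency. The sketch below covers only that arithmetic. The function name and parameters are hypothetical, and it assumes the frequency is in ticks per second, as the timestamp-disjoint query reports it.

```cpp
#include <cstdint>

// Hypothetical helper: converts two GPU timestamps into milliseconds.
// `frequency` is assumed to be in ticks per second, as reported by the
// timestamp-disjoint query. Returns a negative value if the inputs are
// unusable (zero frequency, or end before begin).
double gpu_ticks_to_ms(std::uint64_t begin_ticks,
                       std::uint64_t end_ticks,
                       std::uint64_t frequency)
{
    if (frequency == 0 || end_ticks < begin_ticks) {
        return -1.0;
    }
    const double delta = static_cast<double>(end_ticks - begin_ticks);
    return delta * 1000.0 / static_cast<double>(frequency);
}
```

If the disjoint query reports that the timestamps are disjoint, the sample for that frame should be discarded rather than converted.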
After that, there are several things you can tune in a compute shader, but the first one to try is the thread group size:
[numthreads(TGX, 1, 1)]
Try values such as 8, 16, 32, and 64 for TGX to find the sweet spot (don't forget to divide the group count in your dispatch call by the same value).
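A minimal sketch of the "divide on your dispatch" part, assuming one thread per particle. The function name is hypothetical; it only computes the number of thread groups to pass as the X argument of the dispatch call, rounding up so that a particle count that is not a multiple of TGX is still fully covered.

```cpp
#include <cstdint>

// Hypothetical helper: number of thread groups needed so that
// groups * threads_per_group >= element_count (one thread per element).
// Written without `element_count + threads_per_group - 1` to avoid
// overflow for very large counts.
std::uint32_t dispatch_group_count(std::uint32_t element_count,
                                   std::uint32_t threads_per_group)
{
    if (threads_per_group == 0) {
        return 0; // invalid group size; nothing sensible to dispatch
    }
    const std::uint32_t full_groups = element_count / threads_per_group;
    const std::uint32_t remainder   = element_count % threads_per_group;
    return full_groups + (remainder != 0 ? 1u : 0u);
}
```

Because the count is rounded up, the last group may contain threads whose index is past the end of the buffer, so the shader should compare its thread ID against the particle count and return early for those threads.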

mrvux