I am trying to implement random-access reads and writes to an RWStructuredBuffer from multiple thread groups. A race condition can occur when two threads in different thread groups run at the same time (on different multiprocessors) and both try to read or write the same element of the RWStructuredBuffer.
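The race described above is the classic lost update on a read-modify-write. The sketch below is only a CPU-side analogy, not GPU code: each std::thread stands in for a thread group, and they all increment one shared element. std::atomic::fetch_add plays roughly the role that an HLSL Interlocked* intrinsic plays on an RWStructuredBuffer element. The function name concurrentIncrements is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// CPU-side analogy only: each std::thread stands in for a thread group that
// increments the same "buffer element" concurrently. With a plain int this
// read-modify-write would be a data race; fetch_add makes each increment
// indivisible, so no updates are lost.
int concurrentIncrements(int groups, int incrementsPerGroup) {
    std::atomic<int> element{0};  // the shared "buffer element"
    std::vector<std::thread> workers;
    for (int g = 0; g < groups; ++g) {
        workers.emplace_back([&element, incrementsPerGroup] {
            for (int i = 0; i < incrementsPerGroup; ++i) {
                element.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();  // joining makes every increment visible to the final load
    }
    return element.load();
}
```

With 4 threads doing 10,000 increments each, the result is always 40,000; with a plain int it would be undefined behavior and, in practice, often smaller.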
When all the threads are in the same thread group, I can use atomics for concurrent reads and writes, so my solution is to dispatch my compute shader (CS) multiple times, with just one thread group per dispatch, like so:
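// UAV barrier on the resource backing the RWStructuredBuffer
// (pBuffer is a hypothetical name for that ID3D12Resource*).
D3D12_RESOURCE_BARRIER uavBarrier = {};
uavBarrier.Type = D3D12_RESOURCE_BARRIER_TYPE_UAV;
uavBarrier.UAV.pResource = pBuffer;

for (UINT x = 0; x < mX; ++x)
{
    for (UINT y = 0; y < mY; ++y)
    {
        for (UINT z = 0; z < mZ; ++z)
        {
            //...
            cmdList->Dispatch(1, 1, 1);
            // Consecutive dispatches may overlap on the GPU; the barrier makes
            // this dispatch's UAV accesses complete before the next one's begin.
            cmdList->ResourceBarrier(1, &uavBarrier);
        }
    }
}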
This way, if two threads from different thread groups want to access the same memory location, they have to do so sequentially.
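Here is a CPU-side analogy of the one-group-per-dispatch idea. It assumes each dispatch is allowed to finish completely before the next one starts. Each std::thread below is joined before the next is launched, so a plain int can be updated without atomics. On the GPU, that "wait until finished" step comes from a UAV barrier (D3D12_RESOURCE_BARRIER_TYPE_UAV) between the dispatches. Separate Dispatch calls on the same command list are not, by themselves, guaranteed to run one after another. The name serializedIncrements is hypothetical.

```cpp
#include <cassert>
#include <thread>

// CPU-side analogy of "one thread group per dispatch": each std::thread
// (standing in for one dispatch) is joined before the next is launched,
// so the plain, non-atomic int is never accessed by two threads at once.
// On the GPU, the join corresponds to a UAV barrier between dispatches,
// not to the separate Dispatch call itself.
int serializedIncrements(int groups, int incrementsPerGroup) {
    int element = 0;  // shared "buffer element", deliberately not atomic
    for (int g = 0; g < groups; ++g) {
        std::thread worker([&element, incrementsPerGroup] {
            for (int i = 0; i < incrementsPerGroup; ++i) {
                ++element;
            }
        });
        worker.join();  // wait for this "dispatch" to finish before the next one
    }
    return element;
}
```

The result is correct, but only because all parallelism across "groups" has been removed.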
My question is whether this is a good solution to my problem, since issuing one API call per thread group could add significant driver overhead.
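To put a rough number on that overhead, the hypothetical helper below counts the commands the loop records. It assumes each Dispatch(1, 1, 1) is followed by one UAV barrier, whereas a single Dispatch(mX, mY, mZ) would record one command. For a 16 x 16 x 16 grid that comes to 4,096 dispatches and 4,096 barriers.

```cpp
#include <cassert>
#include <cstdint>

// Commands recorded by the per-group loop, assuming one Dispatch(1, 1, 1)
// plus one UAV barrier per thread group (hypothetical helper for illustration).
struct CommandCounts {
    std::uint64_t dispatches;
    std::uint64_t uavBarriers;
};

CommandCounts perGroupCounts(std::uint32_t mX, std::uint32_t mY, std::uint32_t mZ) {
    // Widen before multiplying so large grids do not overflow 32 bits.
    const std::uint64_t groups = std::uint64_t{mX} * mY * mZ;
    return {groups, groups};
}
```

Besides the CPU cost of recording these commands, each barrier-separated dispatch keeps only one thread group in flight, which likely leaves most of the GPU idle.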
The API is DirectX 12, and the HLSL is compiled with shader model 5.1.
Thank you for your help. Cheers, Bojan!