
I am trying to implement random-access reads and writes to an RWStructuredBuffer from multiple thread groups. A race condition can occur when two threads in different thread groups run at the same time (on different multiprocessors) and both try to read or write the same element of the RWStructuredBuffer.

When all the threads are in the same thread group, I can use atomics for concurrent reads/writes, so my solution is to dispatch my compute shader (CS) multiple times with just one thread group at a time, like so:

// Issue one single-group dispatch per thread group (mX * mY * mZ dispatches in total)
for (UINT x = 0; x < mX; ++x)
{
    for (UINT y = 0; y < mY; ++y)
    {
        for (UINT z = 0; z < mZ; ++z)
        {
            //...
            cmdList->Dispatch(1, 1, 1);
        }
    }
}

This way, if two threads from different thread groups want to access the same memory, they have to do it sequentially.

My question is whether this is a good solution to my problem, since the many calls to the graphics API could add driver overhead.

The API is DirectX 12, and the HLSL is compiled with shader model 5.1.

Thank you for your help. Cheers, Bojan!

TheBojanovski

1 Answer


First, in DX12, if you do not insert UAV barriers between your calls, the dispatches can run concurrently with no ordering guarantee, just as if you had called Dispatch(mX, mY, mZ).
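To make the role of the barrier concrete, here is a CPU-side analogy in C++ rather than D3D12 code. `run_dispatch` and `two_dependent_dispatches` are hypothetical helpers: a batch of `std::thread`s stands in for one dispatch, and joining the whole batch before starting the next one plays the part of a UAV barrier. In D3D12 itself, that barrier is recorded with `ID3D12GraphicsCommandList::ResourceBarrier` using a barrier of type `D3D12_RESOURCE_BARRIER_TYPE_UAV`.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical analogy: one "dispatch" = one batch of threads, one per element.
template <typename Kernel>
void run_dispatch(std::size_t thread_count, Kernel kernel) {
    std::vector<std::thread> workers;
    workers.reserve(thread_count);
    for (std::size_t i = 0; i < thread_count; ++i) {
        workers.emplace_back(kernel, i);
    }
    // Joining every thread acts like the UAV barrier: nothing recorded after
    // this call starts until all writes of this "dispatch" are complete.
    for (std::thread& t : workers) {
        t.join();
    }
}

std::vector<int> two_dependent_dispatches(std::size_t n) {
    std::vector<int> buffer(n, 0);
    // "Dispatch" 1: each thread writes its own element.
    run_dispatch(n, [&buffer](std::size_t i) { buffer[i] = static_cast<int>(i); });

    // "Dispatch" 2: each thread reads a neighbour's element. This is only safe
    // because "dispatch" 1 has fully finished (the join above).
    std::vector<int> out(n, 0);
    run_dispatch(n, [&buffer, &out, n](std::size_t i) { out[i] = buffer[(i + 1) % n] * 2; });
    return out;
}
```

Without the join between the two calls, the second batch could read elements the first batch has not written yet, which is the situation described above for back-to-back dispatches with no UAV barrier.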

But two different groups, or even two different dispatches, can safely read and write the same data as long as you only use atomic operations, just as you would in a CPU version.
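The CPU comparison can be sketched in C++ with `std::atomic`, using a hypothetical `atomic_histogram` helper. Each `std::thread` stands in for one thread group (its inner loop for the group's threads), and `fetch_add` plays the role of an HLSL `InterlockedAdd` on the UAV. Because every update is an atomic read-modify-write, no increment is lost even though the groups hit the same elements concurrently.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical analogy: `groups` concurrent "thread groups" atomically
// increment shared bins; different groups deliberately collide on the same bins.
std::vector<int> atomic_histogram(std::size_t groups, std::size_t items_per_group,
                                  std::size_t bins) {
    std::vector<std::atomic<int>> buffer(bins);
    for (std::atomic<int>& b : buffer) {
        b.store(0);  // explicit init; pre-C++20 std::atomic is not zeroed by default
    }

    std::vector<std::thread> workers;
    for (std::size_t g = 0; g < groups; ++g) {
        workers.emplace_back([&buffer, g, items_per_group, bins]() {
            for (std::size_t t = 0; t < items_per_group; ++t) {
                std::size_t bin = (g + t) % bins;
                // Atomic read-modify-write: concurrent updates to the same bin are never lost.
                buffer[bin].fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();  // join also makes all updates visible to this thread
    }

    std::vector<int> result;
    result.reserve(bins);
    for (const std::atomic<int>& b : buffer) {
        result.push_back(b.load());
    }
    return result;
}
```

For example, `atomic_histogram(4, 1000, 8)` yields 500 in every bin (4000 increments in total), however the four threads happen to be scheduled.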

galop1n