I am trying to implement a Spinlock in GLSL. It will be used in the context of Voxel Cone Tracing. I try to move the information, which stores the lock state, to a separate 3D texture which allows atomic operations. In order to not waste memory I don't use a full integer to store the lock state but only a single bit. The problem is that without limiting the maximum number of iterations, the loop never terminates. I implemented the exact same mechanism in C#, created a lot of tasks working on shared resources and there it works perfectly. The book Euro Par 2017: Parallel Processing Page 274 (can be found on Google) mentions possible caveats when using locks on SIMT devices. I think the code should bypass those caveats.
Problematic GLSL Code:
void imageAtomicRGBA8Avg(layout(RGBA8) volatile image3D image, layout(r32ui) volatile uimage3D lockImage,
ivec3 coords, vec4 value)
{
ivec3 lockCoords = coords;
uint bit = 1<<(lockCoords.z & (4)); //1<<(coord.z % 32)
lockCoords.z = lockCoords.z >> 5; //Division by 32
uint oldValue = 0;
//int counter=0;
bool goOn = true;
while (goOn /*&& counter < 10000*/)
//while(true)
{
uint newValue = oldValue | bit;
uint result = imageAtomicCompSwap(lockImage, lockCoords, oldValue, newValue);
//Writing is allowed if could write our value and if the bit indicating the lock is not already set
if (result == oldValue && (result & bit) == 0)
{
vec4 rval = imageLoad(image, coords);
rval.rgb = (rval.rgb * rval.a); // Denormalize
vec4 curValF = rval + value; // Add
curValF.rgb /= curValF.a; // Renormalize
imageStore(image, coords, curValF);
//Release the lock and set the flag such that the loops terminate
bit = ~bit;
oldValue = 0;
while (goOn)
{
newValue = oldValue & bit;
result = imageAtomicCompSwap(lockImage, lockCoords, oldValue, newValue);
if (result == oldValue)
goOn = false; //break;
oldValue = result;
}
//break;
}
oldValue = result;
//++counter;
}
}
Working C# code with identical functionality
public static void Test()
{
int buffer = 0;
int[] resource = new int[2];
Action testA = delegate ()
{
for (int i = 0; i < 100000; ++i)
imageAtomicRGBA8Avg(ref buffer, 1, resource);
};
Action testB = delegate ()
{
for (int i = 0; i < 100000; ++i)
imageAtomicRGBA8Avg(ref buffer, 2, resource);
};
Task[] tA = new Task[100];
Task[] tB = new Task[100];
for (int i = 0; i < tA.Length; ++i)
{
tA[i] = new Task(testA);
tA[i].Start();
tB[i] = new Task(testB);
tB[i].Start();
}
for (int i = 0; i < tA.Length; ++i)
tA[i].Wait();
for (int i = 0; i < tB.Length; ++i)
tB[i].Wait();
}
public static void imageAtomicRGBA8Avg(ref int lockImage, int bit, int[] resource)
{
int oldValue = 0;
int counter = 0;
bool goOn = true;
while (goOn /*&& counter < 10000*/)
{
int newValue = oldValue | bit;
int result = Interlocked.CompareExchange(ref lockImage, newValue, oldValue); //imageAtomicCompSwap(lockImage, lockCoords, oldValue, newValue);
if (result == oldValue && (result & bit) == 0)
{
//Now we hold the lock and can write safely
resource[bit - 1]++;
bit = ~bit;
oldValue = 0;
while (goOn)
{
newValue = oldValue & bit;
result = Interlocked.CompareExchange(ref lockImage, newValue, oldValue); //imageAtomicCompSwap(lockImage, lockCoords, oldValue, newValue);
if (result == oldValue)
goOn = false; //break;
oldValue = result;
}
//break;
}
oldValue = result;
++counter;
}
}
The locking mechanism should work quite identical as the one described in OpenGL Insigts Chapter 22 Octree-Based Sparse Voxelization Using the GPU Hardware Rasterizer by Cyril Crassin and Simon Green. They just use integer textures to store the colors for every voxel which I would like to avoid because this complicates Mip Mapping and other things. I hope the post is understandable, I get the feeling it is already becoming too long...
Why does the GLSL implementation not terminate?