I need an algorithm for computing the parallel prefix sum of an array without using shared memory. If shared memory cannot be avoided, what is the best way to deal with bank conflicts?
This is covered by [GPU Gems 3](http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html) – harold Jan 10 '16 at 13:22
1 Answer
This link contains a detailed analysis of the sequential and parallel algorithms for prefix sum:
Parallel Prefix Sum (Scan) with CUDA
It also contains a fragment of C code implementing the parallel prefix algorithm and a detailed explanation of how to avoid shared memory bank conflicts.
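As a rough illustration of the padding idea behind conflict avoidance, here is a minimal host-side sketch in C. The helper names `padded_index` and `bank_of` are hypothetical, and the bank count of 32 is an assumption that should be adjusted to the target hardware. Adding one unused slot after every `NUM_BANKS` elements shifts indices that would otherwise fall into the same bank onto different banks.

```c
#include <assert.h>

#define LOG_NUM_BANKS 5                  /* assumption: 32 shared memory banks */
#define NUM_BANKS (1 << LOG_NUM_BANKS)

/* Padded position of logical element i: one unused slot per NUM_BANKS elements. */
int padded_index(int i)
{
    assert(i >= 0);
    return i + (i >> LOG_NUM_BANKS);
}

/* Bank that a 32-bit word stored at position idx falls into. */
int bank_of(int idx)
{
    return idx % NUM_BANKS;
}
```

Without padding, logical indices 0, 32 and 64 all land in bank 0; with padding they land in banks 0, 1 and 2. In a kernel, every `temp[x]` access would become `temp[padded_index(x)]`, and the shared array would need `padded_index(n - 1) + 1` elements instead of `n`.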
You can either port the C code from the link to Cudafy.NET, or keep it in C and call it from your application as unmanaged code. Note, however, that the CUDA C code in the link contains several mistakes. Here is a corrected version written for Cudafy.NET:
    [Cudafy]
    public static void prescan(GThread thread, int[] g_odata, int[] g_idata, int[] n)
    {
        // Exclusive scan of n[0] elements by a single block; n[0] must be a power of two.
        // temp is indexed up to n[0] - 1, so threadsPerBlock (user defined) must be >= n[0].
        int[] temp = thread.AllocateShared<int>("temp", threadsPerBlock);
        int thid = thread.threadIdx.x;
        int offset = 1;

        if (thid < n[0] / 2) // each participating thread handles two elements
        {
            temp[2 * thid] = g_idata[2 * thid]; // load input into shared memory
            temp[2 * thid + 1] = g_idata[2 * thid + 1];
        }

        for (int d = n[0] >> 1; d > 0; d >>= 1) // build sum in place up the tree
        {
            thread.SyncThreads(); // kept outside any thread-dependent branch so every thread reaches it
            if (thid < d)
            {
                int ai = offset * (2 * thid + 1) - 1;
                int bi = offset * (2 * thid + 2) - 1;
                temp[bi] += temp[ai];
            }
            offset *= 2;
        }

        if (thid == 0)
        {
            temp[n[0] - 1] = 0; // clear the last element
        }

        for (int d = 1; d < n[0]; d *= 2) // traverse down tree & build scan
        {
            offset >>= 1;
            thread.SyncThreads();
            if (thid < d)
            {
                int ai = offset * (2 * thid + 1) - 1;
                int bi = offset * (2 * thid + 2) - 1;
                int t = temp[ai];
                temp[ai] = temp[bi];
                temp[bi] += t;
            }
        }

        thread.SyncThreads();
        if (thid < n[0] / 2)
        {
            g_odata[2 * thid] = temp[2 * thid]; // write results to device memory
            g_odata[2 * thid + 1] = temp[2 * thid + 1];
        }
    }
You can use the above modified code instead of the one in the link.
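To check the kernel's output, it can help to have a host-side reference. The following is a minimal sketch in plain C, not part of the linked article. The function names `exclusive_scan_seq` and `tree_scan_host` are hypothetical. The first is the straightforward sequential exclusive scan. The second replays the same up-sweep and down-sweep index pattern as the kernel, with an inner loop over `t` standing in for the threads, and assumes `n` is a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential exclusive scan: out[i] = in[0] + ... + in[i-1], out[0] = 0. */
void exclusive_scan_seq(const int *in, int *out, size_t n)
{
    assert(n == 0 || (in != NULL && out != NULL));
    int running = 0;
    for (size_t i = 0; i < n; ++i) {
        int value = in[i];      /* read first so in == out also works */
        out[i] = running;
        running += value;
    }
}

/* In-place exclusive scan using the kernel's tree pattern; n must be a power of two. */
void tree_scan_host(int *data, size_t n)
{
    assert(data != NULL && n > 0 && (n & (n - 1)) == 0);
    size_t offset = 1;

    /* Up-sweep: build partial sums in place up the tree. */
    for (size_t d = n >> 1; d > 0; d >>= 1) {
        for (size_t t = 0; t < d; ++t) {        /* t plays the role of thid */
            size_t ai = offset * (2 * t + 1) - 1;
            size_t bi = offset * (2 * t + 2) - 1;
            data[bi] += data[ai];
        }
        offset *= 2;
    }

    data[n - 1] = 0;                            /* clear the last element */

    /* Down-sweep: traverse down the tree and build the scan. */
    for (size_t d = 1; d < n; d *= 2) {
        offset >>= 1;
        for (size_t t = 0; t < d; ++t) {
            size_t ai = offset * (2 * t + 1) - 1;
            size_t bi = offset * (2 * t + 2) - 1;
            int tmp = data[ai];
            data[ai] = data[bi];
            data[bi] += tmp;
        }
    }
}
```

For the input `{3, 1, 7, 0, 4, 1, 6, 3}`, both functions produce `{0, 3, 4, 11, 11, 15, 16, 22}`, which is what `g_odata` should contain after the kernel runs on the same input.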