6

Can someone please point me in the right direction on how to do this type of calculation in parallel, or tell me what the general name of this method is? I don't think these will return the same result.

C++

for (int i = 1; i < width; i++)
        x[i] = x[i] + x[i-1];

CUDA

int i = blockIdx.x * blockDim.x + threadIdx.x

if ((i > 0) && (i < (width)))
    X[i] = X[i] + X[i-1];
Vitality
  • 20,705
  • 4
  • 108
  • 146
Gswffye
  • 180
  • 3
  • 13

2 Answers2

10

This looks like a cumulative sum operation, in which the final value of x[i] is the sum of all values x[0]...x[i] in the original array.

In CUDA, this is called a scan or prefix-sum operation, and it can be efficiently parallelized. See e.g. this lecture for examples.

nneonneo
  • 171,345
  • 36
  • 312
  • 383
  • 5
    parallel prefix sum examples are also available in the [cuda samples](http://docs.nvidia.com/cuda/cuda-samples/index.html#cuda-parallel-prefix-sum--scan-) and in [thrust](http://thrust.github.io/doc/group__prefixsums.html) as well as in [cub](http://nvlabs.github.io/cub/structcub_1_1_device_scan.html). – Robert Crovella Aug 11 '14 at 20:54
  • The cited lecture is discussing very preliminary work that is a poor fit for CUDA, and that has been significantly improved upon. IMHO no discussion of scan is complete without a discussion of how it relates to parallel prefix sum circuitry (e.g., p392-393 of The CUDA Handbook). – ArchaeaSoftware Aug 13 '14 at 15:18
  • This chapter of GPU Gems cover the topic quite exhaustively https://developer.nvidia.com/gpugems/GPUGems3/gpugems3_ch39.html – Maxim May 20 '18 at 09:37
2

It has been already recognized that your problem amounts at a cumulative summation. Following Robert Crovella's comment, I just want to provide an example of use of CUDA Thrust's thrust::inclusive_scan for the computation of a cumulative summation.

The example refers to the case when you want to apply rank weigthing to a genetic algorithm. In this case, you rank the chromosomes and set up the following probability

enter image description here

and from that probability distribution you calculate the cumulative probability. See

R.L. Haupt, S.E. Haupt, "Practical Genetic Algorithms, Second Edition", J. Wiley and Sons, pp. 39

#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/functional.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/constant_iterator.h>
#include <cstdio>

template <class T>
struct scaling {
    const T _a;
    scaling(T a) : _a(a) { }
    __host__ __device__ T operator()(const T &x) const { return _a * x; }
};

void main()
{
   const int N = 20;

   double a = -(double)N;
   double b = 0.;

   double Dx = -1./(0.5*N*(N+1));

   thrust::device_vector<double> MatingProbability(N);
   thrust::device_vector<double> CumulativeProbability(N+1, 0.);

   thrust::transform(thrust::make_counting_iterator(a), thrust::make_counting_iterator(b), MatingProbability.begin(), scaling<double>(Dx));

   thrust::inclusive_scan(MatingProbability.begin(), MatingProbability.end(), CumulativeProbability.begin() + 1);

   for(int i=0; i<N+1; i++) 
   {
      double val = CumulativeProbability[i];
      printf("%d %3.15f\n", i, val);
   }

}

Other examples of cumulative summations regard the numerical calculation of the primitive of a function.

Vitality
  • 20,705
  • 4
  • 108
  • 146