0

I'm running a Monte Carlo code for particle simulation, written in CUDA. Basically, in each step I calculate the velocity of each particle and update its position. The velocity is directly proportional to the path length. For a given material, the path length has a certain distribution. I know the probability density function of this path length. I now try to sample random numbers according to this function via rejection method. I would describe my CUDA knowledge as limited. I understood, that it is preferable to create large chunks of random numbers at once instead of multiple small chunks. However, for the rejection method, I generate only two random numbers, check a certain condition and repeat this procedure in the case of failure. Therefore I generate my random numbers on the kernel.

Using the profiler / nvvp I noticed, that basically 50% of my time is spend during the rejection method.

Here is my question: Are there any ways to optimize the rejection methods?

I appreciate every answer.

CODE

Here is the rejection method.

__global__ void rejectSamplePathlength(float* P, curandState* globalState,
    int numParticles, float sigma, int timestep,curandState state) {
int i = blockDim.x * blockIdx.x + threadIdx.x;
if (i < numParticles) {
    bool success = false;
    float p;
    float rho1, rho2;
    float a, b;
    a = 0.0;
    b = 10.0;
    curand_init(i, 0, 0, &state);
    while (!success) {
        rho1 = curand_uniform(&globalState[i]);
        rho2 = curand_uniform(&globalState[i]);
        if (rho2 < pathlength(a, b, rho1, sigma)) {
            p = a + rho1 * (b - a);
            success = true;
        }
    }
    P[i] = abs(p);

}
}

The pathlength function in the if statement computes a value y=f(x) on the kernel. I"m pretty sure, that curand_init is problematic in terms of time, but without this statement, each kernel would generate the same numbers?

Community
  • 1
  • 1
  • 1
    From these two posts you can get the answer you are seeking for: [Random generator & CUDA](http://stackoverflow.com/questions/15297168/random-generator-cuda) and [Cuda Random Number Generation](http://stackoverflow.com/questions/15247522/cuda-random-number-generation/15252202#15252202). – Vitality Mar 10 '14 at 10:33

1 Answers1

1

Maybe you could create a pool of random generated uniform variable in a previous kernel and then you pick your uniform in that pool and cycling over that pool. But it should be large enough to avoid infinite loop..

user2076694
  • 806
  • 1
  • 6
  • 10
  • The problem doesn't seem to be the selection of the realization, which the OP seems to have already solved, but the call of `curand_init` by each thread. – Vitality Mar 10 '14 at 12:53