
I am looking at switching from NVIDIA to AMD for my compute card because I want double-precision support. Before doing this, I decided to learn OpenCL on my NVIDIA card to see if I like it.

I want to convert the following code from CUDA to OpenCL. I am using the cuRAND library to generate uniformly and normally distributed random numbers. Each thread needs to be able to create a different sequence of random numbers and generate a few million numbers per thread. Everything I have read online seems to imply that I should generate a buffer of random numbers and then use that on the GPU, but this is not practical for me. How would I go about this in OpenCL? Here is the code:

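```cuda
#include <curand_kernel.h>  // curandState, curand_init, curand_uniform, curand_normal

template<int NArgs, typename OptimizationFunctor>
__global__
void statistical_solver_kernel(float* args_lbounds,
                    float* args_ubounds,
                    int trials,
                    int initial_temp,
                    unsigned long long seed,
                    float* results,
                    OptimizationFunctor f)
{
    // One thread per annealing trial.
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if(idx >= trials)
        return;

    // Same seed for every thread; the thread index selects the subsequence,
    // so each thread draws its own sequence.
    curandState rand;
    curand_init(seed, idx, 0, &rand);

    // Uniformly random starting point inside the bounds.
    float x[NArgs];
    for(int i = 0; i < NArgs; i++)
    {
        x[i] = curand_uniform(&rand) * (args_ubounds[i] - args_lbounds[i]) + args_lbounds[i];
    }
    float y = f(x);
    for(int t = initial_temp - 1; t > 0; t--)
    {
        float t_percent = (float)t / initial_temp;
        // Gaussian step scaled by the parameter range and the remaining
        // temperature, then clamped back into the bounds.
        float x_prime[NArgs];
        for(int i = 0; i < NArgs; i++)
        {
            x_prime[i] = curand_normal(&rand) * (args_ubounds[i] - args_lbounds[i]) * t_percent + x[i];
            x_prime[i] = fmaxf(args_lbounds[i], x_prime[i]);
            x_prime[i] = fminf(args_ubounds[i], x_prime[i]);
        }

        // Accept an improvement, or a worse point whose relative change
        // is below the current temperature fraction.
        float y_prime = f(x_prime);
        if(y_prime < y || (y_prime - y) / y_prime < t_percent)
        {
            y = y_prime;
            for(int i = 0; i < NArgs; i++)
            {
                x[i] = x_prime[i];
            }
        }
    }

    // Output layout per trial: [y, x[0], ..., x[NArgs - 1]].
    float* rptr = results + idx * (NArgs + 1);
    rptr[0] = y;
    for(int i = 1; i <= NArgs; i++)
        rptr[i] = x[i - 1];
}
```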
Asked by chasep255; edited by talonmies.
  • I'm stuck at the first sentence here. NVIDIA has supported double precision for at least five years. Which card do you use? Is it really that old? You might need to add a CUDA compiler flag to enable double-precision support. Apart from that, I welcome your choice to also support vendors other than NVIDIA with your software. ;) Randomness is usually done with a *noise function*, that is, a function that takes a seed and the thread ID and returns a random number for each thread separately. See [this question](http://stackoverflow.com/questions/9912143/how-to-get-a-random-number-in-opencl) as a start. – leemes Mar 03 '16 at 22:43
  • Sorry, what I meant is fast double precision. – chasep255 Mar 03 '16 at 22:52
  • You could use a (counter-based) random number generator from the Boost.Compute or VexCL libraries. – ddemidov Mar 04 '16 at 04:10
  • I used [this RNG named MWC64X](http://cas.ee.ic.ac.uk/people/dt10/research/rngs-gpu-mwc64x.html) for replacing curand in my OpenCL Monte-Carlo code. It was lightweight and very effective. IDK if it's the best or anything, but it did the job very nicely for me – Gilles Mar 04 '16 at 04:52
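The noise-function idea from the first comment can be sketched as a pure function of `(seed, idx, n)`. Instead of storing and advancing a state, a thread asks for its `n`-th number directly.

This minimal sketch chains Thomas Wang's 32-bit integer hash, a common choice in GPU code. Some caveats:

- `noise_uniform` is a hypothetical helper.
- The seed is narrowed to 32 bits for simplicity, whereas the kernel above takes an `unsigned long long`.
- Its statistical quality has not been checked with a test suite. Treat it as an illustration, not a replacement for MWC64X or a library counter-based generator.

```c
#include <stdint.h>

/* Thomas Wang's 32-bit integer hash. Every step is invertible,
   so the hash is a bijection on 32-bit values. */
static uint32_t wang_hash(uint32_t x)
{
    x = (x ^ 61u) ^ (x >> 16);
    x *= 9u;
    x = x ^ (x >> 4);
    x *= 0x27d4eb2du;
    x = x ^ (x >> 15);
    return x;
}

/* Hypothetical noise function: the n-th float in [0, 1) for thread idx,
   computed directly from (seed, idx, n) with no stored state. */
static float noise_uniform(uint32_t seed, uint32_t idx, uint32_t n)
{
    uint32_t h = wang_hash(seed ^ wang_hash(idx ^ wang_hash(n)));
    return (float)(h >> 8) * (1.0f / 16777216.0f);  /* top 24 bits -> [0, 1) */
}
```

With this approach, a thread keeps only a counter, for example `x[i] = noise_uniform(seed, idx, n++) * (args_ubounds[i] - args_lbounds[i]) + args_lbounds[i];`. A few million draws per thread fit comfortably in a 32-bit counter.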

1 Answer


The VexCL library provides an implementation of counter-based generators. You can use those inside larger expressions; see this slide for an example.

EDIT: Take this with a grain of salt, as I am the author of VexCL :).
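A counter-based generator computes each output as a keyed bijection of a counter. Each thread therefore gets its own sequence without storing any state.

As an illustration, here is a hand-written sketch of Philox2x32-10 from the Random123 family, using the constants published with Random123. It is not VexCL's code or API.

- The key is the seed and the counter is `(n, idx)`.
- Each round is invertible for a fixed key, so distinct counters always map to distinct outputs.
- In OpenCL C, the high half of the product could come from the built-in `mul_hi`.
- `philox_uniform2` is a hypothetical helper.

```c
#include <stdint.h>

/* Philox2x32 multiplier and key-bump (Weyl) constants as published with Random123. */
#define PHILOX_M2x32 0xD256D193u
#define PHILOX_W32   0x9E3779B9u

typedef struct { uint32_t v[2]; } philox_ctr;

/* One Philox2x32 round: 32x32->64 multiply, then keyed XOR and swap. */
static philox_ctr philox2x32_round(philox_ctr c, uint32_t key)
{
    uint64_t prod = (uint64_t)PHILOX_M2x32 * c.v[0];
    uint32_t hi = (uint32_t)(prod >> 32);
    uint32_t lo = (uint32_t)prod;
    philox_ctr out = { { hi ^ key ^ c.v[1], lo } };
    return out;
}

/* Philox2x32-10: ten rounds, bumping the key before every round after the first. */
static philox_ctr philox2x32_10(philox_ctr c, uint32_t key)
{
    for (int r = 0; r < 10; r++) {
        if (r > 0)
            key += PHILOX_W32;
        c = philox2x32_round(c, key);
    }
    return c;
}

/* Hypothetical helper: the n-th pair of uniforms in [0, 1) for thread idx.
   Counter = (n, idx), key = seed; nothing is stored between calls. */
static void philox_uniform2(uint32_t seed, uint32_t idx, uint32_t n, float out[2])
{
    philox_ctr c = { { n, idx } };
    c = philox2x32_10(c, seed);
    out[0] = (float)(c.v[0] >> 8) * (1.0f / 16777216.0f);
    out[1] = (float)(c.v[1] >> 8) * (1.0f / 16777216.0f);
}
```

Each call yields two uniforms at once. This pairs naturally with a Box-Muller transform when normal deviates are needed.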

Answered by ddemidov.