
The code I'm trying to parallelize with OpenMP is a Monte Carlo simulation that boils down to something like this:

int seed = 0;
std::mt19937 rng(seed); 
double result = 0.0;
int N = 1000;

#pragma omp parallel for
for (int i = 0; i < N; i++)
{
    result += rng();
}
std::cout << result << std::endl;

I want to make sure that the state of the random number generator is shared across threads, and the addition to the result is atomic.

Is there a way of replacing this code with something from thrust::omp? From the research I have done so far, it looks like thrust::omp is more of a directive to use multiple CPU threads rather than the GPU for some standard Thrust operations.

Boraxis
  • Do you want to run the code on the GPU? Or just on the CPU? The thrust OMP [device backend](https://github.com/thrust/thrust/wiki/Device-Backends) does not use the GPU. Instead of "multiple CPU threads rather than **CPU**" did you intend to say "multiple CPU threads rather than **GPU**" ? – Robert Crovella Aug 14 '14 at 15:21
  • I edited the question. Your observation is correct, my calculations are too complicated to run on GPU, instead I want to run on multiple threads. – Boraxis Aug 14 '14 at 16:22

1 Answer


Yes, it's possible to use Thrust to do something similar, with parallel execution on the host CPU using OMP threads via the Thrust OMP backend. Here's one example:

$ cat t535.cpp
#include <random>
#include <iostream>
#include <thrust/system/omp/execution_policy.h>
#include <thrust/system/omp/vector.h>
#include <thrust/reduce.h>
#include <thrust/generate.h>

int main(int argc, char *argv[]){
  unsigned N = 1;
  int seed = 0;
  if (argc > 1)  N = atoi(argv[1]);
  if (argc > 2)  seed = atoi(argv[2]);
  std::mt19937 rng(seed);
  unsigned long result = 0;

  thrust::omp::vector<unsigned long> vec(N);
  thrust::generate(thrust::omp::par, vec.begin(), vec.end(), rng);
  result = thrust::reduce(thrust::omp::par, vec.begin(), vec.end());
  std::cout << result << std::endl;
  return 0;
}
$ g++ -std=c++11 -O2 -I/usr/local/cuda/include -o t535 t535.cpp -fopenmp -lgomp
$ time ./t535 100000000
214746750809749347

real    0m0.700s
user    0m2.108s
sys     0m0.600s
$

For this test I used Fedora 20, with CUDA 6.5RC, running on a 4-core Xeon CPU (netting about a 3x speedup based on time results). There are probably some further "optimizations" that could be made for this particular code, but I think they will unnecessarily clutter the idea, and I assume that your actual application is more complicated than just summing random numbers.

Much of what I show here was lifted from the thrust direct system access page, but there are several comparable methods to access the OMP backend, depending on whether you want a flexible, retargetable code, or one that specifically uses the OMP backend (this example specifically targets the OMP backend).

The thrust::reduce operation guarantees the "atomicity" you are looking for. Specifically, it guarantees that two threads are not trying to update a single location at the same time. However, the use of std::mt19937 in a multithreaded OMP app is, I think, outside the scope of my answer. If I create an ordinary OMP app from the code you provided, I observe run-to-run variability in the results, due (I think) to unsynchronized use of the std::mt19937 rng from multiple OMP threads. This is not something thrust can sort out for you.

Thrust also has its own random number generators, which are designed to work with its algorithms.

Robert Crovella
  • That looks like a very neat solution; I'll have to adapt my code, as the random component is more complicated than just rand(), and the result is a matrix (I'd have to write my own reduction). I also need access to some global variables (declared outside the threads). For the sake of the example, I would have to add to each random number a value indexed in an array of length N declared in main(). How would that work? Also, is there a way of controlling the number of threads? – Boraxis Aug 14 '14 at 18:45
  • You can control the number of threads used by any OMP program using the OMP_NUM_THREADS environment variable, and probably there are other methods, although I don't know of a thrust way to do it. It's not going to be possible to answer your other question in the space of comments. Thrust can add one vector to another in a variety of ways, one being `thrust::transform`. If you want to make use of thrust, you're going to have to learn the basics of the algorithms and datatypes it uses. I'd suggest starting with the [quick start guide](https://github.com/thrust/thrust/wiki/Quick-Start-Guide). – Robert Crovella Aug 14 '14 at 18:52
  • Also have a look at Thrust's [monte carlo example program](https://github.com/thrust/thrust/blob/master/examples/monte_carlo.cu). – Jared Hoberock Aug 15 '14 at 09:07