Use Thust OMP to parallelize a Monte Carlo on CPU

Question

The goal is to parallelize a Monte Carlo process using thrust::omp

int main()
{
  unsigned Nsimulations = 1000;
  // construct some objects here that will be required for Monte Carlo
  A a;
  B b;
  C c;

  /*
   * use thrust::omp to run the simulation Nsimulation times
   * and write a custom reduction function (a sum for objects of type R)
   */
}

// this is the Monte Carlo function - it needs the global variables a, b, c
// passed by reference because they are very large; the return type is R
R monteCarlo(A& a, B& b, C& c)
{
   // something supercomplicated here
   return r;
}

I need to know:

if/how can I access the global variables a, b, c (read only so no race condition problems here)
how to set the number of threads (Nsimulations is in the thousands maybe more, so I wouldn't want an overkill for that.
I want to run the monteCarlo function Nsimulation times and possibly store them in a vector and eventually either a thrust reduction or a serial reduction as this is not the time consuming part.

score 1 · Accepted Answer · edited May 23 '17 at 10:24

As I said in your previous question, you're going to have to learn more about thrust in order for answers to questions like this to have any meaning for you.

You could do:

thrust::generate(...);

to generate your initial random numbers. The length of the vector here would be the number of simulations you want.

You could then do:

thrust::for_each(...);

passing the previously generated random number vector as input, possibly zipped with a result vector. The for_each operation would use a custom functor to perform your monteCarlo routine as a separate invocation on each random number element in the input vector. The output of the monteCarlo routine for each input element would go in the corresponding location in the result vector. The parameter/data in A,B,C globals could be passed by reference as initialization parameters to the functor used by for_each.

You could then do:

thrust::reduce(...);

on the previously generate result vector, to create your final reduction.

I wouldn't be concerned about the mapping of simulations you want to do, to OMP threads. Thrust will handle that, just as an #pragma omp parallel for would handle it in the OMP case.

Here's a fully worked example:

$ cat t536.cpp
#include <iostream>
#include <stdlib.h>
#include <thrust/system/omp/execution_policy.h>
#include <thrust/system/omp/vector.h>
#include <thrust/reduce.h>
#include <thrust/for_each.h>
#include <thrust/iterator/zip_iterator.h>


struct A {
  unsigned a;
};

struct B {

  int b;
};

struct C {

  float c;
};

A a;
B b;
C c;

float monteCarlo(int rn){

  return ((rn % a.a)+ b.b)/c.c;
}

struct my_functor
{
  template <typename Tuple>
  void operator()(const Tuple &data) const{

    thrust::get<1>(data) = monteCarlo(thrust::get<0>(data));
   }
};


int main(int argc, char *argv[]){
  a.a = 10;
  b.b = 2;
  c.c = 4.5f;
  unsigned N = 10;
  if (argc > 1) N = atoi(argv[1]);
  thrust::omp::vector<unsigned> rands(N);
  thrust::omp::vector<float> result(N);
  thrust::generate(thrust::omp::par, rands.begin(), rands.end(), rand);
  thrust::for_each(thrust::omp::par, thrust::make_zip_iterator(thrust::make_tuple(rands.begin(), result.begin())), thrust::make_zip_iterator(thrust::make_tuple(rands.end(), result.end())), my_functor());
  float answer = thrust::reduce(thrust::omp::par, result.begin(), result.end());
  std::cout << answer << std::endl;
  return 0;
}
$ g++ -O2 -I/usr/local/cuda/include -o t536 t536.cpp -fopenmp -lgomp
$ ./t536 10
14.8889
$

(the answer should converge within the datatype limits on 1.444*N as N gets large, for these choices of a,b,c)

And as @JaredHoberock mentioned on your previous question, the thrust monte carlo example may be of interest. It demonstrates "fusion" of operations so that a similar activity can be reduced to a single thrust call. The above program could probably be reduced to a single thrust call as well.

I tried to run your code (in VisualStudio/Windows) with latest version of thrust 1.7.1 I get an error at the definition of rands and result vectors: \thrust\system\omp\detail\for_each.inl(53): error C2027: use of undefined type 'thrust::detail::STATIC_ASSERTION_FAILURE' — Boraxis, Aug 21 '14 at 16:50
I would probably need a lot more details about your environment. (32/64 bit OS, windows version, CUDA version (if any), exact project configuration, exact compile command being issued by visual studio) thrust 1.7.1 is not the latest version. The latest version is the 1.8 master branch [on github](https://github.com/thrust/thrust). — Robert Crovella, Aug 21 '14 at 17:46
It was a compilation flag missing. What exactly is the return type of the thrust::for_each? For the regular std::for_each, I usually write: Functor answer = std::for_each(v.begin(), v.end(), functor). It doesn't work here. I see you have a more complicated answer involving tuples and zip_iterators (still working throught it) but I was wondering if there's a simpler formulation. — Boraxis, Aug 21 '14 at 20:35
thrust::for_each and it's return type is documented [here](http://thrust.github.io/doc/group__modifying.html). The return type is not interesting. I don't use it in my code. Do you mean what does thrust::for_each supply to `my_functor`? (it supplies to the functor whatever is produced by dereferencing the input iterator - in this case it is a tuple of one element from `rands` and one element from `result`). Or do you mean what is the functor return type? (it is `void`, also not interesting). — Robert Crovella, Aug 21 '14 at 21:30
I actually implemented it by saving the state into referenced **members&** of the **my_functor**, and using only atomic operations on those members. It works as well. Thanks for the answer. — Boraxis, Aug 22 '14 at 20:22

Use Thust OMP to parallelize a Monte Carlo on CPU

1 Answers1