
I have C++ code that runs in parallel with OpenMP and performs some long calculations. This part works great.

Now I'm using Python to build a GUI around this code, so I'd like to call my C++ code from my Python program. For that, I use pybind11 (but I could use something else if needed).

The problem is that when called from Python, my C++ code runs serially, on only one thread/CPU.

I tried (in two ways) to follow what the pybind11 documentation says about releasing the GIL, but it does not seem to work at all.

My binding looks like this:

#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include "../cpp/include/myHeader.hpp"
namespace py = pybind11;

PYBIND11_MODULE(my_module, m) {
    m.def("testFunction", &testFunction, py::call_guard<py::gil_scoped_release>());

    m.def("testFunction2", [](inputType input) -> outputType {
        /* Release GIL before calling into (potentially long-running) C++ code */
        py::gil_scoped_release release;
        outputType output =  testFunction(input);
        py::gil_scoped_acquire acquire;

        return output;
    });
}

Problem: This still does not work and uses only one thread (I verified this by printing omp_get_num_threads() inside an omp parallel region).
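One way to narrow this down is to check whether the extension module was compiled with OpenMP enabled at all. The following is a minimal sketch, not code from my project: the helper names openmpVersion and parallelTeamSize are hypothetical. It relies on the standard _OPENMP macro, which a compiler defines only when OpenMP support is switched on (for example with -fopenmp on gcc).

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: returns the value of the _OPENMP macro (e.g. 201511),
// or 0 if this translation unit was compiled without OpenMP support.
int openmpVersion() {
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}

// Hypothetical helper: returns the team size seen inside a parallel region.
// Without OpenMP support the pragmas are compiled out and the result is 1.
int parallelTeamSize() {
    int teamSize = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        // Only one thread writes the shared variable.
        #pragma omp single
        teamSize = omp_get_num_threads();
    }
#endif
    assert(teamSize >= 1);
    return teamSize;
}
```

If both helpers are exposed through the binding and openmpVersion() returns 0 from Python, the module was built without OpenMP, which points at the build configuration rather than at the GIL.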

Question: What am I doing wrong? What do I need to do to be able to use parallel C++ code inside Python?

Disclaimer: I must admit I don't really understand the GIL, particularly in my case, where I do not use Python inside my C++ code, which is in theory completely independent. I just want to be able to use it from another (Python) program.

Have a great day.

EDIT: I solved my problem thanks to pptaszni's answer. Indeed, the GIL handling is not needed at all; I misunderstood the documentation. pptaszni's code worked, and the actual problem was in my CMake file. Thank you.

Naomi
    The GIL should not affect the OpenMP runtime, as neither is aware of the other. If you somehow ended up acquiring the GIL in your parallel section, you'd deadlock, as pybind11 is holding the lock. Most likely, `OMP_NUM_THREADS` is set to `1` or not set (although if it is unset I'd expect the runtime to pick up whatever `nproc` returns). You can try to set the number of threads in `testFunction()` using `omp_set_num_threads()` (https://www.openmp.org/spec-html/5.0/openmpsu110.html). – ipapadop Apr 13 '21 at 15:12
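The suggestion in the comment above can be sketched roughly as follows. This is only an illustration under assumptions: the helper name teamSizeAfterRequest is hypothetical, and the code is guarded with _OPENMP so that it still compiles (and reports a single thread) when OpenMP is not enabled.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: request `requested` threads, then report the team
// size actually obtained in the next parallel region.
int teamSizeAfterRequest(int requested) {
    assert(requested >= 1);
    int teamSize = 1;
#ifdef _OPENMP
    // Takes precedence over OMP_NUM_THREADS for subsequent parallel regions.
    omp_set_num_threads(requested);
    #pragma omp parallel
    {
        // Only one thread writes the shared variable.
        #pragma omp single
        teamSize = omp_get_num_threads();
    }
#endif
    return teamSize;
}
```

If this still reports 1 after requesting more threads, the environment variable is probably not the cause, and it is worth checking whether OpenMP is enabled in the build.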

1 Answer


This is not really a good answer (it is too long for a comment, though), because I did not reproduce your problem, but maybe you can isolate the issue in your code by trying this example, which works for me:

C++ code:

#include "OpenMpExample.hpp"

#include <algorithm>
#include <iostream>
#include <numeric>  // std::accumulate
#include <random>
#include <vector>

#include <omp.h>

constexpr int DATA_SIZE = 10000000;

std::vector<int> testFunction()
{
  int nthreads = 0, tid = 0;
  std::vector<std::vector<int> > data;
  std::vector<int> results;
  std::random_device rnd_device;
  std::mt19937 mersenne_engine {rnd_device()};
  std::uniform_int_distribution<int> dist {-10, 10};
  auto gen = [&dist, &mersenne_engine](){ return dist(mersenne_engine); };

  #pragma omp parallel private(tid)
  {
    tid = omp_get_thread_num();
    if (tid == 0)
    {
      nthreads = omp_get_num_threads();
      std::cout << "Num threads: " << nthreads << std::endl;
      data.resize(nthreads);
      results.resize(nthreads);
    }
  }
  
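  #pragma omp parallel private(tid) shared(data, rnd_device)
  {
    tid = omp_get_thread_num();
    // std::mt19937 is not thread-safe: use one engine per thread instead of sharing `gen`.
    std::mt19937 local_engine;
    #pragma omp critical
    local_engine.seed(rnd_device());
    std::uniform_int_distribution<int> local_dist {-10, 10};
    data[tid].resize(DATA_SIZE);
    std::generate(data[tid].begin(), data[tid].end(),
                  [&]() { return local_dist(local_engine); });
  }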
  #pragma omp parallel private(tid) shared(data, results)
  {
    tid = omp_get_thread_num();
    results[tid] = std::accumulate(data[tid].begin(), data[tid].end(), 0);
  }
  for (auto r : results)
  {
    std::cout << r << ", ";
  }
  std::cout << std::endl;
  return results;
}

I tried to keep the code short while still forcing the machine to do real work on all threads at the same time: each thread generates 10^7 random integers and then sums them up. The Python binding then does not even require gil_scoped_release:

#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include "OpenMpExample.hpp"
namespace py = pybind11;

// both versions work for me
// PYBIND11_MODULE(mylib, m) {
//     m.def("testFunction", &testFunction, py::call_guard<py::gil_scoped_release>());
// }

PYBIND11_MODULE(mylib, m) {
    m.def("testFunction", &testFunction);
}

Example output from Python:

Python 3.6.8 (default, Jun 29 2020, 16:38:14) 
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mylib
>>> x = mylib.testFunction()
Num threads: 12
-10975, -22101, -11333, -28603, -471, -15505, -18141, 2887, -6813, -5328, -13975, -4321, 

My environment: Ubuntu 18.04.3 LTS, gcc 8.4.0, OpenMP 201511 (i.e. 4.5), Python 3.6.8.

pptaszni