
I'm trying to use OpenMP in Cython. I need to do two things:

i) use a #pragma omp single { } block in my Cython code.

ii) use #pragma omp barrier.

Does anyone know how to do this in Cython?

Here are more details. I have a nogil cdef function my_func() which I call in an OpenMP for-loop:

from cython.parallel cimport prange
cimport openmp

cdef int i

with nogil:
    for i in prange(10,schedule='static', num_threads=10):
        my_func(i)

Inside my_func I need to place a barrier to wait for all threads to catch up, then execute a time-consuming operation only in one of the threads and with the gil acquired, and then release the barrier so all threads resume simultaneously.

cdef int my_func(...) nogil:

    ...

    # put a barrier until all threads catch up, e.g. #pragma omp barrier

    with gil:
        # execute time consuming operation in one thread only, e.g. pragma omp single{}

    # remove barrier after the above single thread has finished and continue the operation over all threads in parallel, e.g. #pragma omp barrier

    ...


python_freak

1 Answer


Cython has some support for OpenMP, but if OpenMP pragmas are used extensively it is probably easier to write the code in C and wrap the result with Cython.
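For the barrier-then-single pattern from the question, the C side could look roughly like the sketch below. It is only an illustration: run_with_single, expensive_step and the partial array are hypothetical names, and acquiring the GIL inside the single block (needed if that step calls into Python) is left out. The #ifdef _OPENMP guards keep the file compilable without -fopenmp, in which case it simply runs with one thread.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for the time-consuming step run by one thread only. */
static void expensive_step(int *counter)
{
    *counter += 1;
}

/* Starts a parallel region with up to n_threads threads.
 * partial must have room for n_threads ints.
 * Returns how many times expensive_step was executed. */
int run_with_single(int n_threads, int *partial)
{
    int single_runs = 0;

    #pragma omp parallel num_threads(n_threads)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        partial[tid] = tid + 1;          /* phase 1: per-thread work */

        /* wait until every thread has finished phase 1 */
        #pragma omp barrier

        /* exactly one thread of the team executes this block */
        #pragma omp single
        {
            expensive_step(&single_runs);
        }   /* implicit barrier at the end of single */

        partial[tid] *= 2;               /* phase 2: all threads resume */
    }
    return single_runs;
}
```

From Cython, such a function would be declared in a cdef extern from block and called like any other C function.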


As an alternative, you could use verbatim C code and tricks with defines to bring some of this functionality to Cython. Using pragmas in defines isn't straightforward, though (_Pragma is the C99 solution; MSVC does its own thing, as always, with __pragma). Here is a proof of concept for Linux/gcc:

cdef extern from *:
    """
    #define START_OMP_PARALLEL_PRAGMA() _Pragma("omp parallel") {
    #define END_OMP_PRAGMA() }
    #define START_OMP_SINGLE_PRAGMA() _Pragma("omp single") {
    #define START_OMP_CRITICAL_PRAGMA() _Pragma("omp critical") {   
    """
    void START_OMP_PARALLEL_PRAGMA() nogil
    void END_OMP_PRAGMA() nogil
    void START_OMP_SINGLE_PRAGMA() nogil
    void START_OMP_CRITICAL_PRAGMA() nogil

We make Cython believe that START_OMP_PARALLEL_PRAGMA() and co. are nogil functions, so it puts them into the C code, where they get picked up by the preprocessor.
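To see what the preprocessor makes of these "calls", the same defines can be used in plain C. This is only a sketch of the mechanism; count_single_runs is a hypothetical name, and the trailing semicolons mimic the function-call statements that Cython emits.

```c
#define START_OMP_PARALLEL_PRAGMA() _Pragma("omp parallel") {
#define END_OMP_PRAGMA() }
#define START_OMP_SINGLE_PRAGMA() _Pragma("omp single") {

/* Each "call" below is replaced by a pragma plus an opening brace,
 * or by a closing brace; the semicolons become empty statements. */
int count_single_runs(void)
{
    int a = 0;
    START_OMP_PARALLEL_PRAGMA();
    START_OMP_SINGLE_PRAGMA();
    a += 1;                 /* executed by one thread only */
    END_OMP_PRAGMA();       /* closes the single block */
    END_OMP_PRAGMA();       /* closes the parallel block */
    return a;
}
```

Compiled without -fopenmp the pragmas are ignored and the braces just form ordinary nested blocks, so the function still compiles and runs serially.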

We must use the syntax

#pragma omp single
{
   //do_something
}

and not

#pragma omp single
do_something

because of the way Cython generates C-code.

The usage could look as follows (I'm avoiding cython.parallel.parallel here, as it does too much magic for this simple example):

%%cython -c=-fopenmp --link-args=-fopenmp
cdef extern from *:# as listed above
    ...

def test_omp():
    cdef int a=0
    cdef int b=0  
    with nogil:
        START_OMP_PARALLEL_PRAGMA()
        START_OMP_SINGLE_PRAGMA()
        a+=1
        END_OMP_PRAGMA()
        START_OMP_CRITICAL_PRAGMA()
        b+=1
        END_OMP_PRAGMA() # CRITICAL
        END_OMP_PRAGMA() # PARALLEL
    print(a,b)

Calling test_omp prints "1 2" on my machine with 2 threads, as expected: the single block runs once and the critical block runs once per thread. The number of threads can be changed with openmp.omp_set_num_threads(10).
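The same counting can be reproduced in plain C to check the reasoning behind the output. The sketch below uses a hypothetical function, count_single_and_critical, and reports the team size so the result does not depend on a particular thread count.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Writes the number of single-block executions to *a, the number of
 * critical-block executions to *b and the team size to *n_threads. */
void count_single_and_critical(int *a, int *b, int *n_threads)
{
    int single_runs = 0;
    int critical_runs = 0;
    int team = 1;            /* stays 1 when built without OpenMP */

    #pragma omp parallel
    {
        /* one thread only; the others wait at the implicit barrier */
        #pragma omp single
        {
            single_runs += 1;
#ifdef _OPENMP
            team = omp_get_num_threads();
#endif
        }

        /* every thread enters, but only one at a time */
        #pragma omp critical
        {
            critical_runs += 1;
        }
    }

    *a = single_runs;
    *b = critical_runs;
    *n_threads = team;
}
```

With a team of 2 threads this gives a = 1 and b = 2, matching the output above.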

However, the above is still very brittle: some of the error checking Cython generates can lead to invalid code (Cython uses goto for control flow, and it is not possible to jump out of an OpenMP block). Something like this happens in your example:

cimport numpy as np
import numpy as np
def test_omp2():
    cdef np.int_t[:] a=np.zeros(1,dtype=int)

    START_OMP_SINGLE_PRAGMA()
    a[0]+=1
    END_OMP_PRAGMA()

    print(a)

Because of bounds checking, Cython will produce:

START_OMP_SINGLE_PRAGMA();
...
//check bounds:
if (unlikely(__pyx_t_6 != -1)) {
    __Pyx_RaiseBufferIndexError(__pyx_t_6);
    __PYX_ERR(0, 30, __pyx_L1_error)  // this expands to a goto out of the block!
}
...
END_OMP_PRAGMA();

In this special case, setting boundscheck to False, i.e.

cimport cython
@cython.boundscheck(False) 
def test_omp2():
   ...

would solve the issue for the above example, but probably not in general.
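In hand-written C the check does not have to be dropped: one common way to stay inside the structured block is to record the failure in a flag and act on it after the block has ended. The following is a minimal sketch under that assumption, with a hypothetical function name; it is not what Cython generates.

```c
/* Increments data[index] inside a single block.
 * Returns 0 on success and -1 if index is out of bounds. */
int increment_checked(int *data, int length, int index)
{
    int error = 0;

    #pragma omp parallel
    {
        #pragma omp single
        {
            if (index < 0 || index >= length) {
                error = 1;          /* remember the failure, no jump */
            } else {
                data[index] += 1;
            }
        }
    }

    /* the OpenMP blocks are closed, so leaving early is fine now */
    if (error) {
        return -1;
    }
    return 0;
}
```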

Once again: using openmp in C (and wrapping the functionality with Cython) is a more enjoyable experience.


As a side note: Python threads (the ones governed by the GIL) and OpenMP threads are different and know nothing about each other. The above example would also compile and run correctly without releasing the GIL: OpenMP threads do not care about the GIL, and as no Python objects are involved, nothing can go wrong. I added nogil to the wrapped "functions" so that they can also be used in nogil blocks.

However, as code gets more complicated, it becomes less obvious that variables shared between different Python threads aren't accessed (above all because such accesses could happen in the generated C code, which is not apparent from the Cython code). It might then be wiser not to release the GIL while using OpenMP.

ead