What is the easiest way to parallelize recursive code in Cython?

Question

Consider recursive Cython code of the following generic form:

cpdef function(list L1, list L2):
    global R
    cdef int i,n #...
    cdef list LL1,LL2 #...
    # ...
    # core of the code
    # ...
    n= #...
    for i in range(n):
        LL1= #...
        LL2= #...
        function(LL1,LL2)

New remark: my relevant code is just a tree exploration collecting fruits, and all the branches are independent. On a computer with several CPUs, I would like to parallelize it as follows: each CPU has a queue; when the code reaches a new node of the tree, there are several possible children, and each child is allocated to the CPU with the shortest queue. This seems to be a generic way to parallelize a tree exploration.
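The allocation rule described above can be sketched in plain Python as follows. This is only an illustration of the dispatch step, not a parallel implementation: the function name `assign_children` and the use of one `deque` per CPU are assumptions made for the example.

```python
from collections import deque


def assign_children(children, queues):
    """Append each child to the currently shortest queue.

    `queues` is a list with one deque per CPU (hypothetical layout).
    Returns the index of the queue chosen for each child.
    """
    chosen = []
    for child in children:
        # Pick the queue with the fewest pending nodes; ties go to the lowest index.
        idx = min(range(len(queues)), key=lambda q: len(queues[q]))
        queues[idx].append(child)
        chosen.append(idx)
    return chosen


# Example: three queues holding 2, 0 and 1 pending nodes.
queues = [deque(["x", "y"]), deque(), deque(["z"])]
picked = assign_children(["a", "b", "c"], queues)
```

After the call, every queue in the example holds two nodes, which is the balancing effect the scheme aims for.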

Question: What is the easiest way to implement such a parallelization?

I tried adding `from cython.parallel import prange` at the top of my code and replacing `range(n)` with `prange(n)`, but I got the error:

prange() can only be used without the GIL

Then I replaced `prange(n)` with `prange(n,nogil=True)`, but I got many errors like:

Assignment of Python object not allowed without gil
Coercion from Python not allowed without the GIL
Indexing Python object not allowed without gil
Calling gil-requiring function not allowed without gil

Below is the relevant code I want to parallelize:

import copy
from sage.all import matrix, vector  # assumption: the code runs under SageMath

cpdef SmithFormIntegralPointsSuperFiltred(list L, list LL, list co, list A):
    global R,clp
    cdef int i,j,k,l,ll,p,a,c,cc,rc,m,f,b,z,zz,lp,s,la,kk,ccc,zo,jj,lM
    cdef list LB,S,P,CP,F,cco,PP,PPP,coo,V,LLP,LLPO,Mi,M
    m=10000
    l=len(L)
    ll=len(LL)
    la=len(A[0])
    z=0
    zz=0
    P=[]
    for i in range(l):
        if L[i]==-1:
            P.append(i)
    lp=len(P)
    if lp<clp:
        print([lp,L])
        clp=lp
    if lp==0:
        F=list(matrix(LL)*vector(L))
        b=0
        for f in F:
            if f<0:
                b=1
                break
        if b==0:
            R.append(F)
            print(L)
    if lp>0:
        PP=[m for j in range(lp)]
        PPP=[[] for j in range(lp)]
        for i in range(ll):
            a=0
            for j in P:
                if LL[i][j]>0:
                    a+=1
                    if a==2:
                        break
            if a<=1:
                CP=list(set(range(l))-set(P))
                c=sum([LL[i][j]*L[j] for j in CP])
                if a==0 and c<0:
                    z=1
                    break
                if a==1 or (a==0 and c>=0):
                    LLPO=[LL[i][P[k]] for k in range(lp)]
                    for j in range(lp):
                        LLP=LLPO[:]
                        cc=-LLP[j]
                        if cc!=0:
                            del LLP[j]
                            if LLP==[0 for k in range(lp-1)]:
                                PPP[j].append(i)
                            zz=1
                            if cc>0:
                                rc=c/cc
                                if rc<PP[j]:
                                    PP[j]=rc
        if z==0 and zz==1:
            zo=0
            for i in range(lp):
                Mi=[]
                if PPP[i]!=[]:
                    for j in range(PP[i]+1):
                        ccc=0
                        coo=copy.deepcopy(co)
                        for k in PPP[i]:
                            s=sum([LL[k][kk]*L[kk] for kk in range(l)])+(j+1)*LL[k][P[i]]
                            V=A[k]
                            for kk in range(la):
                                if V[kk]!=0:
                                    if s>=0 and coo[kk][V[kk]]>=s:
                                        coo[kk][V[kk]]-=s
                                    else:
                                        ccc=1
                                        break
                            if ccc==1:
                                break
                        if ccc==0:
                            Mi.append(j)
                    if len(Mi)<m:
                        zo=1
                        m=len(Mi)
                        M=Mi
                        p=i
            if zo==1:
                M.reverse()
                lM=len(M)
                for jj in range(lM):
                    j=M[jj]
                    cco=copy.deepcopy(co)
                    for k in PPP[p]:
                        s=sum([LL[k][kk]*L[kk] for kk in range(l)])+(j+1)*LL[k][P[p]]
                        V=A[k]
                        for kk in range(la):
                            if V[kk]!=0:
                                cco[kk][V[kk]]-=s
                    LB=L[:]
                    LB[P[p]]=j
                    SmithFormIntegralPointsSuperFiltred(LB,LL,cco,A)

The global variables R and clp are not essential; I can manage without global variables if necessary.

The problem isn't really that the function is recursive - it's that the function operates on Python objects (lists) and so requires the GIL. It isn't obvious that there's any code here that can work without the GIL. — DavidW, May 22 '19 at 05:12
@DavidW: Is there a way to bypass this GIL problem in Python, or should I try another programming language? — Sebastien Palcoux, May 22 '19 at 07:00
If the lists are all of a single type (e.g. int) then you can use Cython memoryviews instead (pass them Numpy arrays). It's not a problem if some of your code needs the GIL, but you do want a good portion not to need it. The other thing that worries me slightly is the global variable: unless it's a constant, that (probably) won't work well in parallel — DavidW, May 22 '19 at 07:15
@SebastienPalcoux can you show us the relevant code? Without it, it is quite hard to determine whether the code can be transformed into nogil code — BlueSheepToken, May 22 '19 at 08:19
The global variables will definitely be a problem - you're changing them, and the thing with parallel code is that you don't know what order it'll run in, so the answer you get will probably vary. In general it doesn't look easy to parallelise in Python - there are a lot of list operations that will require the GIL and don't look easy to replace — DavidW, May 22 '19 at 11:42
@DavidW: The global variables are not essential for the code, I can manage without them. About the rest, if it is hard to transform into nogil code, would it be easier to use `Pool` from `multiprocessing`? (I found that by browsing the web, but I do not yet understand how it works.) — Sebastien Palcoux, May 22 '19 at 11:57
@BlueSheepToken: I got your point but it is OK, the code is just a tree exploration looking for fruits, and the branches are independent. The global variables are just used to amass the fruits. I can just print a fruit each time I find one and it will be OK. — Sebastien Palcoux, May 22 '19 at 12:12
I think `Pool` from `multiprocessing` would probably be the best thing to try here — DavidW, May 22 '19 at 12:26
Out of curiosity, is there anywhere we can read up online on what this exact algorithm is that the function is trying to represent? Also, some sample input and expected output could be potentially helpful to reason about the code. — CodeSurgeon, May 24 '19 at 03:12
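
Following the `Pool` suggestion in the comments above, here is a minimal sketch of how an independent-branch tree exploration could be split across processes. It is an illustration under stated assumptions, not the code from the question: the toy tree (tuples with -1 for undetermined entries, a fruit being a complete tuple whose entries sum to `TARGET`), the constants `BOUND` and `TARGET`, and the helper names `children`, `explore`, `frontier`, `explore_parallel` and the `pool_factory` parameter are all hypothetical. Fruits are returned rather than appended to a global, since each worker process would otherwise modify its own private copy of the global.

```python
from multiprocessing import Pool

BOUND = 3   # hypothetical: each coordinate ranges over 0..BOUND
TARGET = 4  # hypothetical: a complete node is a fruit when its entries sum to TARGET


def children(node):
    """Return the nodes obtained by fixing the first undetermined (-1) entry."""
    if -1 not in node:
        return []  # complete node: a leaf of the tree
    i = node.index(-1)
    return [node[:i] + (v,) + node[i + 1:] for v in range(BOUND + 1)]


def explore(node):
    """Sequential depth-first exploration; returns the fruits found below node."""
    if -1 not in node:
        return [node] if sum(node) == TARGET else []
    fruits = []
    for child in children(node):
        fruits.extend(explore(child))
    return fruits


def frontier(root, depth):
    """Expand the tree for `depth` levels to obtain independent subtree roots."""
    level = [root]
    for _ in range(depth):
        next_level = []
        for node in level:
            kids = children(node)
            next_level.extend(kids if kids else [node])  # keep leaves as they are
        level = next_level
    return level


def explore_parallel(root, depth=2, processes=None, pool_factory=Pool):
    """Explore each frontier subtree in a separate worker and merge the fruits."""
    tasks = frontier(root, depth)
    with pool_factory(processes) as pool:
        # chunksize=1 hands out one subtree at a time, so an idle worker
        # takes the next pending subtree instead of a fixed share.
        parts = pool.map(explore, tasks, chunksize=1)
    return [fruit for part in parts for fruit in part]
```

A call such as `explore_parallel((-1, -1, -1), depth=2)` should be placed under an `if __name__ == "__main__":` guard, because some platforms start workers by re-importing the main module. Handing out subtrees one at a time plays roughly the role of the "shortest queue" rule from the question, with a single shared task list instead of one queue per CPU. The arguments and results must be picklable, and a deeper `depth` gives more, smaller tasks at the cost of more inter-process communication.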
