Joblib Shared Memory With nogil

Question

We know that Python has a global interpreter lock (GIL), which can cause a lot of thread contention when we use a threading backend for a Parallel() task in joblib. However, suppose I write a Cython script like this:

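# distutils: language = c++
from libcpp.map cimport map
from libcpp.string cimport string
from joblib import Parallel, delayed

cdef map[string, int] very_large_map

cdef map[string, int] load_data():
    cdef map[string, int] data
    # Some data cleaning...
    return data

def heavy_task(many_lines):
    cdef string c_line
    results = []
    for line in many_lines:
        # Python objects cannot be touched without the GIL, so convert first
        c_line = line.encode("utf-8")
        with nogil:
            # Heavy tasks on [c_line]...
            # only read from [very_large_map], e.g. with find(), because
            # operator[] inserts missing keys and is therefore a write
            pass
    return results

def central(total_lines, n_cpu, batch_size):
    global very_large_map
    very_large_map = load_data()
    n_batches = (len(total_lines) + batch_size - 1) // batch_size
    all_results = Parallel(n_jobs=n_cpu, require="sharedmem")(
        delayed(heavy_task)(total_lines[i * batch_size:(i + 1) * batch_size])
        for i in range(n_batches)
    )
    return all_results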

What I intend to do is share very_large_map among the different processes/threads so that it is not copied many times in memory. Since I only read from very_large_map once it is loaded, I release the GIL with `with nogil`, hoping that the threads can read from very_large_map concurrently without any contention. However, the script is still very slow, and it seems that the GIL is not really being released.
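One way to check whether a GIL-free section really runs in parallel is to time several threads executing it at once: if the sections overlap, the wall-clock time stays close to the cost of a single section instead of growing with the number of threads. The sketch below is plain Python and is only an assumption-laden stand-in: `time.sleep` (which releases the GIL) plays the role of the `with nogil:` block, and the helper names `gil_free_work` and `wall_time` are hypothetical. In a real test you would call heavy_task on a batch instead.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def gil_free_work(seconds):
    # Hypothetical stand-in for a Cython `with nogil:` block:
    # time.sleep releases the GIL while it waits
    time.sleep(seconds)
    return threading.get_ident()


def wall_time(n_threads, seconds):
    # Run n_threads copies of the GIL-free work at the same time
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        idents = list(pool.map(gil_free_work, [seconds] * n_threads))
    return time.perf_counter() - start, idents


elapsed, idents = wall_time(4, 0.2)
# If the sections overlap, elapsed is near 0.2 s rather than 4 * 0.2 = 0.8 s
print(f"{elapsed:.2f} s for {len(idents)} tasks")
```

If the same measurement with the real workload scales roughly linearly with the number of threads, the time is being spent somewhere that still holds the GIL (for example, the per-line Python work outside the `with nogil:` block).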

Can anyone tell me what is going on behind `with nogil` and `require="sharedmem"`? If the GIL really is not released, how can I read very_large_map concurrently without having to copy it into each process/thread?
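On the `sharedmem` side, joblib's documentation states that `require="sharedmem"` makes Parallel use a thread-based backend. Threads live in one process and share its memory, so every worker reads the same object rather than a copy, whereas the default process-based backend sends pickled copies of the arguments to separate worker processes. The sketch below illustrates the thread case with the standard library's `ThreadPoolExecutor` as an assumed stand-in for joblib's threading backend; the Python dict named `very_large_map` and the `worker`/`run` helpers are hypothetical and only mimic the C++ map from the question.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for very_large_map, built once in the main thread
very_large_map = {f"key{i}": i for i in range(100_000)}


def worker(batch):
    # Threads share the process's memory, so this is the same object, not a copy
    return id(very_large_map), sum(very_large_map[k] for k in batch)


def run(n_threads=4):
    keys = list(very_large_map)
    size = (len(keys) + n_threads - 1) // n_threads  # ceiling division
    batches = [keys[i * size:(i + 1) * size] for i in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(worker, batches))


results = run()
print({obj_id == id(very_large_map) for obj_id, _ in results})  # {True}
```

Sharing memory this way only removes the copying; whether the reads also run in parallel depends on whether the work inside each thread actually runs without the GIL.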

Thanks!

I'd be surprised if it wasn't releasing the GIL correctly. It might be worth moving the `with nogil` block outside the loop if possible; otherwise each thread has to release and reacquire the GIL once per iteration. — DavidW, Jul 13 '20 at 16:49
Getting it back (at the end of the `with nogil` block) is probably the slow bit (especially since each thread may have to wait for other threads). — DavidW, Jul 13 '20 at 20:02
By the way, does the GIL prevent Python from reading an object concurrently? — Caprikuarius, Jul 14 '20 at 05:05
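Regarding the last comment: on a standard CPython build, the GIL does not forbid concurrent reads as such; it lets only one thread execute Python bytecode at a time. Pure-Python reads of a shared object from several threads are therefore safe but interleaved rather than truly parallel, while code inside a Cython `with nogil:` block that touches only C/C++ data is not subject to this restriction. The sketch below (hypothetical names `shared` and `reader`) shows several threads reading one dict and all getting consistent results; it does not demonstrate a speed-up.

```python
import threading

shared = {i: i * i for i in range(10_000)}
totals = []
totals_lock = threading.Lock()


def reader():
    # Pure-Python lookups: safe to do from many threads, but with the GIL
    # held they take turns instead of running in parallel
    total = sum(shared[i] for i in range(10_000))
    with totals_lock:
        totals.append(total)


threads = [threading.Thread(target=reader) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(set(totals)))  # 1
```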
