We know that Python has a global interpreter lock (GIL), which can cause heavy thread contention when a threading backend is used in a joblib Parallel() call. However, suppose I wrote a Cython script:
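from joblib import Parallel, delayed
from libcpp.map cimport map
from libcpp.string cimport string

cdef map[string, int] very_large_map

cdef map[string, int] load_data():
    # Some data cleaning...
    return very_large_map

def heavy_task(many_lines):
    results = []  # filled in by the heavy task below
    for line in many_lines:
        with nogil:
            # Heavy tasks on [line]...
            # only read from [very_large_map]
            pass
    return results

def central(total_lines):
    global very_large_map
    very_large_map = load_data()
    # n_cpu and batch_size are defined elsewhere.
    # Assumed: one job per batch_size-sized chunk of total_lines.
    n_batches = (len(total_lines) + batch_size - 1) // batch_size
    all_results = Parallel(n_cpu, require="sharedmem")(
        delayed(heavy_task)(total_lines[(i * batch_size):((i + 1) * batch_size)])
        for i in range(n_batches)
    )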
My intention is to share very_large_map across workers so that it is not copied many times in memory. Since I only read from very_large_map once it is loaded, I release the GIL with "with nogil", hoping that threads can read from it concurrently without contention. However, the script is still very slow, and it seems that the GIL is not really released.
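To make the sharing pattern concrete, here is a minimal pure-Python sketch. It is an illustration under assumptions, not the Cython code above: it swaps joblib for the standard library's ThreadPoolExecutor, and shared_map, n_workers and the sample data are hypothetical stand-ins. It only shows that worker threads see the very same map object (no copies); it does not show a parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for very_large_map: one dict shared by every thread.
shared_map = {"a": 1, "b": 2, "c": 3}

def heavy_task(lines):
    # Read-only lookups; also report the identity of the map this thread sees.
    return id(shared_map), [shared_map.get(line, 0) for line in lines]

def central(total_lines, n_workers=2, batch_size=2):
    # Split the input into consecutive batches of batch_size lines.
    batches = [total_lines[i:i + batch_size]
               for i in range(0, len(total_lines), batch_size)]
    # Threads live in one address space, so shared_map is never copied.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(heavy_task, batches))

out = central(["a", "b", "c", "x"])
```

Every batch reports the same id() for the map, which is the "no copy" behaviour being sought. The lookups here are ordinary Python operations and therefore still run under the GIL, so this sketch demonstrates sharing only, not concurrent execution.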
Can anyone explain what happens behind "with nogil" and require="sharedmem"? If the GIL is not actually released, how can I achieve concurrent reads of very_large_map without having to copy it into different processes/threads?
Thanks!