
I am trying to use multithreading and/or multiprocessing to speed up my script somewhat. Essentially I have a list of 10,000 subnets read in from a CSV, which I want to convert into IPv4 network objects and then store in a list.

My base code is as follows and executes in roughly 300ms:

import ipaddress

# acls is the list of dicts read from the CSV
aclsConverted = []
def convertToIP(ip):
    aclsConverted.append(ipaddress.ip_network(ip))

for y in acls:
    convertToIP(y['srcSubnet'])

If I try with a concurrent.futures ThreadPoolExecutor it works, but is 3-4x slower, as follows:

import concurrent.futures
import ipaddress

aclsConverted = []
def convertToIP(ip):
    aclsConverted.append(ipaddress.ip_network(ip))

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    for y in acls:
        executor.submit(convertToIP, y['srcSubnet'])

Then if I try with a concurrent.futures ProcessPoolExecutor it is 10-15x slower and the array is empty. Code is as follows:

import concurrent.futures
import ipaddress

aclsConverted = []
def convertToIP(ip):
    aclsConverted.append(ipaddress.ip_network(ip))

with concurrent.futures.ProcessPoolExecutor(max_workers=20) as executor:
    for y in acls:
        executor.submit(convertToIP, y['srcSubnet'])

The server I am running this on has 28 physical cores.

Any suggestions as to what I might be doing wrong will be gratefully received!

Andrew Harris
  • Memory is not shared between the processes, so each of them will have their own copy of `aclsConverted`. Mutations in the worker processes won't affect the master process. – AKX Nov 06 '19 at 10:57
  • Can you increase the number to 26 and see what happens? I feel like the reason you are slowing down is because you are sharing a lot of memory between your processes. If it gets slower, then it's because you're sharing too much memory, and it is being locked. – Games Brainiac Nov 06 '19 at 10:57
  • Can you share some of that CSV too? As it is, none of your examples can be easily reproduced. – AKX Nov 06 '19 at 10:59
  • Agreed with @AKX – Games Brainiac Nov 06 '19 at 10:59
  • Also, unless you're expecting this to grow from 10,000 subnets in 300msec to say, 10,000,000 subnets in 300 seconds, I really wouldn't worry. With small data like that you'll spend more time in serialization/deserialization than doing work. – AKX Nov 06 '19 at 11:07
  • Thanks to everyone for replying. I agree it looks like threads are not appropriate in this instance. My *full* script requires a lot (potentially 100m+) of comparisons of IPv4 objects with others, hence why I have been looking at multiprocessing. However, if there is no memory sharing, this will hamper storing results in a single dict – Andrew Harris Nov 06 '19 at 11:22
  • @AndrewHarris What sort of comparisons will you need to be doing? – AKX Nov 06 '19 at 11:25
  • Also, note that if you're on a POSIX machine (Linux/macOS/...), there will be _one-way_ memory sharing, as Python forks off the workers when the process pool starts! – AKX Nov 06 '19 at 11:27
  • @AKX ultimately the bulk of this script will be looking to see if a list of IPv4 addresses fall within a list of IPv4 subnets. At the moment it is roughly 100m+ operations, so I was looking to see if I could speed it up somewhat – Andrew Harris Nov 06 '19 at 11:30
  • @AndrewHarris Sounds like you could explode the network IPs (the 32-bit integers they are) into a relatively compact Bloom filter, and if the Bloom filter matches, follow up with a closer check. Or if the machine has 4+ gigabytes of memory, simply load all IP flags into 2^32 bits of memory and forget about the Bloom filter. – AKX Nov 06 '19 at 11:34
  • Thanks I will have a look! – Andrew Harris Nov 06 '19 at 12:05

1 Answer


If tasks are too small, then the overhead of managing multiprocessing / multithreading is often more expensive than the benefit of running tasks in parallel.

You might try the following:

Create just two processes (not threads!), one treating the first 5,000 subnets, the other treating the remaining 5,000 subnets.

There you might be able to see some performance improvement, but the task you perform is not that CPU- or IO-intensive, so I am not sure it will help.
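A minimal sketch of that idea (assuming acls is the list of dicts already read from the CSV, as in the question); note that the worker function returns its results instead of appending to a global list, since mutations in a worker process never reach the parent:

import concurrent.futures
import ipaddress

def convert_chunk(subnets):
    # runs in a worker process; return the converted networks so the
    # parent process can collect them
    return [ipaddress.ip_network(s) for s in subnets]

if __name__ == '__main__':
    subnets = [y['srcSubnet'] for y in acls]
    mid = len(subnets) // 2

    aclsConverted = []
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        for chunk in executor.map(convert_chunk, [subnets[:mid], subnets[mid:]]):
            aclsConverted.extend(chunk)

Even then, each chunk has to be pickled over to the worker and the resulting network objects pickled back, which for a job that already finishes in roughly 300 ms can easily cost more than it saves.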

Multithreading in Python, on the other hand, will give no performance improvement at all for tasks that have no IO and that are pure Python code.

The reason is the infamous GIL (global interpreter lock). In Python, two bytecodes can never execute in parallel within the same process.

Multithreading in Python still makes sense for tasks that perform IO (network access), that sleep, or that call modules implemented in C which release the GIL. numpy, for example, releases the GIL and is thus a good candidate for multithreading.
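As an illustration (not code from the question), this sketch uses numpy inside a thread pool; because np.dot releases the GIL while the underlying C/BLAS routine runs, the threads can genuinely execute in parallel, whereas a pure-Python call like ipaddress.ip_network cannot:

import concurrent.futures
import numpy as np

def multiply(pair):
    a, b = pair
    # np.dot releases the GIL while the C/BLAS code runs,
    # so several threads can make progress at the same time
    return np.dot(a, b)

pairs = [(np.random.rand(500, 500), np.random.rand(500, 500)) for _ in range(8)]

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    products = list(executor.map(multiply, pairs))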

gelonida
  • OP is creating processes using `ProcessPoolExecutor`, not threads. – AKX Nov 06 '19 at 11:08
  • The OP tried both. But my initial comment is still valid. Dispatching very small tasks is not worth the effort; the overhead of dispatching the tasks / collecting the results will be too high a penalty. That's why I suggested trying initially with just two tasks, each treating 5000 lines within one task, but even there, if the work is really only converting a subnet into an IP object, I'm not sure it will be worth the effort – gelonida Nov 06 '19 at 11:10
  • Multithreading can also be fine for computational tasks if you use libraries that release the GIL (e.g. `numpy` does so). – Giacomo Alzetta Nov 06 '19 at 11:14
  • @GiacomoAlzetta I wrote that already in my answer (it seems perhaps not clearly enough). I can perhaps add numpy as an example. Or if you have a suggestion to rephrase my answer to make it more obvious, I will. I just added numpy as an example – gelonida Nov 06 '19 at 11:35