
I'm using ThreadPool from multiprocessing.pool, but it doesn't seem to be working, i.e. the calls are executed sequentially rather than in parallel: res1 takes 14s, res2 takes 11s, and with ThreadPool the total is 27s instead of 14s.

I think the problem is in another_method because it uses a (shared) read_only_resource.

I've tried putting a time.sleep(val) in another_method instead of doing the real work (that works as expected: the total time is the maximum value I pass), and I've also tried passing a deep copy of read_only_resource (that doesn't help; it still takes 27s).
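
Roughly, the sleep-based check looked like this (fake_work and the exact values here are illustrative, not my real code):

import time
from multiprocessing.pool import ThreadPool

def fake_work(val, _resource):
    time.sleep(val)                 # stand-in for another_method
    return val

pool = ThreadPool(processes=2)
r1 = pool.apply_async(fake_work, (14, None))
r2 = pool.apply_async(fake_work, (11, None))
r1.get()
r2.get()                            # total wall time is about 14s, i.e. the longer of the two sleeps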

I have run out of things to try to make this work:

from multiprocessing.pool import ThreadPool


def method(text_type, read_only_resource):
    value = some_processing(text_type)                # placeholder for the real preprocessing
    return another_method(value, read_only_resource)  # placeholder; uses the shared resource


def main():
    same_read_only_resource = get_read_only_resource()
    pool = ThreadPool(processes=2)
    res1 = pool.apply_async(method, (some_text_type, same_read_only_resource))
    res2 = pool.apply_async(method, (other_text_type, same_read_only_resource))

    results1 = res1.get()
    results2 = res2.get()
daria

1 Answer


It looks like you want to use map instead of apply_async. apply_async isn't meant for parallelizing multiple function calls; rather, it asynchronously submits a single call of the function. Since you call it twice and then get the results in order, you end up with serialized performance.

Calling map will run multiple instances of a function in parallel. It requires packing the inputs into a single object, e.g. a tuple, since it only allows a single argument to be passed to your function. Then all packed inputs can be placed in a list or other iterable and given to map, for example:

work_args = [(some_text, read_only_resource), (other_text, read_only_resource), ... ]
results = pool.map(method, work_args)  # note: method now receives one tuple per call and must unpack it
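
For example, method could be adapted to take one packed argument and unpack it itself (a minimal sketch; the single-argument signature is an assumption about how you would adapt the question's code, since map will not unpack the tuple for you):

from multiprocessing.pool import ThreadPool

def method(packed_args):
    text_type, read_only_resource = packed_args          # unpack the single argument map passes in
    value = some_processing(text_type)                    # placeholder from the question
    return another_method(value, read_only_resource)      # placeholder from the question

pool = ThreadPool(processes=2)
work_args = [(some_text_type, read_only_resource),
             (other_text_type, read_only_resource)]
results = pool.map(method, work_args)

Alternatively, pool.starmap(method, work_args) (Python 3.3+) unpacks each tuple into positional arguments, so method can keep its original two-argument signature.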

Additionally, you can build work_args with zip() (itertools.izip() on Python 2) plus itertools.repeat(), which pairs the same read_only_resource with every input, e.g.:

import itertools
work_args = zip([some_text, other_text, ...], itertools.repeat(read_only_resource))

Note that since you are using ThreadPool, you still might not get much of a performance increase, depending on the work being done. The topic has been discussed in many places (see here for a good summary): because of the Global Interpreter Lock, CPU-bound Python code does not run in parallel across threads. In short, if your function spends its time waiting on I/O, ThreadPool can help; if it is CPU-bound, it won't.

However, if you use multiprocessing.Pool, you will have multiple processes executing simultaneously. Just replace ThreadPool with Pool to use it.
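
For example, a minimal sketch reusing method and work_args from above (with processes, each worker receives its own pickled copy of the arguments, so everything you pass in, including read_only_resource, must be picklable):

from multiprocessing import Pool

if __name__ == '__main__':                  # guard is needed when worker processes are spawned
    with Pool(processes=2) as pool:
        results = pool.map(method, work_args)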

ahota