I know you have already accepted an answer, but let me add my "two cents":
The other way of solving your issue is by initializing each process in your pool with the global variable results
as originally intended. The problem was that when using spawn
newly created processes do not inherit the address space of the main process (which included the definition of results
). Instead execution starts from the top of the program. But the code that creates results
never gets executed because of the if __name__ == '__main__'
check. But that is a good thing because you do not want a separate instance of this list anyway.
So how do we share the same instance of global variable results
across all processes? This is accomplished by using a pool initializer as follows. Also, if you want an accurate progress bar, you should really use imap_unordered
instead of imap
so that the progress bar is updated in task-completion order rather than in the order in which tasks were submitted. For example, if the first task submitted happens to be the last task to complete, then using imap
would result in the progress bar not progressing until all the tasks completed and then it would shoot to 100% all at once.
But Note: The doumentation for imap_unordered
only states that the results will be returned in arbitrary order, not completion order. It does however seem that when a chunksize argument of 1 is used (the default if not explicitly specified), the results are returned in completion order. If you do not want to rely on this, then use instead apply_async
specifying a callback function that will update the progrss bar. See the last code example.
import multiprocessing as mp
from multiprocessing import Manager
from tqdm import tqdm
def init_pool(the_results):
global results
results = the_results
def loop(arg):
import time
# do stuff
# ...
time.sleep(1)
results.append(arg ** 2)
if __name__ == '__main__':
manager = Manager()
results = manager.list()
ls = list(range(1, 10))
with mp.get_context('spawn').Pool(4, initializer=init_pool, initargs=(results,)) as pool:
list(tqdm(pool.imap_unordered(loop, ls), total=len(ls)))
print(results)
Update: Another (Better) Way
import multiprocessing as mp
from tqdm import tqdm
def loop(arg):
import time
# do stuff
# ...
time.sleep(1)
return arg ** 2
if __name__ == '__main__':
results = []
ls = list(range(1, 10))
with mp.get_context('spawn').Pool(4) as pool:
with tqdm(total=len(ls)) as pbar:
for v in pool.imap_unordered(loop, ls):
results.append(v)
pbar.update(1)
print(results)
Update: The Safest Way
import multiprocessing as mp
from tqdm import tqdm
def loop(arg):
import time
# do stuff
# ...
time.sleep(1)
return arg ** 2
def my_callback(v):
results.append(v)
pbar.update(1)
if __name__ == '__main__':
results = []
ls = list(range(1, 10))
with mp.get_context('spawn').Pool(4) as pool:
with tqdm(total=len(ls)) as pbar:
for arg in ls:
pool.apply_async(loop, args=(arg,), callback=(my_callback))
pool.close()
pool.join()
print(results)