
I have a list of elements that I am processing in a multiprocessing apply_async task. As each element is processed, I append it to a list stored under a key in a manager dict, so that the whole processed list ends up in the dict.

I tried following code:

#!/usr/bin/python

from multiprocessing import Pool, Manager

def spammer_task(d, my_list):
    #Initialize manager dict
    d['task'] = {
        'processed_list': []
    }

    for ele in my_list:
        #process here
        d['task']['processed_list'].append(ele)

    return

p = Pool()
m = Manager()
d = m.dict()

my_list = ["one", "two", "three"]

p.apply_async(spammer_task (d, my_list))
print d

At the end, it simply leaves an empty list in the dict. Output:

{'task': {'processed_list': []}}
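(For context, a minimal sketch of why the append is lost: `d['task']` hands back a plain local copy of the nested dict, so mutating that copy is never sent back to the manager process.)

```python
from multiprocessing import Manager

m = Manager()
d = m.dict()
d['task'] = {'processed_list': []}

# d['task'] returns a plain local copy of the nested dict,
# so appending to that copy never reaches the manager process
d['task']['processed_list'].append('one')

print(d['task'])  # {'processed_list': []} -- the append was lost
```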

Now, after researching a bit, I got to know that changes to elements nested inside a manager dict are not propagated back, so you have to re-assign the whole dict with new data in order to update it. So I tried the following code, and it gives a weird error.

#!/usr/bin/python

from multiprocessing import Pool, Manager

def spammer_task(d, my_list):
    #Initialize manager dict
    d['task'] = {
        'processed_list': []
    }

    for ele in my_list:
        #process here
        old_list = d['task']['processed_list']
        new_list = old_list.append(ele)
        #Have to do it this way since elements inside a manager dict become
        #immutable so
        d['task'] = {
            'processed_list': new_list
        }

    return

p = Pool()
m = Manager()
d = m.dict()

my_list = ["one", "two", "three"]

p.apply_async(spammer_task (d, my_list))
print d

Output:

Traceback (most recent call last):
  File "./a.py", line 29, in <module>
    p.apply_async(spammer_task (d, my_list))
  File "./a.py", line 14, in spammer_task
    new_list = old_list.append(ele)
AttributeError: 'NoneType' object has no attribute 'append'

Somehow it seems to be appending None to the list, and I can't figure out why.
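(A small illustration of where the None comes from: `list.append` mutates the list in place and returns `None`, so `new_list = old_list.append(ele)` stores `None`, which then gets written back into `d['task']` and blows up on the next iteration.)

```python
old_list = ['one']
new_list = old_list.append('two')  # append mutates in place...

print(old_list)  # ['one', 'two']
print(new_list)  # None -- append's return value, not the list
```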

MohitC
  • Maybe it is just a stupid question, but to me it looks like your example would be better off using imap – why do you use apply_async instead? – janbrohl Aug 01 '16 at 16:53
  • This was just a sample program; the main one uses apply_async for some stuff it is doing. Moreover, it is calling multiple processes with it – MohitC Aug 01 '16 at 17:33
  • To say it more exactly – I meant using Pool.imap for having multiple processes and modifying the dict in the main process, as this should not be computationally expensive. This seems more sensible to me than making lots of copies and having extra syncing – janbrohl Aug 02 '16 at 09:00
  • Please show a snippet with a proof of concept, janbrohl – MohitC Aug 02 '16 at 09:29

2 Answers


According to the solution at https://bugs.python.org/issue6766

the following code fixes it: copy the whole task dict out of the manager, modify the copy, then re-assign it.

#!/usr/bin/python

from multiprocessing import Pool, Manager

def spammer_task(d, my_list):
    #Initialize manager dict
    d['task'] = {
        'processed_list': []
    }

    for ele in my_list:
        #process here
        foo = d['task']
        foo['processed_list'].append(ele)
        d['task'] = foo
    return

p = Pool()
m = Manager()
d = m.dict()

my_list = ["one", "two", "three"]

p.apply_async(spammer_task (d, my_list))
print d

Output:

{'task': {'processed_list': ['one', 'two', 'three']}}
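The pattern generalizes: read the nested value out of the manager dict as a local copy, mutate the copy, then write the whole entry back so the proxy picks up the change. A small helper sketch (the name `append_nested` is just for illustration):

```python
from multiprocessing import Manager

def append_nested(d, key, subkey, value):
    # read a local copy, mutate it, then write the whole entry back
    entry = d[key]
    entry[subkey].append(value)
    d[key] = entry

m = Manager()
d = m.dict()
d['task'] = {'processed_list': []}

for ele in ['one', 'two', 'three']:
    append_nested(d, 'task', 'processed_list', ele)

print(d['task'])  # {'processed_list': ['one', 'two', 'three']}
```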

MohitC

Apart from making sure that d actually contains something when it is printed, the result is still {'task': {'processed_list': ['one', 'two', 'three']}}:

#!/usr/bin/python

from multiprocessing import Pool

def spammer_task(my_list):
    #Initialize manager dict
    out= {
        'processed_list': []
    }

    for ele in my_list:
        #process here
        out['processed_list'].append(ele)

    return 'task',out



my_list = ["one", "two", "three"]

if __name__=="__main__":

    p = Pool()
    d = dict(p.imap_unordered(spammer_task, [my_list]))  # this line blocks until finished
    print d
janbrohl