3

I have a class A that, when instantiated, modifies a mutable class attribute nums.

When instantiating the class via a process pool with maxtasksperchild=1, I notice that nums holds values from several different processes, which is undesirable behavior for me.

My questions are:

  • Are the processes sharing memory?
  • Am I misunderstanding maxtasksperchild and how a process pool works?

EDIT: I am guessing that the pool pickles the previous processes it started (and not the original one), thereby preserving the values of nums. Is that correct? If so, how can I force it to use the original process?

Here is some example code:

from multiprocessing import Pool


class A:
    nums = []

    def __init__(self, num=None):
        self.__class__.nums.append(num)  # I use 'self.__class__' for the sake of explicitly
        print(self.__class__.nums)
        assert len(self.__class__.nums) < 2  # checking that they don't share memory


if __name__ == '__main__':
    with Pool(maxtasksperchild=1) as pool:
        pool.map(A, range(99))  # the assert is being raised

EDIT because of the answer by k.wahome: using instance attributes doesn't answer my question. I need to use class attributes because in my original code (not shown here) I have several instances per process. My question is specifically about the workings of a multiprocessing pool.

By the way, doing the following does work:

from multiprocessing import Process

# class A is defined as in the example above

if __name__ == '__main__':
    prs = []
    for i in range(99):
        pr = Process(target=A, args=[i])
        pr.start()
        prs.append(pr)
    for pr in prs:
        pr.join()
# the assert was not raised
Darkonaut
moshevi

2 Answers

0

The sharing most likely comes from the mapped class A, which has a class attribute nums.

Class attributes are bound to the class itself: they are created when the class is defined and are shared by all of its instances, so every instance refers to the same object.

Instance attributes, by contrast, are bound to the instance and are not shared between instances. Every instance has its own copy.
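As a minimal single-process sketch of that difference (the class names Shared and Separate are hypothetical and only serve this illustration):

```python
class Shared:
    nums = []  # class attribute: one list for the class, seen by every instance

    def __init__(self, num):
        # No instance attribute named nums exists, so this mutates the class list.
        self.nums.append(num)


class Separate:
    def __init__(self, num):
        self.nums = [num]  # instance attribute: a fresh list per instance


a, b = Shared(1), Shared(2)
print(a.nums is b.nums)  # True: both names resolve to Shared.nums
print(Shared.nums)       # [1, 2]

c, d = Separate(1), Separate(2)
print(c.nums, d.nums)    # [1] [2]
```

Note that this sharing happens inside one interpreter process; it says nothing yet about what separate worker processes see.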

Compare the effect of a class attribute versus an instance attribute:

1. Using nums as a class attribute (class_num.py)

from multiprocessing import Pool


class A:
    nums = []

    def __init__(self, num=None):
        # I use 'self.__class__' for the sake of explicitly
        self.__class__.nums.append(num)
        print("nums:", self.__class__.nums)
        # checking that they don't share memory
        assert len(self.__class__.nums) < 2


if __name__ == '__main__':
    with Pool(maxtasksperchild=1) as pool:
        print(pool)
        pool.map(A, range(99))  # the assert is being raised

Running this script:

$ python class_num.py
nums: [0]
nums: [0, 1]
nums: [4]
nums: [4, 5]
nums: [8]
nums: [8, 9]
nums: [12]
nums: [12, 13]
nums: [16]
nums: [16, 17]
nums: [20]
nums: [20, 21]
nums: [24]
nums: [24, 25]
nums: [28]
nums: [28, 29]
nums: [32]
nums: [32, 33]
nums: [36]
nums: [36, 37]
nums: [40]
nums: [40, 41]
nums: [44]
nums: [44, 45]
nums: [48]
nums: [48, 49]
nums: [52]
nums: [52, 53]
nums: [56]
nums: [56, 57]
nums: [60]
nums: [60, 61]
nums: [64]
nums: [64, 65]
nums: [68]
nums: [68, 69]
nums: [72]
nums: [72, 73]
nums: [76]
nums: [76, 77]
nums: [80]
nums: [80, 81]
nums: [84]
nums: [84, 85]
nums: [88]
nums: [88, 89]
nums: [92]
nums: [92, 93]
nums: [96]
nums: [96, 97]
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "class_num.py", line 12, in __init__
    assert len(self.__class__.nums) < 2
AssertionError
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "class_num.py", line 18, in <module>
    pool.map(A, range(99))  # the assert is being raised
  File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 608, in get
    raise self._value
AssertionError

2. Using nums as an instance attribute (instance_num.py)

from multiprocessing import Pool


class A:

    def __init__(self, num=None):
        self.nums = []
        if num is not None:
            self.nums.append(num)
        print("nums:", self.nums)
        # checking that they don't share memory
        assert len(self.nums) < 2


if __name__ == '__main__':
    with Pool(maxtasksperchild=1) as pool:
        pool.map(A, range(99))  # the assert is not raised

Running this script:

$ python instance_num.py
nums: [0]
nums: [1]
nums: [2]
nums: [3]
nums: [4]
nums: [5]
nums: [6]
nums: [7]
nums: [8]
nums: [9]
nums: [10]
nums: [11]
nums: [12]
nums: [13]
nums: [14]
nums: [15]
nums: [16]
nums: [17]
nums: [18]
nums: [19]
nums: [20]
nums: [21]
nums: [22]
nums: [23]
nums: [24]
nums: [25]
nums: [26]
nums: [27]
nums: [28]
nums: [29]
nums: [30]
nums: [31]
nums: [32]
nums: [33]
nums: [34]
nums: [35]
nums: [36]
nums: [37]
nums: [38]
nums: [39]
nums: [40]
nums: [41]
nums: [42]
nums: [43]
nums: [44]
nums: [45]
nums: [46]
nums: [47]
nums: [48]
nums: [49]
nums: [50]
nums: [51]
nums: [52]
nums: [53]
nums: [54]
nums: [55]
nums: [56]
nums: [57]
nums: [58]
nums: [59]
nums: [60]
nums: [61]
nums: [62]
nums: [63]
nums: [64]
nums: [65]
nums: [66]
nums: [67]
nums: [68]
nums: [69]
nums: [70]
nums: [71]
nums: [72]
nums: [73]
nums: [74]
nums: [75]
nums: [76]
nums: [77]
nums: [78]
nums: [79]
nums: [80]
nums: [81]
nums: [82]
nums: [83]
nums: [84]
nums: [85]
nums: [86]
nums: [87]
nums: [88]
nums: [89]
nums: [90]
nums: [91]
nums: [92]
nums: [93]
nums: [94]
nums: [95]
nums: [96]
nums: [97]
nums: [98]
k.wahome
  • I understand the difference between class attributes and instance attributes; however, this doesn't answer my question. I need to use class attributes because in my original code (not shown here) I have several instances per process. – moshevi Aug 26 '18 at 11:09
  • My question is specifically about the workings of a multiprocessing pool. – moshevi Aug 26 '18 at 11:10
  • Oh I see. You get a pool of worker processes from `Pool()`. `map` allows you to achieve execution and data parallelism by essentially applying a function to each element in an iterable and returning the results. So you supply to it the function and the arguments and each child process spawned in the pool will be able to successfully import. – k.wahome Aug 26 '18 at 11:29
0

Your observation has a different cause. The values in nums do not come from other processes; they come from the same process, which ends up hosting multiple instances of A. This happens because you didn't set chunksize=1 in your pool.map call. Setting maxtasksperchild=1 is not enough in your case, because a single task still consumes a whole chunk of the iterable.
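To see how large a chunk can get when chunksize is left unset, here is a rough sketch of the arithmetic. It mirrors how CPython's pool.py has derived the default (an implementation detail that may differ between versions); the helper name default_chunksize and the figure of 8 workers are assumptions made for illustration.

```python
def default_chunksize(n_items, n_workers):
    """Approximate the chunksize Pool.map picks when none is given.

    Assumption: mirrors the divmod-based rule in CPython's pool.py.
    """
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


# 99 items on an assumed 8-worker pool: divmod(99, 32) == (3, 3), so 3 + 1
print(default_chunksize(99, 8))  # 4
```

A chunk size of 4 would be consistent with output in which chunks begin at 0, 4, 8 and so on: each worker appends the first item of its chunk, then fails the assert on the second.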

"This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. The (approximate) size of these chunks can be specified by setting chunksize to a positive integer." (from the docs about map)
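A minimal sketch of the fix, assuming the class A from the question. The helpers build and run, the pool size of 4 and the 20 items are hypothetical choices made so the effect is easy to inspect:

```python
from multiprocessing import Pool


class A:
    nums = []  # one list per process, shared by all instances in that process

    def __init__(self, num=None):
        self.__class__.nums.append(num)


def build(num):
    """Hypothetical helper: create an A and report what this worker has seen."""
    A(num)
    return list(A.nums)


def run(chunksize):
    # maxtasksperchild=1 retires a worker after one task; a task is one chunk.
    with Pool(processes=4, maxtasksperchild=1) as pool:
        return pool.map(build, range(20), chunksize=chunksize)


if __name__ == '__main__':
    print(run(chunksize=1)[:3])  # each worker saw exactly one item
    print(run(chunksize=5)[:5])  # one worker saw a whole chunk of five
```

With chunksize=1, every task is a single item, so each worker process creates one instance and is then replaced. With a larger chunksize, the items of a chunk are all handled by the same process and accumulate in its copy of nums.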

Darkonaut