I'm having problems sharing a dictionary of object instances with multiprocessing. I'm trying to use a dict that is shared by a manager, but when I try to use an object instance as a key, the instance gets copied.

import multiprocessing

class Dog():
    def __init__(self, name = "joe"):
        self.name = name
    def bark(self):
        print("woof")

mg = multiprocessing.Manager()
dt = mg.dict()
dt["a"] = 1
dt["b"] = 2
# As expected
print(dt.items()) # => [('a', 1), ('b', 2)]
dt = mg.dict()
lab = Dog("carl")
print(lab) # => <__main__.Dog instance at 0x7f8d6bb869e0>
dt[lab] = 1
# But then I don't get the ID I expect
print(dt.items()) # => [(<__main__.Dog instance at 0x7f8d6bb86908>, 1)]

I understand the way to work around this is to use object ID as a key, but why is this happening? Is using the object ID the best solution to my problem? I noticed that this doesn't happen with a normal non-manager dict() object.

Alternate approach

In the documentation for Manager(), I read that some of the problem is informing the server of changes, so I changed my code to this, but I still have the same problem where my dogs are copied, not referenced.

import multiprocessing

class Dog():
    def __init__(self, name = "joe"):
        self.name = name
    def bark(self):
        print("woof")

mg = multiprocessing.Manager()
dt = dict()
lp = mg.list()
lp.append(dt)
print(lp)
dt["a"] = 1
dt["b"] = 2
lp[0] = dt
print(lp)
dt = dict()
lab = Dog("carl")
print(lab)
pup = Dog("steve")
print(pup)
dt[lab] = 1
dt[pup] = 2
lp[0] = dt
# Their ids change again
print(lp) 
Vadim Kotov
Seanny123

2 Answers

When you create a multiprocessing.Manager, a separate server process is spawned, which is responsible for hosting all the objects created by the Manager. So, in order to store your Dog instance in the Manager dict, the instance needs to be pickled and sent to the Manager process. This, of course, results in an entirely separate Dog instance being created in the Manager process, so its ID won't match the ID of the Dog instance in your parent process. There's no way to avoid this, other than creating the Dog instance as a Proxy instance in the Manager, too:

import multiprocessing
from multiprocessing.managers import SyncManager


def Manager():
    m = SyncManager()
    m.start()
    return m

class Dog():
    def __init__(self, name = "joe"):
        self.name = name
    def bark(self):
        print("woof")

SyncManager.register("Dog", Dog)

mg = Manager()
dt = dict()
lp = mg.list()
lp.append(dt)
print(lp)
dt["a"] = 1 
dt["b"] = 2 
lp[0] = dt
print(lp)
dt = dict()
lab = mg.Dog("carl")
print(lab)
pup = mg.Dog("steve")
print(pup)
dt[lab] = 1 
dt[pup] = 2 
lp[0] = dt
# Their ids don't change
print(lp) 

Output:

<__main__.Dog instance at 0x1780098>
<__main__.Dog instance at 0x177efc8>
[{<__main__.Dog instance at 0x1780098>: 1, <__main__.Dog instance at 0x177efc8>: 2}]

Just keep in mind that this will make all access to your Dog instances in the parent process slower, since they now require IPC calls to the Manager process.
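
The copy described in this answer can be reproduced without a manager at all. The sketch below is only an illustration: it assumes that a plain pickle round trip is a fair stand-in for what happens when a value is sent to the Manager process, and `round_trip` is a hypothetical helper, not part of multiprocessing.

```python
import pickle


class Dog:
    def __init__(self, name="joe"):
        self.name = name


def round_trip(obj):
    """Serialize and rebuild obj, roughly what sending it to another process does."""
    return pickle.loads(pickle.dumps(obj))


lab = Dog("carl")
clone = round_trip(lab)

print(clone is lab)            # False: a new object was built from the pickled bytes
print(clone.name == lab.name)  # True: the state survived the trip
print(lab in {clone: 1})       # False: default hashing and equality are identity-based
```

The last line is why the lookup by instance fails: the dict on the manager side is keyed by the rebuilt object, not by the one you still hold in the parent process.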

dano
  • I am facing similar situation with `FastText` but getting `can't pickle fasttext_pybind.fasttext objects`. `FastText` is basically implemented in `C++` (https://fasttext.cc/docs/en/supervised-tutorial.html). The more detail is mentioned in my question: https://stackoverflow.com/q/69430747/6907424 (before applying your approach). Any way to solve it? – hafiz031 Oct 05 '21 at 06:29

As the documentation on managers states:

Modifications to mutable values or items in dict and list proxies will not be propagated through the manager, because the proxy has no way of knowing when its values or items are modified. To modify such an item, you can re-assign the modified object to the container proxy

While multiprocessing makes communication between processes easy, it still can't do what the OS doesn't allow (accessing arbitrary memory of another process). In practice, Managers work on copies of the objects, which are serialized whenever they cross a process boundary.
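
The re-assignment that the documentation describes might look roughly like this. It is a minimal sketch, and `reassign_demo` is a name made up for the example; the point is only that a value fetched from a list proxy is a copy, so the change has to be written back.

```python
import multiprocessing


def reassign_demo():
    """Return the nested dict as read through the proxy before and after re-assignment."""
    with multiprocessing.Manager() as manager:
        shared = manager.list()
        shared.append({})     # the manager stores its own copy of this dict

        local = shared[0]     # fetching returns another copy, not a reference
        local["a"] = 1        # changes only the local copy
        before = shared[0]    # the manager's copy is still empty

        shared[0] = local     # re-assign so the proxy sends the new value
        after = shared[0]
        return before, after


if __name__ == "__main__":
    before, after = reassign_demo()
    print(before)  # {}
    print(after)   # {'a': 1}
```

The `if __name__ == "__main__":` guard matters on platforms that start the Manager process by spawning a fresh interpreter, since that interpreter imports the main module again.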

I understand the way to work around this is to use object ID as a key

Note that you won't be able to use those IDs to get the object instances in other processes. The "proper" way is just to reassign the objects when you change them.
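
If you do need a key that still means something after the object has been copied, one option is an identifier stored on the object itself rather than `id()`. This is only a sketch of that idea: the `uid` attribute is an assumption added for the example and is not part of the Dog class in the question, and the pickle round trip stands in for the copy another process would receive.

```python
import pickle
import uuid


class Dog:
    def __init__(self, name="joe"):
        self.name = name
        self.uid = uuid.uuid4().hex  # hypothetical field, travels with the pickled state


lab = Dog("carl")
registry = {lab.uid: 1}              # keyed by the stored identifier, not the instance

clone = pickle.loads(pickle.dumps(lab))

print(id(clone) == id(lab))  # False: id() identifies one object in one process
print(registry[clone.uid])   # 1: the stored identifier survives the copy
```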

loopbackbee
  • For others finding this answer, there's even an example in the documentation on how to "re-assign the modified object to the container proxy". – Seanny123 Oct 14 '14 at 18:46
  • Unfortunately, doing as recommended in the documentation doesn't solve my problem that when my normal `dict` is added to my `manager.list`, the instances of the objects aren't passed, they're copied, as you can see in this [gist](https://gist.github.com/Seanny123/01d9e92fba631db07d85). – Seanny123 Oct 14 '14 at 18:59