
I am writing custom classes that inherit from either the built-in dict or collections.Counter, and I am facing a problem with the behaviour of deepcopy: it works as intended when inheriting from dict, but not when inheriting from Counter.

Here is an example:

from copy import deepcopy
from collections import Counter

class MyCounter(Counter):
    def __init__(self, foo):
        self.foo = foo

class MyDict(dict):
    def __init__(self, foo):
        self.foo = foo


c = MyCounter(0)
assert c.foo == 0  # Success
c1 = deepcopy(c)
assert c1.foo == 0  # Failure


d = MyDict(0)
assert d.foo == 0  # Success
d1 = deepcopy(d)
assert d1.foo == 0  # Success

I am a bit clueless as to why this is happening, given that the source code of the Counter class does not seem to customize deepcopy in any way (there is no custom __deepcopy__ method, for instance).

I understand that I may have to write a custom __deepcopy__ method, but it's not clear to me how. In general I would rather not have to do that, given that it works perfectly for dict.

Any help will be much appreciated.

Siolan

1 Answer


deepcopy has several fallbacks, covered in the answer here

In this case, your base class Counter customizes pickle serialization, and that is what deepcopy picks up on (as its second option, since Counter defines no __deepcopy__ of its own).

If you step through the code in a debugger, you'll find it ends up in Counter's __reduce__ method, which in the Python 3.9 implementation of Counter is:

    def __reduce__(self):
        return self.__class__, (dict(self),)

Here we can see where the information is lost: Counter's implementation assumes that the object stores nothing besides the dictionary contents itself, so extra attributes such as foo are not carried over.
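The loss can be observed directly by calling `__reduce_ex__(4)`, the hook that `copy.deepcopy` falls back to, on both classes from the question. This is only an illustrative sketch. The variable names are made up for the example, and the exact tuple returned for the dict subclass is a CPython detail.

```python
from collections import Counter
from copy import deepcopy


class MyCounter(Counter):
    def __init__(self, foo):
        self.foo = foo


class MyDict(dict):
    def __init__(self, foo):
        self.foo = foo


c = MyCounter(0)
d = MyDict(0)

# Counter.__reduce__ says "rebuild me by calling the class with a plain dict
# of my counts", so on reconstruction __init__ receives that dict as `foo`.
counter_callable, counter_args = c.__reduce_ex__(4)[:2]

# dict defines no __reduce__ of its own, so the default from object is used:
# it bypasses __init__ and carries the instance __dict__ along as state.
dict_state = d.__reduce_ex__(4)[2]

# The copy is a MyCounter, but its foo is the (empty) dict of counts.
copied_foo = deepcopy(c).foo
```

In other words, the copy's `foo` ends up being the dictionary of counts rather than the original value, which is why the assertion in the question fails.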

You could override __reduce__ or __reduce_ex__, which would fix pickling and, as a bonus, also fix deepcopy; or you could override __deepcopy__ and provide the necessary implementation there.
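As a rough sketch of the first option, `__reduce__` can return the five-element form documented for pickle: callable, arguments, state, list items, dict items. The `super().__init__()` call and the choice to pass the counts as dict items are assumptions made for this example, not something taken from the Counter source.

```python
from collections import Counter
from copy import deepcopy


class MyCounter(Counter):
    def __init__(self, foo):
        super().__init__()
        self.foo = foo

    def __reduce__(self):
        # (callable, args, state, listitems, dictitems):
        # callable(*args) recreates the instance with its extra attribute,
        # then each (key, value) pair is restored via obj[key] = value.
        return self.__class__, (self.foo,), None, None, iter(self.items())


c = MyCounter([1, 2])
c["deep"] = 1
c["copy"] = 2

c1 = deepcopy(c)
```

Because pickle consumes the same tuple, this should also make instances round-trip through pickle, provided the class is importable by name.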

Implementing our own __deepcopy__ isn't too complex, and we can keep the code very simple:

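from collections import Counter
from copy import deepcopy

class MyCounter(Counter):
    def __init__(self, foo):
        self.foo = foo

    def __deepcopy__(self, memo):
        # Deep-copy the extra attribute, then rebuild the counts entry by entry.
        copy_instance = MyCounter(deepcopy(self.foo, memo))
        for key, val in self.items():
            copy_instance[deepcopy(key, memo)] = val  # val is just an int
        return copy_instance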

c = MyCounter(123)
c['deep'] = 1
c['copy'] = 2

c1 = deepcopy(c)
assert c1.foo == c.foo
assert c1['deep'] == c['deep']
assert c1['copy'] == c['copy']

(In most cases I would probably recommend against overloading Counter or dict in order to add more attributes to them, but rather compose a custom class that has a counter or dict instance variable instead.)
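A minimal sketch of that composition approach might look as follows. The class name `Tally`, the `counts` attribute, and the `add` method are hypothetical names chosen for this example. Since the wrapper is an ordinary object, the default deepcopy machinery copies both attributes without any custom hooks.

```python
from collections import Counter
from copy import deepcopy


class Tally:
    """Hypothetical wrapper that owns a Counter instead of subclassing it."""

    def __init__(self, foo):
        self.foo = foo
        self.counts = Counter()

    def add(self, key, n=1):
        # Delegate the counting to the wrapped Counter.
        self.counts[key] += n


t = Tally(123)
t.add("deep")
t.add("copy", 2)

# No __deepcopy__ or __reduce__ needed: the instance __dict__ is copied as-is.
t1 = deepcopy(t)
```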

Mikael Öhman
  • Was 'meh' about Your answer until I saw 'prefer composition over inheritance'. +1 – Vorac Jul 20 '23 at 09:38
  • That's a very helpful answer! Thanks a lot, I did not get the link between deepcopy and pickle. About composition vs inheritance, maybe I should have gone that way from the beginning indeed. – Siolan Jul 20 '23 at 12:45
  • @Siolan The same way `str(x)` uses `__str__` if it exists and falls back to `__repr__`, `deepcopy` looks for `__deepcopy__` but falls back to `__reduce__`/`__reduce_ex__` as a second option, using it to serialize and deserialize the object (this is what pickle uses). That works as a deep copy but might be less efficient. Even if you implement `__deepcopy__` for your custom class, it will still fail to pickle properly unless you also fix `__reduce__`. – Mikael Öhman Jul 20 '23 at 14:25
  • I see, I did not know the link between `deepcopy` and `__reduce__`. Well I've now implemented the reduce method and all works perfectly! Thanks :) – Siolan Jul 22 '23 at 09:33