I extended dict in a simple way to directly access its values with the d.key notation instead of d['key']:

class ddict(dict):

    def __getattr__(self, item):
        return self[item]

    def __setattr__(self, key, value):
        self[key] = value

Now when I try to pickle it, it will call __getattr__ to find __getstate__, which is neither present nor necessary. The same will happen upon unpickling with __setstate__:

>>> import pickle
>>> class ddict(dict):
...     def __getattr__(self, item):
...         return self[item]
...     def __setattr__(self, key, value):
...         self[key] = value
...
>>> pickle.dumps(ddict())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __getattr__
KeyError: '__getstate__'

How do I have to modify the class ddict so that it is properly picklable?

Martijn Pieters
Michael

2 Answers

The problem is not pickle but that your __getattr__ method breaks the expected contract by raising KeyError exceptions. You need to fix your __getattr__ method to raise AttributeError exceptions instead:

def __getattr__(self, item):
    try:
        return self[item]
    except KeyError:
        raise AttributeError(item)

Now pickle is given the expected signal for a missing __getstate__ customisation hook.

From the object.__getattr__ documentation:

This method should return the (computed) attribute value or **raise an AttributeError exception**.

(bold emphasis mine).
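To see why the exception type matters, here is a minimal sketch using `hasattr()` and three-argument `getattr()`, which only treat `AttributeError` as "attribute missing". The class names `BrokenDDict` and `FixedDDict` are made up for this illustration; probing for optional hooks in this way is the same kind of lookup that trips up pickle in the question.

```python
class BrokenDDict(dict):
    def __getattr__(self, item):
        # Missing keys raise KeyError, which attribute machinery does not expect.
        return self[item]


class FixedDDict(dict):
    def __getattr__(self, item):
        try:
            return self[item]
        except KeyError:
            # Translate to the exception the attribute protocol expects.
            raise AttributeError(item) from None


fixed = FixedDDict()
print(hasattr(fixed, "missing"))              # False
print(getattr(fixed, "missing", "default"))   # default

broken = BrokenDDict()
try:
    hasattr(broken, "missing")
    leaked = False
except KeyError as exc:
    # hasattr() only swallows AttributeError, so the KeyError escapes.
    leaked = True
    print("KeyError leaked:", exc)            # KeyError leaked: 'missing'
```

Any code that probes for an optional attribute this way will behave the same, which is why fixing `__getattr__` is preferable to working around pickle specifically.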

If you insist on keeping the KeyError, then at the very least you need to skip names that start and end with double underscores and raise an AttributeError just for those:

def __getattr__(self, item):
    if isinstance(item, str) and item[:2] == item[-2:] == '__':
        # skip non-existing dunder method lookups
        raise AttributeError(item)
    return self[item]
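As a rough illustration of how that variant behaves, the sketch below probes for a dunder name and for an ordinary missing key. The name `__made_up_hook__` is hypothetical and chosen only so that no real special method gets in the way.

```python
class ddict(dict):
    def __getattr__(self, item):
        if isinstance(item, str) and item[:2] == item[-2:] == "__":
            # Dunder lookups get the AttributeError that callers expect.
            raise AttributeError(item)
        # Everything else still raises KeyError when the key is missing.
        return self[item]


d = ddict(foo="bar")
print(d.foo)                                 # bar
print(getattr(d, "__made_up_hook__", None))  # None

try:
    d.missing
    error_name = None
except KeyError as exc:
    error_name = type(exc).__name__
print(error_name)                            # KeyError
```

Note that `hasattr(d, "missing")` would still raise `KeyError` with this version, so the first fix remains the more robust one.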

Note that you probably want to give your ddict() subclass an empty __slots__ tuple; you don't need the extra __dict__ attribute mapping on your instances, since you are diverting attributes to key-value pairs instead. That saves you a nice chunk of memory per instance.
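A small sketch of what the empty `__slots__` changes, assuming the `AttributeError`-raising `__getattr__` from above. The class names `PlainDDict` and `SlottedDDict` are made up for the comparison.

```python
class PlainDDict(dict):
    def __getattr__(self, item):
        try:
            return self[item]
        except KeyError:
            raise AttributeError(item) from None

    def __setattr__(self, key, value):
        self[key] = value


class SlottedDDict(dict):
    __slots__ = ()  # no per-instance __dict__ is created

    def __getattr__(self, item):
        try:
            return self[item]
        except KeyError:
            raise AttributeError(item) from None

    def __setattr__(self, key, value):
        self[key] = value


print(hasattr(PlainDDict(), "__dict__"))    # True
print(hasattr(SlottedDDict(), "__dict__"))  # False

d = SlottedDDict()
d.foo = "bar"        # still stored as a key, not as an attribute
print(d)             # {'foo': 'bar'}
```

Both classes behave the same for key access; the slotted one simply skips the unused attribute mapping.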

Demo:

>>> import pickle
>>> class ddict(dict):
...     __slots__ = ()
...     def __getattr__(self, item):
...         try:
...             return self[item]
...         except KeyError:
...             raise AttributeError(item)
...     def __setattr__(self, key, value):
...         self[key] = value
...
>>> pickle.dumps(ddict())
b'\x80\x03c__main__\nddict\nq\x00)\x81q\x01.'
>>> type(pickle.loads(pickle.dumps(ddict())))
<class '__main__.ddict'>
>>> d = ddict()
>>> d.foo = 'bar'
>>> d.foo
'bar'
>>> pickle.loads(pickle.dumps(d))
{'foo': 'bar'}

That pickle looks up the __getstate__ method on the instance rather than on the class, as is the norm for special methods, is a discussion for another day.

Martijn Pieters

First of all, I think you may need to distinguish between instance attributes and class attributes. The official Python documentation on pickling (section 11.1.4) says:

instances of such classes whose __dict__ or the result of calling __getstate__() is picklable (see section The pickle protocol for details).

Therefore, the error you're getting arises when you try to pickle an instance of the class, not the class itself; in fact, your class definition will pickle just fine.

Now for pickling an object of your class, the problem is that you need to call the parent class's serialization implementation first to properly set things up. The correct code is:

In [1]: import pickle

In [2]: class ddict(dict):
   ...:
   ...:     def __getattr__(self, item):
   ...:         super.__getattr__(self, item)
   ...:         return self[item]
   ...:
   ...:     def __setattr__(self, key, value):
   ...:         super.__setattr__(self, key, value)
   ...:         self[key] = value
   ...:

In [3]: d = ddict()

In [4]: d.name = "Sam"

In [5]: d
Out[5]: {'name': 'Sam'}

In [6]: pickle.dumps(d)
Out[6]: b'\x80\x03c__main__\nddict\nq\x00)\x81q\x01X\x04\x00\x00\x00nameq\x02X\x03\x00\x00\x00Samq\x03s}q\x04h\x02h\x03sb.'
peidaqi
  • Sorry, but everything in this answer is wrong. Pickle tries to access the `__getstate__` method on the instance, at which point the `__getattr__` method is consulted. This tries to access a key by that name in the mapping, which fails with a `KeyError`. That is not a normal exception for attribute access, and it is what breaks pickle. – Martijn Pieters Feb 16 '17 at 14:04
  • Next, there is no `__getattr__` method on `dict` so the `super()` call will fail. The method is optional, only used as a fallback hook for missing attributes. – Martijn Pieters Feb 16 '17 at 14:05
  • Next, the whole point of intercepting `__setattr__` is to override the normal attribute assignment process and redirect to dictionary key assignment instead. Calling the `super()` version defeats that purpose. – Martijn Pieters Feb 16 '17 at 14:07
  • Last but not least, `super()` must be *called*; your example omits the calls. Plus there are a few other basic errors in the code, like indentation and trying to call an empty attribute on `__getattr__`. Please do make sure you post at least syntactically correct examples when answering. – Martijn Pieters Feb 16 '17 at 14:10
  • I'm new to the Stack Overflow text editor and made quite a few mistakes like the indentation, for which I apologize. However, the code itself, in its correctly formatted form, should work; I am not imprudent enough to post untested code. I've updated my answer with the correct code and outputs, tested on Python 3.5.2. – peidaqi Feb 17 '17 at 10:59
  • Yes, it now works, but you now set both a key **and** an attribute. You doubled the storage, as you now have two mappings, the dictionary object itself, and the instance `__dict__`. – Martijn Pieters Feb 17 '17 at 12:09
  • Also, it only works because `super` type object happens to have a `__getattr__` method too. That method is *still the wrong one to use*, and it is pure coincidence that it'll happily accept a `ddict` instance as the first argument at all. Next, that method happens to throw an `AttributeError` for missing attributes too, so it's nice that you delegated that task to that method, but that doesn't make it the right place to do so. And why not just return the `super.__getattr__` result while you are calling it anyway? – Martijn Pieters Feb 17 '17 at 12:13
  • Just to be explicit about this: **don't use unbound methods from the `super` type**. It is absolutely wrong to use those in any context other than a subclass of that type (at which point you should use `super()`, so an instance of that type, *anyway*). – Martijn Pieters Feb 17 '17 at 12:15
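The doubled storage mentioned in the comments can be shown with a minimal sketch. `DoubleStore` is a hypothetical class that keeps only the `__setattr__` pattern under discussion (written with a properly called `super()`), so the effect is visible in isolation.

```python
class DoubleStore(dict):
    def __setattr__(self, key, value):
        super().__setattr__(key, value)  # writes to the instance __dict__
        self[key] = value                # writes the same value as a dict key


d = DoubleStore()
d.name = "Sam"
print(dict(d))     # {'name': 'Sam'}
print(vars(d))     # {'name': 'Sam'}

# The two copies are independent, so they can drift apart.
d["name"] = "Alex"
print(d.name)      # Sam
print(d["name"])   # Alex
```

Because attribute lookup finds the instance `__dict__` entry first, key updates are no longer reflected in `d.name`, which is the opposite of what the original `ddict` was meant to do.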