I have a rather simple dataclass.

I saved it to a pickle file (using dill rather than the standard-library pickle).

import dill as pickle

After some other operations:

  • Loading the same pickle fails
  • Trying to save the same object fails

Error:

TypeError: cannot pickle '_hashlib.HASH' object

I am not using hashlib anywhere (that I am aware of).

Previously I was able to pickle/unpickle the same object/dataclass without issues.

Note: I am posting this Q/A because the error message led me to very obscure places, far away from my real problem/scenario. I don't want others to think there is something wrong with the dataclass or with pickle/dill when that is not the case.

Rub

2 Answers

0

It is rather silly, but difficult to figure out.

The problem was that I had reloaded the module containing the definition of the object (the dataclass). After a reload, pickle/dill no longer behaves as expected (at least, as expected by someone who doesn't fully understand how pickle works).

As mentioned here, reloading is for development and has some side-effects.

If you need to use both pickle and reload, it is better to restart the kernel and start again.
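
A minimal sketch of the failure mode, using the standard-library `pickle` rather than `dill` so that it runs without extra dependencies. The module name `shapes_demo` and the `Point` dataclass are hypothetical, written to a temporary directory only for illustration. With `dill` the exact symptom may differ (as the `_hashlib.HASH` message in the question suggests), but the sketch shows why a reload matters: existing instances keep pointing at the old class object.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Hypothetical module source, used only for this demonstration.
MODULE_SOURCE = (
    "from dataclasses import dataclass\n"
    "\n"
    "@dataclass\n"
    "class Point:\n"
    "    x: int\n"
    "    y: int\n"
)

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "shapes_demo.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        import shapes_demo

        p = shapes_demo.Point(1, 2)

        # Before the reload, a round trip through pickle works.
        before_ok = pickle.loads(pickle.dumps(p)) == p

        # The reload re-executes the module and creates a new Point class.
        importlib.reload(shapes_demo)

        # The existing instance still refers to the old class object.
        same_class = type(p) is shapes_demo.Point

        # pickle stores classes by reference (module name + class name) and
        # checks that the name still resolves to the very same object.
        try:
            pickle.dumps(p)
            error = None
        except pickle.PicklingError as exc:
            error = exc
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("shapes_demo", None)

print("round trip before reload:", before_ok)
print("same class after reload:", same_class)
print("error after reload:", error)
```

Creating the instance again after the reload (or restarting the interpreter) avoids the mismatch, because the instance then refers to the class that the module name currently resolves to.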

[updated on 2022-12-23 as per comment from dill's author]

If you really need a way to pickle by value instead of by reference, try `cloudpickle` (I haven't used it myself).

An important difference between cloudpickle and pickle is that cloudpickle can serialize a function or class by value, whereas pickle can only serialize it by reference.

https://github.com/cloudpipe/cloudpickle

Rub
  • I'm the `dill` author. I don't understand what is going on from your question and/or answer. I can say, however, that your statement about pickling by value is inaccurate and incomplete. `pickle`, `cloudpickle`, and `dill` all pickle some objects by value and some by reference. For example, `pickle` pickles classes by reference, while `cloudpickle` pickles by value, and `dill` lets you choose how you want to dump and load the class. It sounds like for your particular case, `dill` has the most flexible solution... but I don't have enough specifics from your question/answer to show that – Mike McKerns Dec 22 '22 at 12:55
  • Thanks Mike. You are right, `dill` is so far what I need. About the "statement", I just copied it from the link underneath the statement, but I am happy to remove it if you find it wrong. – Rub Dec 23 '22 at 14:37
  • No worries. If you do add more details to your question, feel free to tag me (or reply here) so I see it. – Mike McKerns Dec 23 '22 at 18:21
0

It sounds like the class definition of whatever dataclass you are using has changed to include a hash. Unfortunately, you can't pickle a hash, as a _hashlib.HASH object is a C object that doesn't provide pickling instructions (i.e. it can't be serialized). Here's an example, using dill with different serialization settings (to mimic pickle and cloudpickle):

>>> import hashlib
>>> hash = hashlib.md5()
>>> import dill
>>> dill.dumps(hash)  # default: global by-value
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mmckerns/lib/python3.7/site-packages/dill/_dill.py", line 263, in dumps
    dump(obj, file, protocol, byref, fmode, recurse, **kwds)#, strictio)
  File "/Users/mmckerns/lib/python3.7/site-packages/dill/_dill.py", line 235, in dump
    Pickler(file, protocol, **_kwds).dump(obj)
  File "/Users/mmckerns/lib/python3.7/site-packages/dill/_dill.py", line 394, in dump
    StockPickler.dump(self, obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 437, in dump
    self.save(obj)
  File "/Users/mmckerns/lib/python3.7/site-packages/dill/_dill.py", line 388, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 524, in save
    rv = reduce(self.proto)
TypeError: can't pickle _hashlib.HASH objects
>>>
>>> # change to "cloudpickle" settings (pointer-trace by-value)
>>> dill.settings['recurse'] = True
>>> dill.dumps(hash)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mmckerns/lib/python3.7/site-packages/dill/_dill.py", line 263, in dumps
    dump(obj, file, protocol, byref, fmode, recurse, **kwds)#, strictio)
  File "/Users/mmckerns/lib/python3.7/site-packages/dill/_dill.py", line 235, in dump
    Pickler(file, protocol, **_kwds).dump(obj)
  File "/Users/mmckerns/lib/python3.7/site-packages/dill/_dill.py", line 394, in dump
    StockPickler.dump(self, obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 437, in dump
    self.save(obj)
  File "/Users/mmckerns/lib/python3.7/site-packages/dill/_dill.py", line 388, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 524, in save
    rv = reduce(self.proto)
TypeError: can't pickle _hashlib.HASH objects
>>>
>>> # change to "pickle" settings (by reference)
>>> dill.settings['byref'] = True
>>> dill.dumps(hash)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mmckerns/lib/python3.7/site-packages/dill/_dill.py", line 263, in dumps
    dump(obj, file, protocol, byref, fmode, recurse, **kwds)#, strictio)
  File "/Users/mmckerns/lib/python3.7/site-packages/dill/_dill.py", line 235, in dump
    Pickler(file, protocol, **_kwds).dump(obj)
  File "/Users/mmckerns/lib/python3.7/site-packages/dill/_dill.py", line 394, in dump
    StockPickler.dump(self, obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 437, in dump
    self.save(obj)
  File "/Users/mmckerns/lib/python3.7/site-packages/dill/_dill.py", line 388, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 524, in save
    rv = reduce(self.proto)
TypeError: can't pickle _hashlib.HASH objects

Depending on where a hash is used in the dataclass object, one of the above variants may succeed.
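
As a rough illustration of that point, the sketch below uses the standard-library `pickle` and two hypothetical dataclasses, `WithHash` and `WithHexDigest` (neither comes from the question). It uses `hashlib.sha256` instead of `md5` purely as an assumption for portability. Keeping the live hash object as a field makes the instance unpicklable, while storing only the hex digest string is one possible way around it.

```python
import hashlib
import pickle
from dataclasses import dataclass, field


@dataclass
class WithHash:
    """Hypothetical dataclass that keeps a live hash object as a field."""
    name: str
    digest: object = field(default_factory=hashlib.sha256)


@dataclass
class WithHexDigest:
    """Hypothetical variant that keeps only the hex digest string."""
    name: str
    digest: str = ""


# The hash object is a C object without pickling support, so this fails.
try:
    pickle.dumps(WithHash("a"))
    failure = None
except TypeError as exc:
    failure = exc

# A plain string field pickles and unpickles without trouble.
original = WithHexDigest("a", hashlib.sha256(b"a").hexdigest())
restored = pickle.loads(pickle.dumps(original))

print("with hash object:", type(failure).__name__)
print("with hex digest, round trip equal:", restored == original)
```

Whether this applies to the original problem depends on where the hash object actually lives, which the question does not show.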

Mike McKerns