My program was killed while serializing data (a dict) to disk with dill. I cannot open the partially-written file now.

Is it possible to partially or fully recover the data? If so, how?

Here's what I've tried:

>>> dill.load(open(filename, 'rb'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lib/python3.4/site-packages/dill/dill.py", line 288, in load
    obj = pik.load()
EOFError: Ran out of input
>>> 

The file is not empty:

>>> os.stat(filename).st_size
31110059

Note: all data in the dictionary consisted of Python built-in types.

eqzx

2 Answers

The pure-Python version of pickle.Unpickler keeps a stack around even if it encounters an error, so you can probably get at least something out of it:

import io
import pickle

# Use the pure-Python version, we can't see the internal state of the C version
pickle.Unpickler = pickle._Unpickler

import dill

if __name__ == '__main__':
    obj = [1, 2, {3: 4, "5": ('6',)}]
    data = dill.dumps(obj)

    handle = io.BytesIO(data[:-5])  # cut it off

    unpickler = dill.Unpickler(handle)

    try:
        unpickler.load()
    except EOFError:
        pass

    print(unpickler.stack)

I get the following output:

[3, 4, '5', ('6',)]

The pickle data format isn't that complicated. Read through the Python module's source code and you can probably find a way to hook all of the load_ methods to give you more information.
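Before hooking any load_ methods, a cheaper way to get more information is to walk the opcodes with the standard library's pickletools.genops and see how far the stream stays well-formed. This is a different technique from the hooking suggested above, shown only as a minimal sketch; scan_opcodes is a hypothetical helper name, and plain pickle is used instead of dill to keep it self-contained.

```python
import pickle
import pickletools


def scan_opcodes(raw):
    """Return (ops, error): the opcodes decoded before the stream broke, and why it broke."""
    ops = []
    try:
        # genops yields (opcode, argument, byte position) without building any objects.
        for opcode, arg, pos in pickletools.genops(raw):
            ops.append((pos, opcode.name, arg))
    except ValueError as exc:
        # Raised on truncated arguments, unknown opcodes, or a missing STOP.
        return ops, str(exc)
    return ops, None


data = pickle.dumps([1, 2, {3: 4, "5": ("6",)}], protocol=4)
ops, error = scan_opcodes(data[:-5])  # cut it off, as in the example above
print("decoded", len(ops), "opcodes; stopped because:", error)
print("last complete opcode:", ops[-1])
```

The byte position of the last complete opcode tells you roughly how much of the file is usable, which helps when deciding whether a deeper recovery attempt is worth it.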

Blender
  • I'm the `dill` author. Indeed, pickling is just dumping to a string, so you should be able to recover up to the last object dumped when it failed. `pickle`, and thus `dill`, pickles recursively, so be warned that the "last object" means "the last object that was the target of a `dump`". – Mike McKerns Mar 12 '18 at 12:56
  • With the newest version of dill, it failed with: AttributeError: 'Unpickler' object has no attribute 'stack'. How can this be achieved in newer dill versions? Which version did you use? – Gilad Jan 07 '22 at 08:55

I can't comment on the above answer, but to extend Blender's answer:

unpickler.metastack worked for me with dill v0.3.5.1 (though, as far as I know, you could do it without dill). unpickler.stack did exist, but it was an empty list.

Also, with dill I got an UnpicklingError rather than an EOFError. This could be partly because of how my file got corrupted (the disk ran out of space).
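To make the metastack point concrete, here is a rough sketch without dill. In the pure-Python unpickler, each MARK opcode moves the current stack onto metastack and starts a fresh one, which is why stack can look empty while the data sits in metastack. The sketch assumes the top-level object is a flat dict of built-in scalars, and it relies on the private class pickle._Unpickler and its stack and metastack attributes, which are implementation details that may differ between Python versions. recover_dict is a hypothetical helper name.

```python
import io
import pickle


def recover_dict(raw):
    """Best-effort recovery of a truncated pickle whose top-level object is a dict."""
    # pickle._Unpickler is the private pure-Python unpickler; unlike the C one,
    # its internal state stays inspectable after a failed load().
    unpickler = pickle._Unpickler(io.BytesIO(raw))
    try:
        return unpickler.load()  # the data was complete after all
    except (EOFError, pickle.UnpicklingError):
        pass  # truncated: fall through and inspect what was decoded so far

    # Oldest level first: the one holding the top-level object comes at index 0.
    levels = unpickler.metastack + [unpickler.stack]
    if not levels[0] or not isinstance(levels[0][0], dict):
        raise ValueError("top-level object is not a dict, or nothing was decoded")

    recovered = levels[0][0]  # items from batches that were fully applied
    if len(levels) > 1:
        # Assumption: a flat dict, so this level holds key, value, key, value, ...
        # for the batch being read; zip drops a trailing key that has no value yet.
        pending = levels[1]
        recovered.update(zip(pending[0::2], pending[1::2]))
    return recovered


original = {i: str(i) for i in range(2500)}
truncated = pickle.dumps(original, protocol=4)[:-5]  # simulate a killed write
partial = recover_dict(truncated)
print(len(partial), "of", len(original), "items recovered")
```

With nested containers as values, the last recovered value may itself be only partially filled, so treat the tail of the result with suspicion. Other exception types may also need catching depending on the pickle protocol and where the file was cut.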

Steveo
  • Thanks for the tip about metastack. I wrote some example code (without dill) https://stackoverflow.com/a/76686521/360265 – mpa Jul 17 '23 at 11:11