
I was pickling some variables and the process was interrupted. How do I recover data from the partially pickled file?

I pickled two lists of lists using:

import pickle
import sys

sys.setrecursionlimit(5000) # to get around max depth recursion error

with open('level4_half.pkl', 'wb') as f:
    pickle.dump([level4_url, level4_desc], f)

I checked in Windows Explorer that the file is not empty (158 MB).

I then tried to unpickle the file using:

with open('level4_half.pkl', 'rb') as f:
    level4_url, level4_desc = pickle.load(f)

and encountered the error:

Traceback (most recent call last):

  File "<ipython-input-18-32ed3a0e79d4>", line 2, in <module>
    level4_url, level4_desc = pickle.load(f)

EOFError

(I have previously pickled and unpickled complete files successfully using the commands above.)

I found a similar question here, but I didn't use dill, and I'm not sure whether a partially pickled file counts as corrupted. I'm not yet skilled enough to quickly implement the solution suggested there: "Read through the Python module's source code and you can probably find a way to hook all of the load_ methods to give you more information." If the same solution applies to my question, I would appreciate guidance on how to "hook all of the load_ methods".

Thank you.

Sun
  • Unless there's an existing pickle recovery tool out there somewhere (I couldn't find any such thing in a brief search), it would likely be faster to regenerate/recreate the data than to manually recover the file (which would still require you to recreate the portions of the data that were lost). My approach would be to make a copy of Python's unpickling code, and modify it so that it catches the EOFError and returns whatever data it had read so far. This is likely to require multiple iterations, as the error may be nested arbitrarily deep in the process. – jasonharper May 21 '19 at 15:27
  • Thank you, in the end I did as you said which was to regenerate the file. – Sun Jun 02 '19 at 08:29
  • Can you detail more completely the method you used to regenerate the file? – McM Jul 22 '22 at 22:19
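
The approach jasonharper describes (catch the error and return whatever was read so far) can be sketched without copying the module, by subclassing the pure-Python unpickler that ships alongside the C one. This is only a rough sketch under assumptions: `pickle._Unpickler` and its `stack` and `metastack` attributes are private CPython implementation details that may change between versions, and the names `SalvagingUnpickler`, `load_partial`, and `salvage` are made up for illustration.

```python
import io
import pickle


class SalvagingUnpickler(pickle._Unpickler):
    """Pure-Python unpickler that exposes its partial state when loading fails.

    Assumption: relies on the private `stack` and `metastack` attributes of
    CPython's pure-Python unpickler, which are not part of the public API.
    """

    def load_partial(self):
        """Return (value, True) on success, or (pending_frames, False)."""
        try:
            return self.load(), True
        except Exception:
            # A truncated stream can surface as EOFError, UnpicklingError,
            # or a lower-level error, depending on where the data stops.
            # `metastack` holds the enclosing, still-open levels (outermost
            # first); `stack` holds the objects of the innermost unfinished
            # level. Containers found there hold only the items that were
            # fully appended before the stream ended.
            return [*self.metastack, self.stack], False


def salvage(fileobj):
    """Hypothetical helper: try to load, else return what was decoded so far."""
    return SalvagingUnpickler(fileobj).load_partial()
```

For the file in the question this could be called as `frames, complete = salvage(open('level4_half.pkl', 'rb'))`. When `complete` is False, `frames` is a nested list that has to be inspected by hand: for a pickle of two large lists, the partially filled first list typically appears inside one of the early frames, and the last frame holds items that were decoded but not yet appended. Anything written after the interruption point is lost and would still need to be regenerated.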

0 Answers