When reading a 4 MB file with pandas.read_pickle(), EOFError: Ran out of input is thrown. The file was written with pandas.to_pickle(), but due to a software bug, the thread running pandas.to_pickle() might have been killed. Is there a way to retrieve some data from this file?

mpa
1 Answer
I found a hint in this Stack Overflow question. The code below is an example of how I recovered all relevant data in our case. The structure of the code and the amount of recoverable data depend on the corrupted file, so the indices will differ for other files. Good luck :-)
import io
import pickle

import numpy as np

with open("path_to_file.pkl", "rb") as f:
    corrupted_data = io.BytesIO(f.read())

# Use the pure-Python version, we can't see the internal state of the C version
unpickler = pickle._Unpickler(corrupted_data)
try:
    unpickler.load()
except EOFError:
    pass  # expected: the file ends before the pickle is complete

# The unpickler keeps partially built objects on its internal stacks.
# All indices below are specific to the layout of our file.
metastack = unpickler.metastack
mgr = metastack[1]
bool_columns: np.ndarray = mgr[2].values
num_rows = bool_columns.shape[1]
int_columns: np.ndarray = mgr[3].values
object_columns: list = metastack[2]
value_list: list[np.ndarray] = object_columns[4]
print(f"{num_rows=}", bool_columns.shape, int_columns.shape)

# value_list holds the object columns back to back, num_rows entries each;
# the last slice takes whatever remains
object_column1: list[np.ndarray] = value_list[:num_rows]
object_column2: list[np.ndarray] = value_list[num_rows:2 * num_rows]
object_column3: list[np.ndarray] = value_list[2 * num_rows:]
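To find the right indices for a different file, it may help to try the same technique on a small pickle that you truncate yourself. The sketch below is only an illustration under stated assumptions: the list of integers, the cut at half the byte length, and the names data, blob, truncated and recovered are all made up for the example and have nothing to do with the real file. Which exception is raised seems to depend on where the cut falls, so the sketch catches Exception broadly instead of only EOFError.

```python
import io
import pickle

# Hypothetical stand-in for the real payload: a flat list of integers.
data = list(range(1000))
blob = pickle.dumps(data, protocol=4)

# Simulate a writer that was killed halfway by cutting the byte stream.
truncated = blob[: len(blob) // 2]

# pickle._Unpickler is the pure-Python class, so its state stays inspectable.
unpickler = pickle._Unpickler(io.BytesIO(truncated))
try:
    unpickler.load()
except Exception as exc:  # EOFError or another error, depending on the cut
    print(f"load failed as expected: {type(exc).__name__}")

# Items read since the last MARK opcode sit on `stack`; older, partially
# built containers were pushed onto `metastack` before that.
recovered = list(unpickler.stack)
print(f"recovered {len(recovered)} of {len(data)} items")
print("saved stacks on metastack:", len(unpickler.metastack))
```

For a pandas DataFrame the nesting is deeper, which is why the code above has to dig through several metastack levels. Printing the type and length of each metastack entry is a reasonable first step when exploring an unknown file.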

mpa