The issue was likely due to limitations in the dates that pandas
can handle.
Here's a way to reproduce the issue:
import pandas as pd
from datetime import datetime
df = pd.DataFrame(
{
"date": [
datetime.fromisoformat("1654-01-01"),
datetime.fromisoformat("1920-01-01"),
],
"Answer": [42,42],
"Question": ["the answer to the ultimate question of life?","Dan Kaminsky age of death?"],
}
)
df.to_feather("to_trash.feather")
pd.read_feather("to_trash.feather")
Indeed pandas can't handle dates that are before year 1677, and after year 2262. I guess that when pandas try to read the feather file, it tries to cast the array containing datetime objects as a datetime64[ns]
column, which fails because it can't handle old dates and far in the future dates.
How to handle the error to access the feather file.
Solution 1: Select all the columns, except the ones that contains incompatible dates.
df = pd.read_feahter("to_trash.feather", columns=["Answer", "Question"])
Solution 2: Use directly the pyarrow library
import pyarrow.feather as feather
df = feather.read_table("to_trash.feather")
df = pd.DataFrame(df.to_pylist())
There is probably other solutions that are more memory efficient.
New solution
Thanks to this commit You can now use timestamp_as_object
as an argument.
import pandas as pd
from datetime import datetime
df = pd.DataFrame({"date": [
datetime.fromisoformat("1654-01-01"),
datetime.fromisoformat("1920-01-01"),
],})
df.to_feather("to_trash.feather")
### read feather file
from pyarrow import feather
df = feather.read_feather("to_trash.feather", timestamp_as_object=True)