2

Problem

>>> import pandas as pd
>>> df = pd.read_sql("SELECT * FROM table_name", con)
>>> df.to_feather("some_file.feather")
>>> pd.read_feather("some_file.feather")

ArrowInvalid: Casting from timestamp[us] to timestamp[ns] would result in out of bounds timestamp: -55259020800000000

Does someone has a solution for this issue?

>>> pd.__version__
1.2.1

Adrien Pacifico
  • 1,649
  • 1
  • 15
  • 33

1 Answers1

0

The issue was likely due to limitations in the dates that pandas can handle.

Here's a way to reproduce the issue:

import pandas as pd
from datetime import datetime

df = pd.DataFrame(
    {
        "date": [
            datetime.fromisoformat("1654-01-01"),
            datetime.fromisoformat("1920-01-01"),

        ],
        "Answer": [42,42],
        "Question": ["the answer to the ultimate question of life?","Dan Kaminsky age of death?"],
    }
)
df.to_feather("to_trash.feather")
pd.read_feather("to_trash.feather")

Indeed pandas can't handle dates that are before year 1677, and after year 2262. I guess that when pandas try to read the feather file, it tries to cast the array containing datetime objects as a datetime64[ns] column, which fails because it can't handle old dates and far in the future dates.

How to handle the error to access the feather file.

Solution 1: Select all the columns, except the ones that contains incompatible dates.

df = pd.read_feahter("to_trash.feather", columns=["Answer", "Question"])

Solution 2: Use directly the pyarrow library

import pyarrow.feather as feather
df = feather.read_table("to_trash.feather")
df = pd.DataFrame(df.to_pylist())

There is probably other solutions that are more memory efficient.

New solution

Thanks to this commit You can now use timestamp_as_object as an argument.

import pandas as pd
from datetime import datetime
df = pd.DataFrame({"date": [
datetime.fromisoformat("1654-01-01"),
datetime.fromisoformat("1920-01-01"),
],})
df.to_feather("to_trash.feather")

### read feather file      
from pyarrow import feather
df = feather.read_feather("to_trash.feather", timestamp_as_object=True)
Adrien Pacifico
  • 1,649
  • 1
  • 15
  • 33