I have a parquet file that I am reading from S3 using fastparquet/pandas. The file has a column containing the timestamp 2022-10-06 00:00:00, but it is being read back as 1970-01-20 06:30:14.400. Please see the code, the errors, and a screenshot of the parquet file below. I am not sure why this is happening; 2022-09-01 00:00:00 reads fine. If I choose "pyarrow" as the engine instead, it fails with an exception.
pyarrow error:
pyarrow.lib.ArrowInvalid: Casting from timestamp[us] to timestamp[ns] would result in out of bounds timestamp: 101999952000000000
Please advise.
fastparquet error:
OverflowError: value too large
Exception ignored in: 'fastparquet.cencoding.time_shift'
OverflowError: value too large
OverflowError: value too large
code:
import io
import boto3
import pandas as pd

s3_client = boto3.client("s3")
obj = s3_client.get_object(Bucket="blah", Key="blah1")
df = pd.read_parquet(io.BytesIO(obj["Body"].read()), engine="fastparquet")
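One thing I noticed: the corrupted value looks like a factor-of-1000 unit mismatch. 2022-10-06 00:00:00 UTC is 1,665,014,400 seconds since the epoch, and interpreting that same integer as milliseconds lands exactly on 1970-01-20 06:30:14.400. A minimal stdlib sketch of the arithmetic (no S3 or parquet involved):

```python
from datetime import datetime, timedelta, timezone

# Epoch seconds for the expected timestamp, 2022-10-06 00:00:00 UTC
seconds = int(datetime(2022, 10, 6, tzinfo=timezone.utc).timestamp())
print(seconds)  # 1665014400

# Re-interpret the same integer as *milliseconds* since the epoch,
# i.e. the value read back is the true timestamp divided by 1000
# (e.g. microseconds decoded as nanoseconds, or seconds as milliseconds)
wrapped = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=seconds)
print(wrapped)  # 1970-01-20 06:30:14.400000+00:00
```

If that is what is going on, it would suggest the writer stored the timestamps in one unit while the parquet metadata declares another; inspecting the file's schema (for example with `pyarrow.parquet.read_schema`) might confirm which unit the column claims to be.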