I am receiving an out-of-bounds timestamp error message when attempting to convert a pandas DataFrame to a pyarrow Table and write it to a parquet dataset. From some research, it seems to be a result of pandas using nanosecond precision while the pyarrow Table holds the timestamps at microsecond precision (the error mentions casting from timestamp[us] to timestamp[ns]), I believe.
import os

import cx_Oracle
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Credentials and service name are read from environment variables.
connection = cx_Oracle.connect(os.getenv('USER'), os.getenv('__OPW'), os.getenv('DB_SERVICE'))

# Read the query result in chunks of 1,000 rows.
gen = pd.read_sql('SELECT * FROM myschema.mytable where rownum < 10001', con=connection, chunksize=1_000)

for df in gen:
    table = pa.Table.from_pandas(df)
    pq.write_to_dataset(table, root_path='/tmp/dataset', partition_cols=['my_part_col'])
ArrowInvalid: Casting from timestamp[us] to timestamp[ns] would result in out of bounds timestamp: 253402214400000000
When I comment out the last line:
# pq.write_to_dataset(table, root_path='/tmp/dataset', partition_cols=['my_part_col'])
...and re-run, the error message is no longer produced, so it may be occurring during the write from the pyarrow Table to parquet.
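To see what the value in the error message actually is, here is a small sketch using only the standard library. It assumes the number is microseconds since the Unix epoch (which is what timestamp[us] suggests) and that timestamp[ns] is a signed 64-bit count of nanoseconds; the variable names are just for illustration.

```python
from datetime import datetime, timedelta

# Value reported in the ArrowInvalid message, assumed to be
# microseconds since the Unix epoch (timestamp[us]).
value_us = 253402214400000000

# Decode it with plain Python datetimes, which cover years 1 through 9999.
decoded = datetime(1970, 1, 1) + timedelta(microseconds=value_us)
print(decoded)

# Assumption: timestamp[ns] stores a signed 64-bit count of nanoseconds.
INT64_MAX = 2**63 - 1
value_ns = value_us * 1000
print(value_ns > INT64_MAX)

# Latest instant that fits at nanosecond precision, truncated to microseconds.
latest_ns = datetime(1970, 1, 1) + timedelta(microseconds=INT64_MAX // 1000)
print(latest_ns)
```

If that reading is right, the value is a date at the very end of year 9999, which looks like a sentinel "no end date" value in the table rather than real data, and it is far beyond what nanosecond precision can hold.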
Is there a known workaround for this?
Thanks.
Update:
Here's the full traceback...
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/Users/myusername/miniconda3/envs/py38/lib/python3.8/site-packages/pyarrow/parquet.py", line 1754, in write_to_dataset
    df = table.to_pandas()
  File "pyarrow/array.pxi", line 715, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow/table.pxi", line 1565, in pyarrow.lib.Table._to_pandas
  File "/Users/myusername/miniconda3/envs/py38/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 779, in table_to_blockmanager
    blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
  File "/Users/myusername/miniconda3/envs/py38/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 1114, in _table_to_blocks
    result = pa.lib.table_to_blocks(options, block_table, categories,
  File "pyarrow/table.pxi", line 1028, in pyarrow.lib.table_to_blocks
  File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Casting from timestamp[us] to timestamp[ns] would result in out of bounds timestamp: 253402214400000000
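The traceback shows write_to_dataset calling table.to_pandas(), which is where the cast to nanoseconds happens. One thing I could try, as a guess rather than a confirmed fix, is nulling out dates that fall outside the nanosecond range before building the table. A rough sketch, assuming the column arrives as naive datetime.datetime objects; null_out_of_range and the bounds are my own hypothetical names, chosen slightly inside the limits pandas documents:

```python
from datetime import datetime

# Conservative bounds just inside the range of a nanosecond-precision
# timestamp (roughly 1677-09-21 through 2262-04-11).
NS_MIN = datetime(1677, 9, 22)
NS_MAX = datetime(2262, 4, 11)


def null_out_of_range(value):
    """Return None for naive datetimes that would not fit in timestamp[ns]."""
    if isinstance(value, datetime) and not (NS_MIN <= value <= NS_MAX):
        return None
    return value


# Small demonstration with an in-range date, a year-9999 date, and a null.
sample = [datetime(2020, 1, 15), datetime(9999, 12, 31), None]
cleaned = [null_out_of_range(v) for v in sample]
print(cleaned)
```

Inside the loop this might be applied as df['my_date_col'] = df['my_date_col'].map(null_out_of_range) before pa.Table.from_pandas(df), where my_date_col is a placeholder for whichever column holds the year-9999 values. This throws away the sentinel dates, so it only makes sense if they can be treated as missing.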