I am trying to export a pandas dataframe as a parquet file. The dataframe has a memory usage of 4 GB+, with 76 million rows and 6 columns (three int64 columns and three object columns).

When I write it out as a parquet file, I get an OverflowError with the message "Python int too large to convert to C long".

I am using fastparquet as follows:

    from fastparquet import write

    # Write the dataframe out as a local parquet file
    write('ref_util_table.parq', df)

This works just fine for other dataframes, and all the int64 columns are within range (their values run from 0 to 1000).

Any idea on how to fix the issue?

veg2020
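
As a point of reference, here is a minimal sketch (mine, not from the original post): under the assumption that one of the object columns holds a plain Python int outside the signed 64-bit range, fastparquet's attempt to convert that column would be expected to fail with exactly this message. The dataframe and filename below are made up for illustration:

    import pandas as pd
    from fastparquet import write

    # Hypothetical dataframe: one object column holds a Python int wider
    # than 64 bits; 2**70 does not fit in a C long / int64.
    df_repro = pd.DataFrame({'big': pd.Series([2**70, 1, 2], dtype='object')})

    # Under that assumption, this write is expected to raise
    # "OverflowError: Python int too large to convert to C long".
    write('repro.parq', df_repro)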
  • Are you totally sure that none of your ints are above sys.maxsize? For example, what are the values in `[df[i].max() for i in df.select_dtypes(int64).columns]`? – G. Anderson Jan 13 '22 at 23:42
  • 2
  • I get an array that looks like this: [1, 686, 193], using your code with a minor edit (adding quotes around int64): [df[i].max() for i in df.select_dtypes('int64').columns]. Replacing max() with min() in your code, the minimum values are [0, 0, 0]. – veg2020 Jan 14 '22 at 00:23
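
Given that the comments above confirm the int64 columns stay within 0-1000, the remaining suspects are the three object columns, where a plain Python int can exceed what fits in a C long. A minimal scan sketch (my own; `find_oversized_ints` is a hypothetical helper, and `df` is assumed to be the dataframe from the question):

    # Signed 64-bit bounds that a value must fit into when fastparquet/numpy
    # convert an object column to int64 (a C long on 64-bit platforms).
    INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

    def find_oversized_ints(df):
        """Hypothetical helper: count object-column values that don't fit in int64."""
        report = {}
        for col in df.select_dtypes('object').columns:
            mask = df[col].map(
                lambda v: isinstance(v, int) and not (INT64_MIN <= v <= INT64_MAX)
            )
            if mask.any():
                report[col] = int(mask.sum())
        return report

    print(find_oversized_ints(df))  # e.g. {'col_name': 12} if oversized ints exist

If the scan reports anything, converting those columns to strings with df[col].astype(str) before writing, or passing an explicit object_encoding to fastparquet's write, are possible workarounds; which one fits depends on how the data is consumed downstream.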

0 Answers