I have a DataFrame, df, with 900 columns. When I call df.to_parquet(), it raises a KeyError followed by NotImplementedError: struct<> (see the traceback below). Note that df.to_pickle() works fine on the same DataFrame.

As a workaround, calling df.astype(str), which converts every column to a string, makes df.to_parquet() succeed.

However, I do not want to convert every column to str. I want to know specifically which columns are causing the problem. That is my main concern.


KeyError
Traceback (most recent call last)
~/.conda/envs/py3/lib/python3.6/site-packages/pyarrow/pandas_compat.py in get_logical_type(arrow_type)
     68     try:
---> 69         return logical_type_map[arrow_type.id]
     70     except KeyError:

NotImplementedError: struct<>
LoneWanderer
  • Hello! Just a tip: to properly format the code in the question, you can select it and press the `{}` button on the bar above the text area. – Valentino Oct 19 '19 at 13:42
  • As for locating which row / column raises an error, see here: https://stackoverflow.com/questions/26660313/pandas-location-of-a-row-with-error – Valentino Oct 19 '19 at 13:46

2 Answers


Look at: https://github.com/pandas-dev/pandas/issues/21228

For a more specific answer, you need to provide more details about your DataFrame (column types and a minimal reproducible example).
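In the meantime, one way to narrow down the offending columns is to try writing each column on its own and record which ones fail. The sketch below is only an illustration: the helper names find_failing_columns and write_parquet_to_memory are made up for this example, and it assumes your pandas/pyarrow versions accept an in-memory buffer in to_parquet.

```python
import io

import pandas as pd


def write_parquet_to_memory(frame):
    """Serialize a frame to Parquet in memory; raises if the engine rejects it."""
    # Assumption: the installed engine accepts a file-like object here.
    frame.to_parquet(io.BytesIO())


def find_failing_columns(df, write=write_parquet_to_memory):
    """Return {column name: error text} for columns that fail when written alone."""
    failures = {}
    for position, name in enumerate(df.columns):
        # Select by position so each column is tested as a one-column frame.
        single = df.iloc[:, [position]]
        try:
            write(single)
        except Exception as exc:  # broad on purpose: engines raise different types
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures
```

Calling find_failing_columns(df) returns a dict keyed by column name, so you can inspect just those columns (for example, df[name].map(type).value_counts()) and convert only them. With 900 columns this performs 900 small writes, which is slow but only needs to be done once.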

Quant Christo

Try using df.select_dtypes(include='object')

More info here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.select_dtypes.html

Then, run .astype(str) on each of the columns selected. That way, you aren't converting other columns into strings.
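A minimal sketch of the two steps above, assuming the problem columns all have object dtype. The function name stringify_object_columns is just for illustration:

```python
import pandas as pd


def stringify_object_columns(df):
    """Return a copy of df in which only object-dtype columns are cast to str."""
    out = df.copy()
    # Pick the columns pandas stores as generic Python objects.
    object_cols = out.select_dtypes(include="object").columns
    # Cast only those columns; numeric and datetime columns keep their dtypes.
    out[object_cols] = out[object_cols].astype(str)
    return out
```

You would then call stringify_object_columns(df).to_parquet(...) in place of df.to_parquet(...). Note that this still converts every object column, including ones that would have been written without error.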

Hadas Arik
  • The code patch at the link works fine for me, but how do I find the columns that are actually causing the problem? The patch just converts all object-type columns to str, and I don't think every object-type column is a problem. – Ritwick Pandey Oct 19 '19 at 16:38