I have files with the .snappy.parquet extension that I need to read into my Jupyter notebook and convert to a pandas DataFrame.

import numpy
import pyarrow.parquet as pq

filename = "part-00000-tid-2430471264870034304-5b82f32f-de64-40fb-86c0-fb7df2558985-1598426-1-c000.snappy.parquet" 
df = pq.read_table(filename).to_pandas()

The error is:

ArrowNotImplementedError: lists with structs are not supported

eshirvana
Chique_Code

1 Answer

As of 2019-11-30, columns which are of type List[Struct[..]] (i.e. mixed nesting of lists and structs) are not supported by Apache Arrow. As mentioned in a different answer, the related issue is https://issues.apache.org/jira/browse/ARROW-1644.

To still read this file, you can read in all columns that are of supported types by supplying the columns argument to pyarrow.parquet.read_table. To find out which columns have the complex nested types, look at the schema of the file using pyarrow.parquet.ParquetFile(filename).schema.

Uwe L. Korn