
I am getting None statistics (min/max) when reading a file from S3 with fastparquet. When I call

fp.ParquetFile(fn=path, open_with=myopen).statistics['min']

most of the values are None, and only some of them are valid.
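To see exactly which columns and row groups come back empty, a small helper can scan that dict. This is a minimal sketch assuming `pf.statistics` has fastparquet's shape, `{'min': {column: [one value per row group]}, 'max': {...}, ...}`; `missing_stats` is a hypothetical helper name, and `path`/`myopen` are the ones from the call above.

```python
def missing_stats(statistics, key="min"):
    """Return {column: [row-group indices whose value is None]} for one statistic.

    Assumes `statistics` is shaped like fastparquet's ParquetFile.statistics:
    {"min": {column: [value per row group, ...]}, "max": {...}, ...}.
    """
    missing = {}
    for column, values in statistics.get(key, {}).items():
        # Collect the positions (row-group indices) that have no value.
        indices = [i for i, value in enumerate(values) if value is None]
        if indices:
            missing[column] = indices
    return missing


# Usage (needs fastparquet and access to the file; `path` and `myopen` as above):
# import fastparquet as fp
# pf = fp.ParquetFile(fn=path, open_with=myopen)
# print(missing_stats(pf.statistics, "min"))
```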

However, when I read the same file with another framework, I get the correct min/max for every column.

How can I get all the statistics? Thanks

LeonBam

1 Answer


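The full set of row groups is available as the list

import fastparquet as fp

pf = fp.ParquetFile(fn=path, open_with=myopen)  # path and myopen as in the question
pf.row_groups  # one entry per row group in the file

Each row group has a .columns attribute, and each column in turn has a meta_data attribute (the stored min/max live under meta_data.statistics), so you can dig around to see the min/max of each column in each individual row group.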
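A minimal sketch of that digging, assuming fastparquet exposes the Parquet footer's thrift structures as attributes (`rg.columns`, `chunk.meta_data.path_in_schema`, `chunk.meta_data.statistics`). `raw_column_stats` is a hypothetical helper; it reports the raw, still-encoded bytes from both the older `min`/`max` fields and the newer `min_value`/`max_value` fields of the Parquet `Statistics` struct, since a writer may populate only one pair.

```python
STAT_FIELDS = ("min", "max", "min_value", "max_value", "null_count")


def raw_column_stats(row_groups):
    """Report the raw statistics stored for every column chunk.

    Returns a list of (row_group_index, column_path, values) tuples, where
    `values` maps each name in STAT_FIELDS to the stored value, or None if
    the field is absent. Min/max values are the undecoded bytes from the
    file footer. Assumes the thrift-style attribute layout described above.
    """
    report = []
    for rg_index, rg in enumerate(row_groups):
        for chunk in rg.columns:
            md = chunk.meta_data
            column = ".".join(md.path_in_schema)  # nested columns join with "."
            stats = getattr(md, "statistics", None)
            if stats is None:
                # The writer stored no statistics at all for this chunk.
                values = dict.fromkeys(STAT_FIELDS)
            else:
                values = {name: getattr(stats, name, None) for name in STAT_FIELDS}
            report.append((rg_index, column, values))
    return report


# Usage (needs fastparquet and access to the file; `path` and `myopen` as in the question):
# import fastparquet as fp
# pf = fp.ParquetFile(fn=path, open_with=myopen)
# for rg_index, column, values in raw_column_stats(pf.row_groups):
#     print(rg_index, column, values)
```

If every field is None for a column chunk, the file itself carries no statistics for it; if the raw bytes are present but `pf.statistics` still shows None, the gap lies in decoding rather than in the file.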

mdurant