When I execute pd.read_parquet("/dbfs/XX/XX/agg.parquet") to read a parquet file called agg from Databricks' DBFS, it raises IsADirectoryError. When I list the path with dbutils it shows up as a folder, yet Spark reads it without any problem. My pandas read_parquet call appears to use the fastparquet backend, so could fastparquet be the reason it raises this error?
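For reference, here is a minimal sketch of a workaround I am considering: since Spark writes agg.parquet as a directory of part files rather than a single file, one could read the individual parts with pandas and concatenate them. The part-*.parquet glob pattern is an assumption based on Spark's usual output naming, not something I have verified for this path.

```python
import glob
import pandas as pd

# Spark writes "agg.parquet" as a directory of part files, e.g.
# /dbfs/XX/XX/agg.parquet/part-00000-<uuid>.snappy.parquet
# The pattern below assumes Spark's usual naming convention.
parts = sorted(glob.glob("/dbfs/XX/XX/agg.parquet/part-*.parquet"))

# Read each part file with pandas and concatenate into one DataFrame.
df = pd.concat((pd.read_parquet(p) for p in parts), ignore_index=True)
```

Concatenating the parts by hand avoids relying on either engine's handling of a directory path, but I would prefer to understand why the direct call fails.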
I am just experiencing the same problem. If I use the PyArrow engine, I get 'ArrowIOError: Invalid parquet file. Corrupt footer.' error. – bugfoot Nov 29 '19 at 12:38