When I execute pd.read_parquet("/dbfs/XX/XX/agg.parquet") to read a parquet file called agg from Databricks' DBFS, it raises IsADirectoryError. When I list the path with dbutils it shows up as a folder, yet Spark reads it without any problem. My pandas read_parquet call appears to use the fastparquet backend, so could fastparquet be the reason it raises this error?
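For reference, here is a minimal sketch of a workaround I am considering: since Spark writes agg.parquet as a directory of part files rather than a single file, one could read the individual parts with pandas and concatenate them. The part-*.parquet glob pattern is an assumption based on Spark's usual output naming, not something I have verified for this path.

```python
import glob
import pandas as pd

# Spark writes "agg.parquet" as a directory of part files, e.g.
# /dbfs/XX/XX/agg.parquet/part-00000-<uuid>.snappy.parquet
# The pattern below assumes Spark's usual naming convention.
parts = sorted(glob.glob("/dbfs/XX/XX/agg.parquet/part-*.parquet"))

# Read each part file with pandas and concatenate into one DataFrame.
df = pd.concat((pd.read_parquet(p) for p in parts), ignore_index=True)
```

Concatenating the parts by hand avoids relying on either engine's handling of a directory path, but I would prefer to understand why the direct call fails.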
I am just experiencing the same problem. If I use the PyArrow engine, I get 'ArrowIOError: Invalid parquet file. Corrupt footer.' error. – bugfoot Nov 29 '19 at 12:38