sampleDf = spark.createDataFrame(
    [(1, 'A', 2021, 1, 5), (1, 'B', 2021, 1, 6), (1, 'C', 2021, 1, 7)],
    ['msg_id', 'msg', 'year', 'month', 'day'])
sampleDf.show()
sampleDf.write.format("parquet").option("path", "/mnt/datalake/test") \
    .mode("append").partitionBy("msg_id", "year", "month", "day") \
    .saveAsTable("test_rawais")

This results in the following table:

[screenshot of the resulting table]

Now, when I convert this parquet table to a Delta table, I get the following error:

[screenshot of the error message]
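The conversion is run along these lines (a sketch; the column types in PARTITIONED BY are assumed from the schema above and may need adjusting, since Spark may have inferred bigint for these columns):

# Sketch of the conversion step; path and partition spec mirror the write above.
# PARTITIONED BY types are assumed and may need to match the inferred schema.
spark.sql("""
    CONVERT TO DELTA parquet.`/mnt/datalake/test`
    PARTITIONED BY (msg_id INT, year INT, month INT, day INT)
""")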

If I directly create a new Delta table instead of running CONVERT TO DELTA, then it works:

[screenshot of the successful direct Delta write]
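For comparison, the direct Delta write looks roughly like this (the target path and table name here are placeholders, not necessarily the exact ones I used):

# Sketch of the direct Delta write that succeeds; path and table name are hypothetical
sampleDf.write.format("delta") \
    .option("path", "/mnt/datalake/test_delta") \
    .mode("append") \
    .partitionBy("msg_id", "year", "month", "day") \
    .saveAsTable("test_rawais_delta")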

Any inputs would help. Thanks.

  • Are you working from a clean slate when adding the records to the table? From your screenshot it looks like more data is present than what you defined in your first query. Just remember that a `DROP TABLE` will not delete the data in your case, since you've created an external table (see the cleanup sketch after these comments). – Bram Jan 07 '21 at 11:49
  • @Bram, yes, it's a clean slate. Hope you're referring to the second image with the error message. If you look, the same set of 3 files is repeated 4 times in the error. In the physical path there are 3 parquet files in total. Thanks. – SriramN Jan 07 '21 at 13:21
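Picking up Bram's point: to fully reset an external table between attempts, both the metastore entry and the underlying files have to go. A minimal sketch, assuming a Databricks environment (dbutils) and the table/path from the question:

# Drop the metastore entry, then remove the files; DROP TABLE alone
# leaves an external table's data in place
spark.sql("DROP TABLE IF EXISTS test_rawais")
dbutils.fs.rm("/mnt/datalake/test", recurse=True)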

0 Answers