sampleDf = spark.createDataFrame(
    [(1, 'A', 2021, 1, 5), (1, 'B', 2021, 1, 6), (1, 'C', 2021, 1, 7)],
    ['msg_id', 'msg', 'year', 'month', 'day'])
sampleDf.show()
sampleDf.write.format("parquet").option("path", "/mnt/datalake/test") \
    .mode("append").partitionBy("msg_id", "year", "month", "day") \
    .saveAsTable("test_rawais")

This results in the following table:

[screenshot of the resulting table]

Now, when I convert this parquet table to a Delta table, I get the following error:

[screenshot of the error message]
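The conversion is run along these lines (a sketch; the column types in PARTITIONED BY are assumed from the schema above and may need adjusting, since Spark may have inferred bigint for these columns):

# Sketch of the conversion step; path and partition spec mirror the write above.
# PARTITIONED BY types are assumed and may need to match the inferred schema.
spark.sql("""
    CONVERT TO DELTA parquet.`/mnt/datalake/test`
    PARTITIONED BY (msg_id INT, year INT, month INT, day INT)
""")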

If I directly create a new Delta table instead of running CONVERT TO DELTA, then it works:

[screenshot of the successful direct Delta write]
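For comparison, the direct Delta write looks roughly like this (the target path and table name here are placeholders, not necessarily the exact ones I used):

# Sketch of the direct Delta write that succeeds; path and table name are hypothetical
sampleDf.write.format("delta") \
    .option("path", "/mnt/datalake/test_delta") \
    .mode("append") \
    .partitionBy("msg_id", "year", "month", "day") \
    .saveAsTable("test_rawais_delta")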

Any inputs would help. Thanks.

  • Are you working from a clean slate when adding the records to the table? From your screenshot it looks like more data is present than what you defined in your first query. Just remember that a `DROP TABLE` will not delete the data in your case, since you've created an external table (see the cleanup sketch after these comments). – Bram Jan 07 '21 at 11:49
  • @Bram, yes, it's a clean slate. Hope you're referring to the second image with the error message. If you look, the same set of 3 files is repeated 4 times in the error. In the physical path there are 3 parquet files in total. Thanks. – SriramN Jan 07 '21 at 13:21
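Picking up Bram's point: to fully reset an external table between attempts, both the metastore entry and the underlying files have to go. A minimal sketch, assuming a Databricks environment (dbutils) and the table/path from the question:

# Drop the metastore entry, then remove the files; DROP TABLE alone
# leaves an external table's data in place
spark.sql("DROP TABLE IF EXISTS test_rawais")
dbutils.fs.rm("/mnt/datalake/test", recurse=True)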

0 Answers