I am using the pyspark.pandas read_excel function to import data and saving the result to the metastore with to_table. This works fine with format='parquet', but the job hangs with format='delta'. The cluster sits idle after creating the Parquet files and never seems to go on to write the _delta_log.

Does anyone have a clue what might be happening?

I'm using Databricks Runtime 11.3 with Spark 3.3.

I have also tried importing the Excel file with regular pandas, converting the pandas DataFrame to a Spark DataFrame with spark.createDataFrame, and then calling write.saveAsTable, but that does not work either when the format is delta.
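
A minimal sketch of the two approaches described above, not the exact job code. The file path and table name are hypothetical placeholders (neither is given here), mode="overwrite" is an assumption, and `spark` is assumed to be the session that a Databricks notebook provides:

```python
# Hypothetical placeholders; substitute a real file and table name.
EXCEL_PATH = "/tmp/example.xlsx"
TABLE_NAME = "default.excel_import"


def import_with_pandas_on_spark(path, table, fmt):
    """Approach 1: pyspark.pandas read_excel followed by to_table."""
    import pyspark.pandas as ps  # imported lazily; needs a Spark runtime

    psdf = ps.read_excel(path)
    # fmt="parquet" completes; fmt="delta" is the case that hangs.
    psdf.to_table(table, format=fmt, mode="overwrite")


def save_spark_df(sdf, table, fmt):
    """Write a Spark DataFrame to a metastore table in the given format."""
    sdf.write.format(fmt).mode("overwrite").saveAsTable(table)


def import_with_plain_pandas(spark, path, table, fmt):
    """Approach 2: plain pandas, then createDataFrame and saveAsTable."""
    import pandas as pd

    pdf = pd.read_excel(path)
    sdf = spark.createDataFrame(pdf)
    save_spark_df(sdf, table, fmt)
```

Calling `import_with_pandas_on_spark(EXCEL_PATH, TABLE_NAME, "parquet")` corresponds to the case that works, and passing "delta" instead corresponds to the case that hangs.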

  • 2
    Perhaps if you can post some code, error message, people might be able to help. Take care not to post any PII or confidential information... – rainingdistros Nov 04 '22 at 08:32
