
I am calling an API that returns an XML string as its response. I am trying to save that XML string as an XML file in ADLS using PySpark in Azure Synapse Notebooks, and then read the XML file back and convert it to Parquet.

I was able to call the API and get the XML string back. However, when I try to write out the file, or read an XML file with the logic below, I get the following error.

df = spark.read.format("com.databricks.spark.xml").options(rowTag="message").load("<adls_file_path>")

Error: Py4JJavaError: An error occurred while calling o1744.load. : java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml

CMc

1 Answer


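This error usually occurs because the library that provides the com.databricks.spark.xml data source (the spark-xml package) is not installed properly, so Spark cannot find the class.

Please follow the steps below: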

[Three screenshots of the steps; images not available]

Download the spark-xml JAR file: click here
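If installing the library takes a while, a small response can also be parsed on the driver with Python's standard library. This is only a rough sketch and not part of the steps above: parse_rows and the sample XML are hypothetical, and it assumes each message element contains only flat child elements with text.

```python
import xml.etree.ElementTree as ET


def parse_rows(xml_text, row_tag="message"):
    """Return one dict per <row_tag> element, mapping child tag -> text.

    Hypothetical helper: assumes flat children (no nesting, no attributes).
    """
    root = ET.fromstring(xml_text)
    rows = []
    # iter() walks the whole tree and also yields the root if its tag matches
    for elem in root.iter(row_tag):
        rows.append({child.tag: (child.text or "").strip() for child in elem})
    return rows


# Made-up sample standing in for the API response
sample = (
    "<messages>"
    "<message><id>1</id><body>hello</body></message>"
    "<message><id>2</id><body>world</body></message>"
    "</messages>"
)
rows = parse_rows(sample)
```

In the notebook, the resulting list of dicts could then be passed to spark.createDataFrame(rows) and written out with df.write.parquet("<adls_file_path>"). For large or deeply nested XML, the spark-xml reader is likely the better fit.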

See the reference below for more information:

B. B. Naga Sai Vamsi