1

I'm trying to load an XML file into a DataFrame using PySpark in a Databricks notebook.

df = spark.read.format("xml").options(
    rowTag="product", mode="PERMISSIVE", columnNameOfCorruptRecord="error_record"
).load(filePath)

When I do, I get the following error:

Could not initialize class com.databricks.spark.xml.util.PermissiveMode$

Databricks runtime version: 7.3 LTS, Spark version: 3.0.1, Scala version: 2.12

The same code block runs fine on DBR 6.4 (Spark 2.4.5, Scala 2.11).

Alex Ott
Aman Sehgal

1 Answer

2

You need to upgrade the spark-xml library to a version compiled for Scala 2.12, because the build that works on DBR 6.4 (Scala 2.11) isn't compatible with the newer Scala version. So instead of spark-xml_2.11, you need to use spark-xml_2.12.

P.S. I just checked with DBR 7.3 LTS and com.databricks:spark-xml_2.12:0.11.0, and it works fine.
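To make the naming rule concrete, here is a minimal sketch in plain Python that builds the Maven coordinate for a given Scala version and checks whether an installed coordinate matches it. The helper names (`scala_binary_version`, `spark_xml_coordinate`, `is_compatible`) are hypothetical and exist only for illustration. Only the coordinate format and the 0.11.0 version come from the check above.

```python
# Illustrative helpers (hypothetical names, not part of spark-xml or Databricks).

def scala_binary_version(scala_version: str) -> str:
    """Reduce a full Scala version such as '2.12.10' to its binary version '2.12'."""
    parts = scala_version.split(".")
    if len(parts) < 2:
        raise ValueError(f"unexpected Scala version: {scala_version!r}")
    return ".".join(parts[:2])


def spark_xml_coordinate(scala_version: str, library_version: str = "0.11.0") -> str:
    """Build the Maven coordinate of spark-xml for the cluster's Scala version.

    The default library_version is the one reported to work on DBR 7.3 LTS above.
    """
    return f"com.databricks:spark-xml_{scala_binary_version(scala_version)}:{library_version}"


def is_compatible(coordinate: str, scala_version: str) -> bool:
    """Check that the artifact suffix (e.g. '_2.12') matches the Scala binary version."""
    artifact = coordinate.split(":")[1]        # e.g. "spark-xml_2.12"
    suffix = artifact.rsplit("_", 1)[-1]       # e.g. "2.12"
    return suffix == scala_binary_version(scala_version)


# In a notebook, the Scala version can usually be read through the JVM gateway
# (an assumption about your environment, so verify it on your cluster):
#   spark.sparkContext._jvm.scala.util.Properties.versionNumberString()
```

For example, `spark_xml_coordinate("2.12.10")` yields `com.databricks:spark-xml_2.12:0.11.0`, while `is_compatible("com.databricks:spark-xml_2.11:0.11.0", "2.12.10")` is `False`, which is the mismatch behind the error in the question.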

Alex Ott