
Suppose I have this kind of XML structure:

<?xml version = "1.0" encoding = "UTF-8"?>
<a>
    <title = "Kurosaki Ichigo"
            tel = "123-456, 234-567"
          class = "Employee"
             id = "EM-02"/>
    <title = "Abarai Renji"
            tel = "345-678, 456-789"
          class = "Employee"
             id = "EM-03"/>
    <title = "Aizen Sosuke"
            tel = "567-890, 012-345"
          class = "Employee"
             id = "EM-04"/>
</a>

I want to read this data in Databricks (PySpark). How should I set the options and other parameters?

1 Answer


To read an XML file, use the syntax below:

First check that the com.databricks:spark-xml library is installed on your cluster; install it if it is missing, since it is needed to read XML files.


Sample XML

<catalog>
<book id="bk101" name="vv" address="hyd">
    
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with XML.</description>
</book>
</catalog>
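As a rough preview of what to expect from this sample: spark-xml produces one row per occurrence of the chosen row element, with attributes as columns prefixed by "_" (the default) and child elements as ordinary columns. The sketch below imitates that mapping with Python's standard library only. It is not spark-xml's implementation, and `preview_rows` is a hypothetical helper name introduced here for illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = """<catalog>
<book id="bk101" name="vv" address="hyd">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with XML.</description>
</book>
</catalog>"""


def preview_rows(xml_text, row_tag, attribute_prefix="_"):
    """Return one dict per <row_tag> element (hypothetical helper, not part of spark-xml)."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter(row_tag):
        # Attributes become columns with a prefix, mirroring spark-xml's default "_".
        row = {attribute_prefix + key: value for key, value in elem.attrib.items()}
        # Direct child elements become columns holding their text.
        for child in elem:
            row[child.tag] = (child.text or "").strip()
        rows.append(row)
    return rows


rows = preview_rows(SAMPLE_XML, "book")
print(sorted(rows[0]))  # column names for the single <book> row
```

All values here stay plain strings; spark-xml would additionally infer column types unless told otherwise.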

Reading XML file

# Each <book> element becomes one row; its attributes become columns prefixed with "_".
df = spark.read.format("com.databricks.spark.xml").option("rowTag", "book").load("dbfs:/FileStore/xmlvalidator.xml")
display(df)
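For the data in the question, the same pattern could be wrapped in a small helper, sketched below. Two assumptions apply. First, the XML as posted is not well-formed (`<title = "..."` has no attribute name), so it would need correcting first, for example to `<title name = "Kurosaki Ichigo" ...>`. Second, the file path in the usage comment is hypothetical. With `rowTag` set to `title`, the attributes would then be expected to appear as columns such as `_tel`, `_class`, and `_id`. `read_xml` is an illustrative name, not part of spark-xml.

```python
def read_xml(spark, path, row_tag, attribute_prefix="_"):
    """Load an XML file into a DataFrame with one row per <row_tag> element."""
    return (
        spark.read.format("com.databricks.spark.xml")
        .option("rowTag", row_tag)  # element that marks one row
        .option("attributePrefix", attribute_prefix)  # prefix for attribute columns
        .load(path)
    )


# Hypothetical usage in a Databricks notebook, where `spark` is predefined
# and the path below is only a placeholder:
# df = read_xml(spark, "dbfs:/FileStore/employees.xml", "title")
# display(df)
```

Passing `spark` in as an argument keeps the helper free of notebook globals, so it can be reused outside Databricks wherever a session with spark-xml is available.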


B. B. Naga Sai Vamsi