I am trying to parse a simple XML file by supplying an XSD schema, using the approach described here:
https://github.com/databricks/spark-xml#xsd-support
Here is the XML:
```xml
<?xml version="1.0"?>
<beginnersbook>
  <to>My Readers</to>
  <from>Chaitanya</from>
  <subject>A Message to my readers</subject>
  <message>Welcome to beginnersbook.com</message>
</beginnersbook>
```
Here is the XSD:
```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="https://www.beginnersbook.com"
           xmlns="https://www.beginnersbook.com"
           elementFormDefault="qualified">
  <xs:element name="beginnersbook">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="subject" type="xs:string"/>
        <xs:element name="message" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```
I read this XSD and build a Spark schema from it like this:
```scala
import com.databricks.spark.xml.util.XSDToSchema
import java.nio.file.Paths
// Convert the XSD into a Spark StructType
val schemaParsed = XSDToSchema.read(Paths.get("<local_linux_path>/sample_file.xsd"))
print(schemaParsed)
```
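One way to see what `XSDToSchema.read` actually returns is to print the schema as a tree instead of a single line. The sketch below is a self-contained version for spark-shell with spark-xml on the classpath. Writing the XSD to a temporary file is an assumption made only so the snippet runs without the original `<local_linux_path>`. It also assumes that each global element of the XSD (here only `beginnersbook`) becomes a top-level field whose type is the struct of its child elements.

```scala
import com.databricks.spark.xml.util.XSDToSchema
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.types.StructType

// The XSD from the question, inlined so the sketch does not depend on <local_linux_path>.
val xsd =
  """<?xml version="1.0"?>
    |<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    |           targetNamespace="https://www.beginnersbook.com"
    |           xmlns="https://www.beginnersbook.com"
    |           elementFormDefault="qualified">
    |  <xs:element name="beginnersbook">
    |    <xs:complexType>
    |      <xs:sequence>
    |        <xs:element name="to" type="xs:string"/>
    |        <xs:element name="from" type="xs:string"/>
    |        <xs:element name="subject" type="xs:string"/>
    |        <xs:element name="message" type="xs:string"/>
    |      </xs:sequence>
    |    </xs:complexType>
    |  </xs:element>
    |</xs:schema>""".stripMargin

// XSDToSchema.read takes a java.nio.file.Path, so write the XSD to a temporary file.
val xsdPath = Files.createTempFile("sample_file", ".xsd")
Files.write(xsdPath, xsd.getBytes(StandardCharsets.UTF_8))

val schemaParsed: StructType = XSDToSchema.read(xsdPath)

// Shows the nesting that print(schemaParsed) collapses onto one line.
schemaParsed.printTreeString()

// Assumption: the root element is a single top-level field whose struct type
// holds the per-row columns (to, from, subject, message).
val rowSchema = schemaParsed("beginnersbook").dataType.asInstanceOf[StructType]
println(rowSchema.fieldNames.mkString(", "))
```

If the tree shows one `beginnersbook` column wrapping the four string fields, the parsed schema describes the whole document root rather than a single row.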
The schema is parsed successfully. Next, I read the XML file like this:
```scala
val df = spark.read.format("com.databricks.spark.xml").schema(schemaParsed).load("<hdfs_path>/sample_file.xml")
```
After this step, `df.printSchema()` displays the DataFrame schema, but `df.show()` shows an empty result.
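A likely cause worth checking: spark-xml splits a file into rows using the `rowTag` option, which defaults to `ROW`. The XML above has no `<ROW>` element, which would produce a DataFrame that has the expected schema but no rows. The sketch below is a minimal, untested fix under two assumptions: that `<beginnersbook>` is the intended row element, and that the XSD-derived schema nests the columns under a top-level `beginnersbook` field, so the struct inside that field is what gets passed to `.schema(...)`. The paths are the placeholders from the question.

```scala
import com.databricks.spark.xml.util.XSDToSchema
import java.nio.file.Paths
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

// Reuses the running session (e.g. the one spark-shell provides) or creates one.
val spark = SparkSession.builder().getOrCreate()

// Placeholder path kept from the question.
val schemaParsed = XSDToSchema.read(Paths.get("<local_linux_path>/sample_file.xsd"))

// Assumption: the columns live inside the top-level `beginnersbook` field,
// so the per-row schema is that field's struct type.
val rowSchema = schemaParsed("beginnersbook").dataType.asInstanceOf[StructType]

val df = spark.read
  .format("com.databricks.spark.xml")
  // rowTag defaults to "ROW"; this file has no <ROW> element, so without
  // this option spark-xml finds no rows to return.
  .option("rowTag", "beginnersbook")
  .schema(rowSchema)
  .load("<hdfs_path>/sample_file.xml")

df.printSchema()
df.show(false)
```

Passing the unwrapped struct keeps `to`, `from`, `subject` and `message` as top-level columns, instead of a single `beginnersbook` struct column that no longer matches the row element.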
What am I doing wrong here?
Thanks in advance.