I am trying to parse a simple XML file by supplying an XSD schema, using the approach described here:
https://github.com/databricks/spark-xml#xsd-support
XML is here:
<note>
  <to>aa</to>
  <from>bb</from>
  <heading>cc</heading>
  <body>dd</body>
</note>
XSD is here:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
I read this XSD and build a Spark schema from it as follows:
import com.databricks.spark.xml.util.XSDToSchema
import java.nio.file.Paths
val schemaParsed = XSDToSchema.read(Paths.get("<local_linux_path>/sample_file.xsd"))
print(schemaParsed)
The schema is parsed successfully. Next, I read the XML file as follows:
val df = spark.read.format("com.databricks.spark.xml").schema(schemaParsed).load("<hdfs_path>/sample_file.xml")
After this step I can display the schema of the DataFrame with df.printSchema(), but the content comes back empty when I call df.show().
Please point out what I am doing wrong here.
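One variation that may be worth checking, sketched under assumptions rather than as a confirmed fix: spark-xml's rowTag option defaults to ROW, and the sample file contains no <ROW> element. The sketch assumes that XSDToSchema.read wraps the four fields in a single top-level field named note (the output of print(schemaParsed) would show whether that is the case) and that spark is the session provided by spark-shell. The names noteSchema and dfNote are only illustrative.

```scala
import java.nio.file.Paths

import com.databricks.spark.xml.util.XSDToSchema
import org.apache.spark.sql.types.StructType

// Parse the XSD exactly as in the question.
val schemaParsed = XSDToSchema.read(Paths.get("<local_linux_path>/sample_file.xsd"))

// Show the nesting of the parsed schema before using it.
schemaParsed.printTreeString()

// Assumption: the parsed schema has one top-level field "note" whose type is
// the struct holding to/from/heading/body. The lookup throws
// IllegalArgumentException if no such field exists.
val noteSchema = schemaParsed("note").dataType.asInstanceOf[StructType]

val dfNote = spark.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "note") // treat each <note> element as one row
  .schema(noteSchema)
  .load("<hdfs_path>/sample_file.xml")

dfNote.show()
```

If the parsed schema turns out not to be nested under note, the same read could be tried with schemaParsed passed to .schema(...) directly, keeping only the rowTag option.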
Note: This question is exactly the same as this one: How to parse XML with XSD using spark-xml package?
I am reposting it because I am not able to comment there. Thanks in advance.