Questions tagged [apache-spark-xml]
81 questions
0 votes, 0 answers
spark-xml library is parsing XML file many times
I use the spark-xml library from Databricks to parse an XML file (550 MB).
Dataset<Row> books = spark.sqlContext().read()
.format("com.databricks.spark.xml")
.option("rootTag", "books")
.option("rowTag", "book")
…

robynico
0 votes, 1 answer
Add extra column for child data frame from parent data frame in nested XML in Spark
I am creating a data frame after loading many XML files.
Each XML file has one unique field, fun:DataPartitionId.
I am creating many rows from each XML file.
Now I want to add this fun:DataPartitionId to each of the rows produced from that XML file.
For…

Atharv Thakur
0 votes, 1 answer
How to save array data frame output from spark-xml in CSV format
I have deleted two of my questions because I thought they were too big and I could not explain them neatly.
So I am trying to make it simple this time.
I have a complex nested XML.
I am parsing it in Spark Scala and I have to save all the data…

Sudarshan kumar
0 votes, 1 answer
Spark DataFrame xml change column name
I was trying to load XML files using Databricks Spark XML.
I am able to load the data properly, but I need to change the name of one of the columns and put it as a separate tag inside the schema. Basically, there are a few tags which need to be…

Deepan Ram
0 votes, 3 answers
Read XML File in Spark with multiple RowTags
I would like to read a huge XML file with 3 different RowTags into Apache Spark DataFrames.
RowTag = the XML element that is interpreted as a row in Spark.
The tags:
- contain different data structures
- do not overlap
xml-spark…

JanDE
-1 votes, 1 answer
Adding part of the parent Schema column to child in nested json in spark data frame
I have the XML below that I am trying to load into a Spark data frame.
urn:uuid:6d2af93bfbfc49da9805aebb6a38996d
…

Sudarshan kumar