Questions tagged [apache-spark-xml]

81 questions
0 votes, 0 answers

spark-xml library parses the XML file many times

I use the spark-xml library from Databricks to parse an XML file (550 MB): Dataset books = spark.sqlContext().read().format("com.databricks.spark.xml").option("rootTag", "books").option("rowTag", "book") …
0 votes, 1 answer

Add an extra column to a child DataFrame from the parent DataFrame in nested XML in Spark

I am creating a DataFrame after loading many XML files. Each XML file has one unique field, fun:DataPartitionId. I create many rows from one XML file, and now I want to add this fun:DataPartitionId to each of the rows produced from that file. For…
Atharv Thakur
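With spark-xml, each element matched by rowTag becomes one row, so a value that appears only once per file (such as fun:DataPartitionId) is not copied onto those rows automatically. The sketch below is plain Java using the JDK's StAX reader, not the spark-xml API; it only illustrates the idea of carrying a file-level value onto every row while streaming. The tag names DataPartitionId and record, the namespace URI, and the class name are hypothetical, and the sketch assumes the id element appears before the rows and that row children are simple text elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ParentFieldToRows {

    // Hypothetical names: the once-per-file id element and the repeating row element.
    static final String ID_TAG = "DataPartitionId";
    static final String ROW_TAG = "record";

    /** Returns one map per ROW_TAG element; each map also carries the file-level id. */
    static List<Map<String, String>> parse(String xml) {
        List<Map<String, String>> rows = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, true);
            XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
            String partitionId = null;          // assumed to appear before the rows
            Map<String, String> current = null; // row currently being filled, if any
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = r.getLocalName(); // local name, so the "fun:" prefix is dropped
                    if (name.equals(ID_TAG)) {
                        partitionId = r.getElementText().trim();
                    } else if (name.equals(ROW_TAG)) {
                        current = new LinkedHashMap<>();
                        current.put(ID_TAG, partitionId); // copy the parent value onto the row
                        rows.add(current);
                    } else if (current != null) {
                        // Assumes row children are simple text elements.
                        current.put(name, r.getElementText().trim());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && r.getLocalName().equals(ROW_TAG)) {
                    current = null; // row finished
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String xml = "<file xmlns:fun=\"urn:example:fun\">"
                + "<fun:DataPartitionId>p-42</fun:DataPartitionId>"
                + "<record><name>a</name></record>"
                + "<record><name>b</name></record>"
                + "</file>";
        System.out.println(parse(xml));
    }
}
```

Running main prints [{DataPartitionId=p-42, name=a}, {DataPartitionId=p-42, name=b}]. Inside Spark itself, a comparable idea would be to read at the parent level and explode the child array while selecting the parent field alongside it, though whether that fits depends on the actual file layout.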
0 votes, 1 answer

How to save a DataFrame with array columns from spark-xml in CSV format

I deleted two of my earlier questions because I thought they were too big and I could not explain them clearly, so I am trying to keep it simple this time. I have a complex nested XML file. I am parsing it in Spark Scala and I have to save all the data…
Sudarshan kumar
0 votes, 1 answer

Spark DataFrame XML: change column name

I am trying to load XML files using Databricks Spark XML. I can load the data properly, but I need to rename one of the columns and put it as a separate tag inside the schema. Basically, there are a few tags which need to be…
Deepan Ram
0 votes, 3 answers

Read XML File in Spark with multiple RowTags

I would like to read a huge XML file with 3 different RowTags into Apache Spark DataFrames. RowTag = the XML element that is interpreted as a row in Spark. The tags contain different data structures and do not overlap. xml-spark…
JanDE
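spark-xml's rowTag option names a single element, so one straightforward workaround is to read the same file once per tag. If a single pass over the file is preferred, the rows can instead be split by tag while streaming. The sketch below shows that second idea in plain Java with the JDK's StAX reader, not spark-xml; the tag names customer, order and product and the class name are hypothetical, and it assumes row elements are not nested inside each other (as the question states) and that their children are simple text elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class MultiRowTagSplitter {

    /** One pass: collects each element named in rowTags as a row (child name -> text). */
    static Map<String, List<Map<String, String>>> split(String xml, List<String> rowTags) {
        Map<String, List<Map<String, String>>> tables = new LinkedHashMap<>();
        for (String tag : rowTags) {
            tables.put(tag, new ArrayList<>());
        }
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            String openTag = null;          // row tag we are currently inside, if any
            Map<String, String> row = null; // row currently being filled, if any
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = r.getLocalName();
                    if (row == null && tables.containsKey(name)) {
                        openTag = name;
                        row = new LinkedHashMap<>();
                        tables.get(name).add(row);
                    } else if (row != null) {
                        // Assumes row children are simple text elements.
                        row.put(name, r.getElementText().trim());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && row != null && r.getLocalName().equals(openTag)) {
                    row = null; // row finished
                    openTag = null;
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
        return tables;
    }

    public static void main(String[] args) {
        String xml = "<root>"
                + "<customer><id>1</id></customer>"
                + "<order><id>10</id><total>5</total></order>"
                + "<product><sku>A</sku></product>"
                + "<order><id>11</id><total>7</total></order>"
                + "</root>";
        split(xml, List.of("customer", "order", "product"))
                .forEach((tag, rows) -> System.out.println(tag + ": " + rows));
    }
}
```

Running main prints one line per tag, for example order: [{id=10, total=5}, {id=11, total=7}]. Each list could then be loaded as its own DataFrame; the trade-off versus three spark-xml reads is one scan of the file in exchange for doing the row extraction outside Spark.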
-1 votes, 1 answer

Adding part of a parent schema column to a child in nested JSON in a Spark DataFrame

I have the XML below, which I am trying to load into a Spark DataFrame. urn:uuid:6d2af93bfbfc49da9805aebb6a38996d
Sudarshan kumar