Questions tagged [apache-spark-xml]
81 questions
1
vote
0 answers
How to sort the dates using Spark while generating XML?
I am trying to write an XML file by converting dataframes using some jax jar, but I need to sort. It's not sorting if I apply my sorting at the dataframe level, because at the end I am writing a final dataframe using the xml jar and call each object how can…

Db8
- 93
- 1
- 2
- 8
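The question above asks how to keep rows in date order when a dataframe is serialized to XML. The core idea, independent of Spark, is to sort the records before serializing, since document order is fixed once the XML is written. A minimal stdlib sketch (the `records` data and tag names are illustrative, not from the question):

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical records; in the question these come from a Spark dataframe.
records = [
    {"id": "2", "when": date(2021, 3, 1)},
    {"id": "1", "when": date(2020, 5, 9)},
]

# Sort BEFORE serializing: once rows are written out as XML,
# their document order is fixed.
records.sort(key=lambda r: r["when"])

root = ET.Element("rows")
for r in records:
    row = ET.SubElement(root, "row", id=r["id"])
    row.text = r["when"].isoformat()

xml_text = ET.tostring(root).decode()
print(xml_text)
```

In Spark the equivalent step would be an `orderBy` on the date column immediately before the XML write, under the caveat that a multi-partition write can still interleave output files.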
1
vote
0 answers
Dataset Filter working in an unexpected way
Scenario:
I have read two XML files by specifying a schema on load.
In the schema, one of the tags is mandatory. One XML is missing that mandatory tag.
Now, when I do the following, I am expecting the XML with the missing mandatory tag to be…

Amar Wadhwani
- 67
- 1
- 10
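In the scenario above, a record whose mandatory tag is absent typically parses with a null in that field, so finding it comes down to an explicit null check rather than an equality filter. A stdlib sketch of that idea (the field names are hypothetical):

```python
# When a mandatory tag is absent from the XML, the parsed value is
# typically null/None rather than a missing key; filtering for it
# must therefore be an explicit None check.
rows = [
    {"id": "1", "mandatory": "present"},
    {"id": "2", "mandatory": None},  # XML was missing the mandatory tag
]

missing = [r for r in rows if r["mandatory"] is None]
print(missing)
```

The Spark analogue would be `df.filter(col("mandatory").isNull())`, since `=== null` comparisons do not behave like a None check.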
1
vote
1 answer
How to update a nested column's value of XML in a Spark Scala dataframe
Suppose I have following xml data:
110
2
abc
def
…

seeker
- 98
- 9
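The question above is about rewriting a value nested inside XML structure. The underlying operation — locate the nested node by path, then replace its text — can be sketched with the stdlib (the document and tag names here are hypothetical stand-ins for the question's data):

```python
import xml.etree.ElementTree as ET

# Hypothetical document standing in for the question's nested XML.
doc = ET.fromstring(
    "<book><details><title>abc</title><author>def</author></details></book>"
)

# Locate the nested node with a simple path and replace its text.
node = doc.find("./details/title")
node.text = "xyz"

updated = ET.tostring(doc).decode()
print(updated)
```

In a Spark dataframe the same intent is usually expressed by rebuilding the struct column (`struct(...)` with the one field replaced) rather than mutating it in place.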
1
vote
1 answer
How to identify or reroute bad XMLs when reading XMLs with Spark
Using Spark, I am trying to read a bunch of XMLs from a path; one of the files is a dummy file which is not an XML.
I would like Spark to tell me, in any way, that one particular file is not valid.
Adding the "badRecordsPath" option writes the bad…

Geethanadh
- 313
- 5
- 17
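The check behind the question above — separate well-formed XML from junk before (or instead of) relying on `badRecordsPath` — can be sketched with a stdlib parse attempt per file. The file names and payloads are illustrative:

```python
import xml.etree.ElementTree as ET

def split_good_bad(named_payloads):
    """Partition (name, payload) pairs by XML well-formedness."""
    good, bad = [], []
    for name, payload in named_payloads:
        try:
            ET.fromstring(payload)  # raises ParseError on malformed input
            good.append(name)
        except ET.ParseError:
            bad.append(name)
    return good, bad

files = [
    ("a.xml", "<row><id>1</id></row>"),
    ("dummy.txt", "this is not xml at all"),
]
good, bad = split_good_bad(files)
print(good, bad)
```

A pre-pass like this over the input paths yields an explicit list of invalid files, which is the "tell me which file is bad" behaviour the question asks for.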
1
vote
1 answer
Can we create an XML file with a specific node with Spark Scala?
I have another question about Spark and Scala. I want to use that technology to get data and generate an XML file.
Therefore, I want to know if it is possible to create nodes ourselves (not automatic creation), and what library we can use? I searched but I…

THIBAULT Nicolas
- 159
- 3
- 11
1
vote
1 answer
Spark-xml rootTag and rowTag not reading the XML properly
I am working on an XML that has a structure like the one below.
I am trying to access tag 2.1.1 and its child attributes, so I have given the rootTag as tag2 and the rowTag as tag 2.1.1. The below code is returning null. If I apply the same logic to tag1, it is…

sakthi srinivas
- 182
- 1
- 4
- 12
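The rowTag idea in the question above — every occurrence of a chosen element becomes one row, regardless of how deeply it is nested — can be sketched with the stdlib. The document and tag names are hypothetical:

```python
import xml.etree.ElementTree as ET

# spark-xml's rowTag treats each occurrence of a given element as one row.
# A stdlib sketch of the same idea: collect every element matching the
# "row tag" at any nesting depth under the root.
doc = ET.fromstring(
    "<tag1><tag2><item><a>1</a></item><item><a>2</a></item></tag2></tag1>"
)

rows = [
    {child.tag: child.text for child in item}
    for item in doc.iter("item")  # .iter searches all nesting levels
]
print(rows)
```

Seeing the same lookup succeed at one level and fail at another is often a sign that the configured row tag string does not exactly match the element name in the file (whitespace or namespace differences included).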
1
vote
1 answer
Select Fields that start with a certain pattern: Spark XML Parsing
I am having to parse some very large xml files. There are a few fields within those xml files that I want to extract and then perform some work on them. However, there are some rules that I need to follow, i.e. I can only select fields if they…

fletchr
- 646
- 2
- 8
- 25
1
vote
1 answer
Use recursive globbing to extract XML documents as strings in pyspark
The goal is to extract XML documents, given an XPath expression, from a group of text files as strings. The difficulty is the variety of forms the text files may take. They might be:
single zip / tar file with 100 files, each 1 XML document
one…

ghukill
- 1,136
- 17
- 42
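Once the text payloads are loaded (however they were packaged), the extraction step described above is: parse each payload, evaluate the XPath, and serialize each match back to a string. A stdlib sketch, with hypothetical payload and path:

```python
import xml.etree.ElementTree as ET

# Given raw text payloads that each contain XML, extract the matching
# sub-documents as strings using an XPath expression.
payloads = [
    "<feed><doc><id>1</id></doc><doc><id>2</id></doc></feed>",
]
xpath = ".//doc"  # ElementTree supports a limited XPath subset

extracted = []
for text in payloads:
    root = ET.fromstring(text)
    extracted.extend(ET.tostring(e).decode() for e in root.findall(xpath))

print(extracted)
```

In PySpark, a function like this would typically be wrapped in a `flatMap` over the RDD of file contents so that each input file can yield zero or more XML documents.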
1
vote
0 answers
Spark-Xml Root Tag is Generated in every part file
So I am trying to generate an XML with the below structure.
234
34
234
34
Now I…

Punith Raj
- 2,164
- 3
- 27
- 45
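The symptom above — a root tag repeated in every part file — comes from each Spark partition writing its own complete document. One workaround is a post-processing merge: strip the per-part root tags and re-wrap all rows under a single root. A stdlib sketch with hypothetical part contents:

```python
import xml.etree.ElementTree as ET

# Each part file is a complete document with its own root tag.
parts = [
    "<books><book>234</book></books>",
    "<books><book>34</book></books>",
]

# Re-wrap: move every child of every per-part root under one new root.
merged_root = ET.Element("books")
for part in parts:
    for child in ET.fromstring(part):
        merged_root.append(child)

merged = ET.tostring(merged_root).decode()
print(merged)
```

The other common approach is `coalesce(1)` before the write so only one part file (and hence one root tag) is produced, at the cost of single-task output.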
1
vote
0 answers
Spark-Xml: Array within an Array in Dataframe to generate XML
I have a requirement to generate an XML which has the below structure
parent
child1
child1
…

Punith Raj
- 2,164
- 3
- 27
- 45
1
vote
1 answer
Checking null condition before adding new column in spark job scala
I have a below schema
root
|-- DataPartition: long (nullable = true)
|-- TimeStamp: string (nullable = true)
|-- _action: string (nullable = true)
|-- env:Data: struct (nullable = true)
| |-- _type: string (nullable = true)
| |--…

Anupam
- 284
- 5
- 21
1
vote
1 answer
Custom schema with nested parent node in spark-xml
I am pretty new to spark-xml and am finding it difficult to prepare a custom schema for my object. I would appreciate your help. Below is what I have tried.
I am using Spark 1.4.7 and spark-xml version 0.3.5
Test.Java
StructType customSchema = new…

Punith Raj
- 2,164
- 3
- 27
- 45
1
vote
0 answers
Spark Structured Streaming for XML files
I am trying to parse XML files using the Spark XML Databricks package (spark-xml_2.11 from com.databricks) with Structured Streaming (spark.readStream…).
While performing the readStream operation, it reports something like unsupported operation…

ashok kumar
- 11
- 1
1
vote
1 answer
Write records per partition in Spark data frame to an XML file
I have to count the records per partition in a Spark data frame and then write the output to an XML file.
Here is my data frame.
dfMainOutputFinalWithoutNull.coalesce(1).write.partitionBy("DataPartition","StatementTypeCode")
…

Sudarshan kumar
- 1,503
- 4
- 36
- 83
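The per-partition record count asked for above is, conceptually, a group-by-and-count on the partition columns before the XML write. A stdlib sketch with hypothetical rows and column names:

```python
from collections import Counter

# Hypothetical rows; in the question these come from a Spark dataframe
# partitioned by DataPartition and StatementTypeCode.
rows = [
    {"DataPartition": "A", "StatementTypeCode": "X"},
    {"DataPartition": "A", "StatementTypeCode": "X"},
    {"DataPartition": "B", "StatementTypeCode": "Y"},
]

# Count records per partition key.
counts = Counter((r["DataPartition"], r["StatementTypeCode"]) for r in rows)
print(counts)
```

In Spark the equivalent is `df.groupBy("DataPartition", "StatementTypeCode").count()`, which can then be joined back or written alongside the partitioned XML output.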
0
votes
1 answer
Importing Manually Declared Nested Schema from Package Causes NullPointerException
I'm trying to parse XML files into DataFrames using Databricks' spark-xml with this line of code:
val xmlDF = spark
.read
.option("rowTag", "MeterReadingDocument")
.option("valueTag", "foo") // meaningless, used to parse tags with no…

user371816
- 7
- 2