Questions tagged [apache-spark-xml]

81 questions
1 vote · 0 answers

How to sort the dates using Spark while generating XML?

I am trying to write an XML file by converting dataframes using some jax jar, but I need to sort the dates. The sort has no effect when I apply it at the dataframe level, because at the end I am writing a final dataframe using the XML jar and call each object how can…
Db8 · 93 · 1 · 2 · 8
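Outside Spark, the core idea behind this question — sort the records by date *before* serializing, since the XML writer simply emits rows in the order it receives them — can be sketched with Python's stdlib (the `rows`/`row` element names and fields are illustrative, not the asker's actual schema):

```python
# Sketch: sort records by date, then serialize to XML in that order.
# Stand-in for sorting a dataframe before writing it out with an XML jar.
import xml.etree.ElementTree as ET
from datetime import date

records = [
    {"id": "2", "when": date(2021, 3, 1)},
    {"id": "1", "when": date(2020, 7, 15)},
]

# Sort first; the serializer preserves input order.
root = ET.Element("rows")
for rec in sorted(records, key=lambda r: r["when"]):
    row = ET.SubElement(root, "row")
    ET.SubElement(row, "id").text = rec["id"]
    ET.SubElement(row, "when").text = rec["when"].isoformat()

xml_text = ET.tostring(root, encoding="unicode")
```

The same principle applies in Spark: order the dataframe as the last transformation before the write, because any later shuffle can reorder rows.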
1 vote · 0 answers

Dataset Filter working in an unexpected way

Scenario: I have read two XML files by specifying a schema on load. In the schema, one of the tags is mandatory. One XML is missing that mandatory tag. Now, when I do the following, I am expecting the XML with the missing mandatory tag to be…
1 vote · 1 answer

How to update a nested column's value of XML in a Spark Scala dataframe

Suppose I have the following XML data: 110 2
abc def
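Outside Spark, the nested-update idea in this question can be sketched with Python's stdlib `xml.etree` (the `book`/`info`/`price` element names are illustrative, not the asker's schema):

```python
# Sketch: locate a nested element by path and update its text in place.
# Stand-in for rewriting a nested column before regenerating the XML.
import xml.etree.ElementTree as ET

doc = ET.fromstring("<book><info><price>110</price></info></book>")

# find() takes a relative path through the nesting.
doc.find("info/price").text = "120"

updated = ET.tostring(doc, encoding="unicode")
```

In Spark the analogous move is rebuilding the nested struct column (e.g. with `withColumn` and `struct(...)`) rather than mutating it, since dataframes are immutable.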
1 vote · 1 answer

How to identify or reroute bad XMLs when reading XMLs with Spark

Using Spark, I am trying to read a bunch of XMLs from a path; one of the files is a dummy file which is not an XML. I would like Spark to tell me, in any way, that one particular file is not valid. Adding the "badRecordsPath" option writes the bad…
Geethanadh · 313 · 5 · 17
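The detection step itself — separating well-formed XML from files that will not parse — can be sketched with Python's stdlib, as a stand-in for whatever rerouting mechanism the Spark job uses (file names and payloads here are illustrative):

```python
# Sketch: attempt to parse each payload; route unparseable ones aside.
import xml.etree.ElementTree as ET

def split_good_bad(named_payloads):
    """Return (good, bad) lists of names, by trying to parse each payload."""
    good, bad = [], []
    for name, payload in named_payloads:
        try:
            ET.fromstring(payload)   # raises ParseError on malformed input
            good.append(name)
        except ET.ParseError:
            bad.append(name)
    return good, bad

good, bad = split_good_bad([
    ("a.xml", "<root><row>1</row></root>"),
    ("dummy.txt", "not xml at all"),
])
```

A pre-validation pass like this, run before handing the file list to Spark, gives an explicit report of which inputs are invalid rather than relying on silently dropped records.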
1 vote · 1 answer

Can we create an XML file with a specific node with Spark Scala?

I have another question about Spark and Scala. I want to use that technology to get data and generate an XML. Therefore, I want to know if it is possible to create nodes ourselves (not automatic creation), and what library we can use? I searched but I…
THIBAULT Nicolas · 159 · 3 · 11
1 vote · 1 answer

Spark-xml rootTag and rowTag not reading the XML properly

I am working on an XML that has a structure like the one below. I am trying to access tag 2.1.1 and its child attributes, so I have given rootTag as tag2 and rowTag as tag 2.1.1. The below code is returning null. If I apply the same logic to tag1, it is…
sakthi srinivas · 182 · 1 · 4 · 12
1 vote · 1 answer

Select Fields that start with a certain pattern: Spark XML Parsing

I am having to parse some very large xml files. There are a few fields within those xml files that I want to extract and then perform some work on them. However, there are some rules that I need to follow, i.e. I can only select fields if they…
fletchr · 646 · 2 · 8 · 25
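The select-by-prefix rule in this question can be sketched outside Spark with Python's stdlib (the `env` prefix and element names are illustrative, not the asker's files):

```python
# Sketch: keep only the child elements whose tag starts with a prefix.
# Stand-in for selecting dataframe columns by name pattern after parsing.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<row><envCode>A</envCode><envType>B</envType><other>C</other></row>"
)

# Iterating an element yields its direct children; filter on the tag name.
selected = {el.tag: el.text for el in doc if el.tag.startswith("env")}
```

The Spark-side equivalent of the same filter is selecting columns whose names match the pattern, e.g. `df.select(df.columns.filter(_.startsWith("env")).map(col): _*)` in Scala.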
1 vote · 1 answer

Use recursive globbing to extract XML documents as strings in pyspark

The goal is to extract XML documents, given an XPath expression, from a group of text files as strings. The difficulty is the variance of forms the text files may be in. Might be: single zip / tar file with 100 files, each 1 XML document one…
ghukill · 1,136 · 17 · 42
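The recursive-globbing half of this question can be sketched with Python's stdlib, before any Spark is involved (the directory layout and file contents are illustrative):

```python
# Sketch: recursively glob a directory tree and read each matching
# XML file back as a string.
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "nested"))
    with open(os.path.join(tmp, "nested", "doc.xml"), "w") as fh:
        fh.write("<doc>hello</doc>")

    # '**' with recursive=True descends into subdirectories.
    texts = []
    for path in sorted(glob.glob(os.path.join(tmp, "**", "*.xml"),
                                 recursive=True)):
        with open(path) as fh:
            texts.append(fh.read())
```

Spark's own path globbing (e.g. in `spark.read.text`) accepts similar wildcard patterns, though the zip/tar cases mentioned in the question need to be unpacked first.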
1 vote · 0 answers

Spark-Xml Root Tag is Generated in every part file

So I am trying to generate an XML which has the below structure. 234 34 234 34 Now I…
Punith Raj · 2,164 · 3 · 27 · 45
1 vote · 0 answers

Spark-Xml: Array within an Array in Dataframe to generate XML

I have a requirement to generate an XML which has the below structure parent child1 child1
1 vote · 1 answer

Checking null condition before adding new column in spark job scala

I have a below schema root |-- DataPartition: long (nullable = true) |-- TimeStamp: string (nullable = true) |-- _action: string (nullable = true) |-- env:Data: struct (nullable = true) | |-- _type: string (nullable = true) | |--…
Anupam · 284 · 5 · 21
1 vote · 1 answer

Custom schema with nested parent node in spark-xml

I am pretty new to spark-xml and I am finding it difficult to prepare a custom schema for my Object. Request you all to help me. Below is what I have tried. I am using Spark 1.4.7 and spark-xml version 0.3.5 Test.Java StructType customSchema = new…
1 vote · 0 answers

spark structured streaming for XML files

I am trying to parse XML files using the Databricks spark-xml package (spark-xml_2.11 from com.databricks) with structured streaming (spark.readStream…). While performing the readStream operation, it reports an unsupported operation…
1 vote · 1 answer

Write records per partition in spark data frame to a xml file

I have to count the records per partition in a Spark data frame and then write the output to an XML file. Here is my data frame. dfMainOutputFinalWithoutNull.coalesce(1).write.partitionBy("DataPartition","StatementTypeCode") …
0 votes · 1 answer

Importing Manually Declared Nested Schema from Package Causes NullPointerException

I'm trying to parse XML files into DataFrames using Databricks' spark-xml with this line of code: val xmlDF = spark .read .option("rowTag", "MeterReadingDocument") .option("valueTag", "foo") // meaningless, used to parse tags with no…