Questions tagged [apache-spark-xml]

81 questions
1 vote · 0 answers

How to sort the dates using Spark while generating XML?

I am trying to write an XML file by converting dataframes using some jax jar, but I need to sort the dates. The sort has no effect when I apply it at the dataframe level, because at the end I am writing a final dataframe using the XML jar and call each object how can…
Db8 · 93 · 1 · 2 · 8
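Outside Spark, the core idea behind this question — sort the records by date *before* serializing, since the XML writer simply emits rows in the order it receives them — can be sketched with Python's stdlib (the `rows`/`row` element names and fields are illustrative, not the asker's actual schema):

```python
# Sketch: sort records by date, then serialize to XML in that order.
# Stand-in for sorting a dataframe before writing it out with an XML jar.
import xml.etree.ElementTree as ET
from datetime import date

records = [
    {"id": "2", "when": date(2021, 3, 1)},
    {"id": "1", "when": date(2020, 7, 15)},
]

# Sort first; the serializer preserves input order.
root = ET.Element("rows")
for rec in sorted(records, key=lambda r: r["when"]):
    row = ET.SubElement(root, "row")
    ET.SubElement(row, "id").text = rec["id"]
    ET.SubElement(row, "when").text = rec["when"].isoformat()

xml_text = ET.tostring(root, encoding="unicode")
```

The same principle applies in Spark: order the dataframe as the last transformation before the write, because any later shuffle can reorder rows.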
1 vote · 0 answers

Dataset Filter working in an unexpected way

Scenario: I have read two XML files by specifying a schema on load. In the schema, one of the tags is mandatory. One XML is missing that mandatory tag. Now, when I do the following, I am expecting the XML with the missing mandatory tag to be…
1 vote · 1 answer

How to update a nested column's value of XML in a Spark Scala dataframe

Suppose I have the following XML data: 110 2
abc def
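Outside Spark, the nested-update idea in this question can be sketched with Python's stdlib `xml.etree` (the `book`/`info`/`price` element names are illustrative, not the asker's schema):

```python
# Sketch: locate a nested element by path and update its text in place.
# Stand-in for rewriting a nested column before regenerating the XML.
import xml.etree.ElementTree as ET

doc = ET.fromstring("<book><info><price>110</price></info></book>")

# find() takes a relative path through the nesting.
doc.find("info/price").text = "120"

updated = ET.tostring(doc, encoding="unicode")
```

In Spark the analogous move is rebuilding the nested struct column (e.g. with `withColumn` and `struct(...)`) rather than mutating it, since dataframes are immutable.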
1 vote · 1 answer

How to identify or reroute bad XMLs when reading XMLs with Spark

Using Spark, I am trying to read a bunch of XMLs from a path; one of the files is a dummy file which is not an XML. I would like Spark to tell me, in any way, that one particular file is not valid. Adding the "badRecordsPath" option writes the bad…
Geethanadh · 313 · 5 · 17
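The detection step itself — separating well-formed XML from files that will not parse — can be sketched with Python's stdlib, as a stand-in for whatever rerouting mechanism the Spark job uses (file names and payloads here are illustrative):

```python
# Sketch: attempt to parse each payload; route unparseable ones aside.
import xml.etree.ElementTree as ET

def split_good_bad(named_payloads):
    """Return (good, bad) lists of names, by trying to parse each payload."""
    good, bad = [], []
    for name, payload in named_payloads:
        try:
            ET.fromstring(payload)   # raises ParseError on malformed input
            good.append(name)
        except ET.ParseError:
            bad.append(name)
    return good, bad

good, bad = split_good_bad([
    ("a.xml", "<root><row>1</row></root>"),
    ("dummy.txt", "not xml at all"),
])
```

A pre-validation pass like this, run before handing the file list to Spark, gives an explicit report of which inputs are invalid rather than relying on silently dropped records.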
1 vote · 1 answer

Can we create an XML file with a specific node with Spark Scala?

I have another question about Spark and Scala. I want to use that technology to get data and generate an XML. Therefore, I want to know if it is possible to create nodes ourselves (not automatic creation), and what library we can use? I searched but I…
THIBAULT Nicolas · 159 · 3 · 11
1 vote · 1 answer

Spark-xml rootTag and rowTag not reading the XML properly

I am working on an XML that has a structure like the one below. I am trying to access tag 2.1.1 and its child attributes, so I have given rootTag as tag2 and rowTag as tag 2.1.1. The below code is returning null. If I apply the same logic to tag1, it is…
sakthi srinivas · 182 · 1 · 4 · 12
1 vote · 1 answer

Select Fields that start with a certain pattern: Spark XML Parsing

I am having to parse some very large xml files. There are a few fields within those xml files that I want to extract and then perform some work on them. However, there are some rules that I need to follow, i.e. I can only select fields if they…
fletchr · 646 · 2 · 8 · 25
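The select-by-prefix rule in this question can be sketched outside Spark with Python's stdlib (the `env` prefix and element names are illustrative, not the asker's files):

```python
# Sketch: keep only the child elements whose tag starts with a prefix.
# Stand-in for selecting dataframe columns by name pattern after parsing.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<row><envCode>A</envCode><envType>B</envType><other>C</other></row>"
)

# Iterating an element yields its direct children; filter on the tag name.
selected = {el.tag: el.text for el in doc if el.tag.startswith("env")}
```

The Spark-side equivalent of the same filter is selecting columns whose names match the pattern, e.g. `df.select(df.columns.filter(_.startsWith("env")).map(col): _*)` in Scala.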
1 vote · 1 answer

Use recursive globbing to extract XML documents as strings in pyspark

The goal is to extract XML documents, given an XPath expression, from a group of text files as strings. The difficulty is the variance of forms the text files may be in. Might be: single zip / tar file with 100 files, each 1 XML document one…
ghukill · 1,136 · 17 · 42
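The recursive-globbing half of this question can be sketched with Python's stdlib, before any Spark is involved (the directory layout and file contents are illustrative):

```python
# Sketch: recursively glob a directory tree and read each matching
# XML file back as a string.
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "nested"))
    with open(os.path.join(tmp, "nested", "doc.xml"), "w") as fh:
        fh.write("<doc>hello</doc>")

    # '**' with recursive=True descends into subdirectories.
    texts = []
    for path in sorted(glob.glob(os.path.join(tmp, "**", "*.xml"),
                                 recursive=True)):
        with open(path) as fh:
            texts.append(fh.read())
```

Spark's own path globbing (e.g. in `spark.read.text`) accepts similar wildcard patterns, though the zip/tar cases mentioned in the question need to be unpacked first.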
1 vote · 0 answers

Spark-Xml Root Tag is Generated in every part file

So I am trying to generate an XML which has the below structure. 234 34 234 34 Now I…
Punith Raj · 2,164 · 3 · 27 · 45
1 vote · 0 answers

Spark-Xml: Array within an Array in Dataframe to generate XML

I have a requirement to generate an XML which has the below structure parent child1 child1
1 vote · 1 answer

Checking null condition before adding new column in spark job scala

I have a below schema root |-- DataPartition: long (nullable = true) |-- TimeStamp: string (nullable = true) |-- _action: string (nullable = true) |-- env:Data: struct (nullable = true) | |-- _type: string (nullable = true) | |--…
Anupam · 284 · 5 · 21
1 vote · 1 answer

Custom schema with nested parent node in spark-xml

I am pretty new to spark-xml and I am finding it difficult to prepare a custom schema for my Object. Request you all to help me. Below is what I have tried. I am using Spark 1.4.7 and spark-xml version 0.3.5 Test.Java StructType customSchema = new…
1 vote · 0 answers

spark structured streaming for XML files

I am trying to parse XML files using the Databricks spark-xml package (spark-xml_2.11 from com.databricks) with structured streaming (spark.readStream…). While performing the readStream operation, it reports an unsupported operation…
1 vote · 1 answer

Write records per partition in spark data frame to a xml file

I have to count the records per partition in a Spark data frame and then write the output to an XML file. Here is my data frame. dfMainOutputFinalWithoutNull.coalesce(1).write.partitionBy("DataPartition","StatementTypeCode") …
0 votes · 1 answer

Importing Manually Declared Nested Schema from Package Causes NullPointerException

I'm trying to parse XML files into DataFrames using Databricks' spark-xml with this line of code: val xmlDF = spark .read .option("rowTag", "MeterReadingDocument") .option("valueTag", "foo") // meaningless, used to parse tags with no…