Questions tagged [apache-spark-xml]
81 questions
0 votes, 1 answer
Reading an XML file through a DataFrame
I have an XML file in the format below.
89:19:00.01
1.9.5.67.2
AB-CD-EF
I built a DataFrame on it…

Kumar G
- 55
- 2
- 8
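For reference, the basic spark-xml read path looks like the sketch below; the rowTag value and the path are placeholders, not from the question.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("read-xml").getOrCreate()

// rowTag names the XML element that becomes one DataFrame row;
// "record" and the path are placeholders here.
val df = spark.read
  .format("xml")
  .option("rowTag", "record")
  .load("/path/to/input.xml")

df.printSchema()
df.show(truncate = false)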
0 votes, 1 answer
Spark: how to transform data from multiple nested XML files with attributes into a DataFrame
How can I transform the values below from multiple XML files into a Spark DataFrame:
attribute Id0 from Level_0
Date/Value from Level_4
Required output:
+----------------+-------------+---------+
|Id0 |Date |Value …

Dan
- 437
- 7
- 24
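A minimal sketch under stated assumptions: Level_0 is the rowTag, Level_1 through Level_3 are structs, and Level_4 is an array of structs. spark-xml exposes XML attributes with a leading underscore, hence _Id0.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

val spark = SparkSession.builder().appName("nested-xml").getOrCreate()

val df = spark.read
  .format("xml")
  .option("rowTag", "Level_0")
  .load("/path/to/xml/*.xml")

// Assumes Level_1..Level_3 are structs and Level_4 is an array of structs.
val flat = df
  .select(col("_Id0").as("Id0"),
          explode(col("Level_1.Level_2.Level_3.Level_4")).as("l4"))
  .select(col("Id0"), col("l4.Date").as("Date"), col("l4.Value").as("Value"))

flat.show()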
0 votes, 1 answer
Is it possible to store two different struct types in the same column of a Databricks Delta table?
I'm receiving multiple XML files that need to be loaded into one table. Those XML files have different struct types for a particular column. I'm wondering whether this column could somehow be stored as a single column of a Databricks table. Please refer…
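A common workaround when one column's struct type differs across sources is to serialize that column to a JSON string, so a single string column can hold either shape; a sketch with hypothetical paths and a hypothetical "payload" column (neither is from the question).
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.to_json

val spark = SparkSession.builder().appName("mixed-structs").getOrCreate()

// dfA and dfB stand for the two XML variants; "payload" is the
// column whose struct type differs between the files.
val dfA = spark.read.format("xml").option("rowTag", "row").load("/path/variantA.xml")
val dfB = spark.read.format("xml").option("rowTag", "row").load("/path/variantB.xml")

// Serializing the struct to JSON gives both variants the same (string) type,
// so the union can land in one Delta table column.
val unified = dfA.withColumn("payload", to_json(dfA("payload")))
  .unionByName(dfB.withColumn("payload", to_json(dfB("payload"))))

unified.write.format("delta").mode("append").save("/mnt/delta/target")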
0 votes, 1 answer
Read files and modify filenames in Azure Storage containers from Azure Databricks
I am ingesting a large XML file and generating individual JSON files according to the XML elements, using spark-xml in Azure Databricks.
Code to create the JSON file:
commercialInfo
  .write
  .mode(SaveMode.Overwrite)
  .json("/mnt/processed/" +…

Supriyo Bhattacherjee
- 547
- 1
- 5
- 26
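Spark writes a directory of part files rather than a single named JSON file; a minimal sketch of one workaround is to coalesce to one partition and rename the part file with the Hadoop FileSystem API. The paths and the stand-in DataFrame below are placeholders, not the question's.
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("rename-output").getOrCreate()
import spark.implicits._

val commercialInfo = Seq((1, "a")).toDF("id", "name") // stand-in for the question's DataFrame

// Write a single part file so there is exactly one file to rename.
val outDir = "/mnt/processed/commercialInfo"
commercialInfo.coalesce(1).write.mode(SaveMode.Overwrite).json(outDir)

// Locate the part file and rename it to a stable name.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val part = fs.globStatus(new Path(outDir, "part-*.json"))(0).getPath
fs.rename(part, new Path(outDir, "commercialInfo.json"))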
0 votes, 1 answer
How to create an XML string from a DataFrame using Scala
I have a scenario where I am reading from my Hive table and creating a Spark DataFrame. I want to generate an XML string from the output of the DataFrame and save it in a new DataFrame (as an XML string), rather than writing it to a file in HDFS to create…

Ashma
- 13
- 4
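spark-xml's writer targets files, so one way to keep the XML inside a DataFrame is to render each row to a string yourself; a minimal sketch with a two-column stand-in for the Hive-backed DataFrame.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("df-to-xml-string").getOrCreate()
import spark.implicits._

// Stand-in for the DataFrame read from Hive.
val df = Seq((1L, "alice"), (2L, "bob")).toDF("id", "name")

// Render each row as an XML snippet and keep it as a string column.
val xmlDf = df.map { row =>
  s"<record><id>${row.getLong(0)}</id><name>${row.getString(1)}</name></record>"
}.toDF("xml_string")

xmlDf.show(truncate = false)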
0 votes, 0 answers
Google Cloud notebook - PySpark: java.lang.ClassNotFoundException: Failed to find data source: xml
I need to use com.databricks.spark.xml from a Google Cloud notebook.
I tried:
import os
#os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.databricks:spark-xml_2.11:0.6.0 pyspark-shell'
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages…

zbeedatm
- 601
- 1
- 15
- 39
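A frequent cause of this ClassNotFoundException is a Scala-version mismatch between the spark-xml artifact (here _2.11) and the cluster's Scala version; also, --packages / spark.jars.packages only takes effect if it is set before any session starts. A Scala sketch of the session-builder route (the same configuration key applies from PySpark); the artifact version below is an assumption.
import org.apache.spark.sql.SparkSession

// Only works if no SparkSession is already running in the notebook;
// the artifact's Scala suffix (2.12 here) must match the cluster.
val spark = SparkSession.builder()
  .appName("xml-source")
  .config("spark.jars.packages", "com.databricks:spark-xml_2.12:0.14.0")
  .getOrCreate()

val df = spark.read.format("xml").option("rowTag", "row").load("/path/data.xml")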
0 votes, 1 answer
Spark XML does not seem to work with XML Entities (such as &myentity;)
I am using Spark XML to parse a large document that contains a few user-defined entities. This is a simple snippet from the file
1000000
ヽ
…

fedmest
- 709
- 5
- 17
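If the parser will not resolve the user-defined entities (as the question reports), one workaround is a textual pre-pass that substitutes the entity value before parsing; a sketch with placeholder paths and rowTag, using the entity value from the snippet.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("entity-prepass").getOrCreate()

// Replace the custom entity with its value before parsing.
val resolved = spark.sparkContext
  .wholeTextFiles("/path/doc.xml")
  .map { case (_, text) => text.replace("&myentity;", "ヽ") }

resolved.saveAsTextFile("/path/doc_resolved")

val df = spark.read.format("xml").option("rowTag", "row").load("/path/doc_resolved")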
0 votes, 1 answer
Spark-xml crashes on reading processing instructions
I'm attempting to read an XML file into a Spark DataFrame using the Databricks spark-xml package. However, when it comes across processing instructions, Spark raises an error claiming an unexpected event.
I'm attempting to import the XML files into…

Gideon Moore
- 21
- 4
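Until the parser tolerates them, one workaround is to strip processing instructions (while keeping the leading <?xml ...?> declaration) in a textual pre-pass; a sketch with placeholder paths and rowTag.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("strip-pi").getOrCreate()

// Remove processing instructions but keep the <?xml ...?> declaration.
val cleaned = spark.sparkContext
  .wholeTextFiles("/path/in/*.xml")
  .map { case (_, text) => text.replaceAll("""<\?(?!xml)[\s\S]*?\?>""", "") }

cleaned.saveAsTextFile("/path/cleaned")

val df = spark.read.format("xml").option("rowTag", "row").load("/path/cleaned")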
0 votes, 0 answers
How to create nested tags using spark-xml
I am trying to write a DataFrame as an XML file. My problem is how to perform a group-by in the XML output.
My input data, as a CSV file:
user_id|acnt_id|transaction|desc
1      |1234   |012345     |desc1
1      |1234   |102345     |desc2
1      |5678   |123454     |…

darkmatter
- 125
- 1
- 2
- 10
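spark-xml writes an array-of-structs column as repeated child elements, so the grouping has to happen in the DataFrame before the write; a sketch assuming the columns from the excerpt, with placeholder paths and tag names.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, struct}

val spark = SparkSession.builder().appName("nested-write").getOrCreate()

val df = spark.read
  .option("header", "true").option("delimiter", "|")
  .csv("/path/input.csv")

// One <user> row per user_id/acnt_id, with repeated <transactions> children.
val nested = df
  .groupBy(col("user_id"), col("acnt_id"))
  .agg(collect_list(struct(col("transaction"), col("desc"))).as("transactions"))

nested.write
  .format("xml")
  .option("rootTag", "users")
  .option("rowTag", "user")
  .save("/path/output")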
0 votes, 0 answers
Spark: read an XML database record as an XML InputStream instead of loading from a file path
From the Spark documentation:
load(): DataFrame
load(path: String): DataFrame
load(paths: String*): DataFrame
I defined a function which reads an XML record:
def ExtractData(RecID: String, table: String) = {
  val spark = SparkSession.
  …

tree em
- 20,379
- 30
- 92
- 130
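spark-xml can also parse XML that is already in memory, which suits records fetched from a database: from_xml takes a string column rather than a path. A sketch with stand-in data and a hypothetical schema.
import com.databricks.spark.xml.functions.from_xml
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("xml-from-string").getOrCreate()
import spark.implicits._

// Stand-in for rows fetched from the database.
val records = Seq(("r1", "<rec><id>1</id><name>a</name></rec>")).toDF("RecID", "xml")

val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("name", StringType)))

val parsed = records.withColumn("parsed", from_xml(col("xml"), schema))
parsed.select(col("RecID"), col("parsed.id"), col("parsed.name")).show()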
0 votes, 1 answer
Reading an XML tag value using spark-xml: I want a single value but get a list

tree em
- 20,379
- 30
- 92
- 130
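When spark-xml infers a repeated tag as an array type, a single value has to be extracted explicitly; a minimal sketch with stand-in data.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, element_at, explode}

val spark = SparkSession.builder().appName("array-value").getOrCreate()
import spark.implicits._

// Stand-in for a DataFrame where spark-xml inferred the tag as an array.
val df = Seq((1, Seq("a", "b"))).toDF("id", "values")

df.select(element_at(col("values"), 1).as("value")).show()       // first element only
df.select(col("id"), explode(col("values")).as("value")).show()  // one row per element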
0 votes, 1 answer
Spark JavaRDD / DataFrame / Dataset to XML
I want to convert a Spark JavaRDD / DataFrame / Dataset to XML. I have analyzed spark-xml from Databricks; this repo was last released in Nov 2016 (version 0.4.1) and I doubt its compatibility with new versions of DSE and Spark.
Is there any alternative to…

Punith Raj
- 2,164
- 3
- 27
- 45
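For the record, spark-xml has had many releases since 0.4.1 and still supports writing; a DataFrame (or a JavaRDD first converted to one) goes out as XML like the sketch below, with placeholder tags and path.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("write-xml").getOrCreate()
import spark.implicits._

val df = Seq((1, "a"), (2, "b")).toDF("id", "name")

// Each row becomes one <record> element under a <records> root.
df.write
  .format("xml")
  .option("rootTag", "records")
  .option("rowTag", "record")
  .save("/path/records-xml")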
0 votes, 1 answer
XML parsing using Spark
I have a table in Hive with two columns: id (int) and xml_column (string). xml_column is actually XML, but it is stored as a string.
+------+--------------------+
| id | xml_column |
+------+--------------------+
| 6723 |

xhang
- 33
- 5
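A sketch of the usual approach here: read the Hive table and parse the string column with spark-xml's from_xml; the table name and schema below are hypothetical.
import com.databricks.spark.xml.functions.from_xml
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("hive-xml").enableHiveSupport().getOrCreate()

// Hypothetical schema for the XML stored in xml_column.
val schema = StructType(Seq(StructField("name", StringType)))

val parsed = spark.table("db.xml_table")
  .withColumn("parsed", from_xml(col("xml_column"), schema))

parsed.select(col("id"), col("parsed.name")).show()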
0 votes, 2 answers
NotNull condition is not working in a withColumn condition on a Spark DataFrame (Scala)
So I am trying to add a column when I find it, but I do not want to add it when the column is not present in the XML schema.
This is what I am doing; I guess I am doing something wrong in checking the condition.
val temp = tempNew1
…

Atharv Thakur
- 671
- 3
- 21
- 39
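Whether a column exists in the schema is a driver-side question (df.columns), while isNotNull is a per-row runtime check; mixing the two is the usual cause of this symptom. A sketch with hypothetical column names and stand-in data.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, when}

val spark = SparkSession.builder().appName("optional-column").getOrCreate()
import spark.implicits._

val tempNew1 = Seq((1, "x")).toDF("id", "optionalCol") // stand-in input

// Check the schema first; only then reference the column in an expression.
val temp =
  if (tempNew1.columns.contains("optionalCol"))
    tempNew1.withColumn("flag",
      when(col("optionalCol").isNotNull, lit(1)).otherwise(lit(0)))
  else
    tempNew1.withColumn("flag", lit(0))

temp.show()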
0 votes, 1 answer
Cannot resolve explode due to type mismatch in Spark while parsing an XML file
I have a DataFrame with the schema below:
root
|-- DataPartition: long (nullable = true)
|-- TimeStamp: string (nullable = true)
|-- _organizationId: long (nullable = true)
|-- _segmentId: long (nullable = true)
|-- seg:BusinessSegments: struct…

Atharv Thakur
- 671
- 3
- 21
- 39
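explode only accepts array or map columns, and seg:BusinessSegments is a struct, which explains the type mismatch. A sketch with stand-in data (the inner array name "items" is hypothetical); note the backticks needed around the colon in the column name.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, struct}

val spark = SparkSession.builder().appName("explode-struct").getOrCreate()
import spark.implicits._

// Stand-in for the parsed XML: a struct column containing an array.
val df = Seq((1L, Seq("a", "b"))).toDF("_organizationId", "items")
  .select(col("_organizationId"), struct(col("items")).as("seg:BusinessSegments"))

// explode needs an array/map; a struct's fields are selected instead...
df.select(col("`seg:BusinessSegments`.*")).show()

// ...and an array inside the struct can then be exploded.
df.select(explode(col("`seg:BusinessSegments`.items")).as("item")).show()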