Questions tagged [apache-spark-xml]

81 questions
0 votes, 1 answer

Reading XML File Through Dataframe

I have an XML file in the format below: 89:19:00.01 1.9.5.67.2 AB-CD-EF. I built a dataframe on it…
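
For an entry like this, the usual starting point is the spark-xml reader. A minimal sketch, assuming the com.databricks:spark-xml package is on the classpath and the repeating element is named "record" (a hypothetical rowTag; the path is a placeholder too):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("xml-read").getOrCreate()

// rowTag names the XML element that becomes one DataFrame row.
val df = spark.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "record")
  .load("/path/to/file.xml")

df.printSchema()
df.show(truncate = false)
```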
0 votes, 1 answer

Spark: How to transform data from multiple nested XML files with attributes into a Data Frame

How to transform the values below from multiple XML files into a Spark data frame: attribute Id0 from Level_0, Date/Value from Level_4. Required output: +----------------+-------------+---------+ |Id0 |Date |Value …
Dan • 437 • 7 • 24
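
A sketch of one way to flatten files like that, assuming rowTag Level_0, a hypothetical intermediate nesting, and that spark-xml infers Level_4 as an array; attributes get the default "_" prefix, so the Id0 attribute surfaces as _Id0:

```scala
import org.apache.spark.sql.functions.{col, explode}

// The path between Level_0 and Level_4 is an assumption; adjust to the real schema.
val raw = spark.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "Level_0")
  .load("/path/to/files/*.xml")

val flat = raw
  .select(col("_Id0").as("Id0"), explode(col("Level_1.Level_4")).as("lvl4"))
  .select(col("Id0"), col("lvl4.Date").as("Date"), col("lvl4.Value").as("Value"))
```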
0 votes, 1 answer

Is it possible to store 2 different struct types in the same column of a Databricks Delta table?

I'm receiving multiple XML files that need to be loaded into one table. Those XML files have different struct types for a particular column. I'm wondering if this data could somehow be stored in the same column of a Databricks table. Please refer…
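
One common workaround (a sketch, not the definitive answer): a Delta column has exactly one type, so serialize the conflicting struct to a string on write and parse it back with the matching schema on read. Here xmlDf and the "payload" column are hypothetical stand-ins:

```scala
import org.apache.spark.sql.functions.to_json

// xmlDf stands in for the DataFrame parsed from one of the XML variants;
// to_json collapses whichever struct shape arrived into a plain string.
val normalized = xmlDf.withColumn("payload", to_json(xmlDf("payload")))

normalized.write.format("delta").mode("append").saveAsTable("target_table")
```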
0 votes, 1 answer

Read files and modify filenames in Azure storage containers from Azure Databricks

I am ingesting a large XML file and generating individual JSON files according to the XML element; I am using spark-xml in Azure Databricks. Code to create the JSON file: commercialInfo .write .mode(SaveMode.Overwrite) .json("/mnt/processed/" +…
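
A sketch for the renaming part, assuming this runs in a Databricks notebook (where dbutils is in scope) and the write used coalesce(1) so exactly one JSON part-file exists in the folder:

```scala
// Locate the single part-file Spark produced under the output folder.
val outputDir = "/mnt/processed/commercialInfo"

val partFile = dbutils.fs.ls(outputDir)
  .map(_.path)
  .filter(p => p.contains("part-") && p.endsWith(".json"))
  .head

// Give it a stable, human-readable name.
dbutils.fs.mv(partFile, outputDir + "/commercialInfo.json")
```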
0 votes, 1 answer

How to create an XML string from dataframe using scala

I have a scenario where I am reading from my Hive table and creating a Spark dataframe. I want to generate an XML string from the output of the dataframe and save it in a new dataframe (as an XML string), rather than writing it to a file in HDFS to create…
Ashma • 13 • 4
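
A sketch of one approach, assuming hand-formatting the XML per row is acceptable: spark-xml only writes to storage, so to keep the XML inside a dataframe we format each row ourselves. The table and column names below are hypothetical:

```scala
import spark.implicits._

// Placeholder for the real Hive table.
val hiveDf = spark.table("my_db.my_table")

// Build one XML string per row and collect them into a single-column frame.
val xmlDf = hiveDf.map { row =>
  s"""<record>
     |  <id>${row.getAs[Long]("id")}</id>
     |  <name>${row.getAs[String]("name")}</name>
     |</record>""".stripMargin
}.toDF("xml_string")
```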
0 votes, 0 answers

Google cloud notebook - Pyspark: java.lang.ClassNotFoundException: Failed to find data source: xml

I need to use com.databricks.spark.xml from a Google Cloud notebook. I tried: import os #os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.databricks:spark-xml_2.11:0.6.0 pyspark-shell' os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages…
zbeedatm • 601 • 1 • 15 • 39
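
A sketch of the usual cause: the package must reach the classpath before the session's JVM starts, so setting it on the builder only helps when no session is running yet. The version and Scala suffix below are assumptions; they must match the cluster:

```scala
import org.apache.spark.sql.SparkSession

// spark.jars.packages is the config-side equivalent of the --packages flag;
// it is applied at session creation, not after.
val spark = SparkSession.builder()
  .appName("xml-on-gcp")
  .config("spark.jars.packages", "com.databricks:spark-xml_2.12:0.14.0")
  .getOrCreate()

val df = spark.read
  .format("xml")
  .option("rowTag", "row")
  .load("/path/to/file.xml")
```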
0 votes, 1 answer

Spark XML does not seem to work with XML Entities (such as &myentity;)

I am using Spark XML to parse a large document that contains a few user-defined entities. This is a simple snippet from the file: 1000000
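
A sketch of one workaround, assuming the entity definitions are known up front: substitute the user-defined entities as plain text before handing the XML to spark-xml, since its parser does not expand custom DTD entities. The entity table below is a hypothetical stand-in:

```scala
import spark.implicits._

val entities = Map("&myentity;" -> "replacement text")

// Replace each entity reference line by line, then parse the cleaned copy.
val resolved = spark.read.textFile("/path/to/file.xml")
  .map(line => entities.foldLeft(line) { case (l, (k, v)) => l.replace(k, v) })

resolved.write.text("/path/to/resolved.xml")
```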
0 votes, 1 answer

Spark-xml crashes on reading processing instructions

I'm attempting to read an XML file into a Spark dataframe using the Databricks spark-xml package. However, when it comes across processing instructions, Spark raises an error about an unexpected event. I'm attempting to import the XML files into…
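
A sketch of one workaround, under the assumption that the processing instructions match a simple textual pattern: strip the <?...?> blocks (keeping the <?xml ...?> prolog) before parsing, rather than letting the reader hit them:

```scala
import spark.implicits._

// Remove processing instructions as plain text; the negative lookahead
// preserves the XML declaration at the top of the file.
val cleaned = spark.read.textFile("/path/to/file.xml")
  .map(_.replaceAll("<\\?(?!xml)[^?]*\\?>", ""))

cleaned.write.text("/path/to/cleaned.xml")
```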
0 votes, 0 answers

How to create nested tags using spark-xml

I am trying to write a df as an XML file. My problem is how I can perform a group by in the XML output. My input data as a CSV file:
user_id|acnt_id|transaction|desc
1      |1234   |012345     |desc1
1      |1234   |102345     |desc2
1      |5678   |123454     |…
darkmatter • 125 • 1 • 2 • 10
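
A sketch of the usual route: collect each account's transactions into an array of structs, which spark-xml writes as repeated nested elements named after the column. inputDf stands in for the CSV read; column names follow the sample above:

```scala
import org.apache.spark.sql.functions.{col, collect_list, struct}

// One row per user/account, with transactions nested as an array of structs;
// the column is named "transaction" so each array entry becomes a <transaction> element.
val nested = inputDf
  .groupBy(col("user_id"), col("acnt_id"))
  .agg(collect_list(struct(col("transaction"), col("desc"))).as("transaction"))

nested.write
  .format("com.databricks.spark.xml")
  .option("rootTag", "users")
  .option("rowTag", "user")
  .save("/path/to/output")
```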
0 votes, 0 answers

Spark: read an XML database record as an XML input stream instead of loading from a file path

From the Spark documentation: load(): DataFrame, load(path: String): DataFrame, load(paths: String*): DataFrame. I defined a function which reads an XML record: def ExtractData(RecID: String, table: String) = { val spark = SparkSession. …
tree em • 20,379 • 30 • 92 • 130
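
A sketch for this situation, assuming the record arrives as an XML string in a DataFrame column (e.g. from a JDBC read) rather than a file: spark-xml's from_xml parses the string column in place, so no load(path) is needed. The sample rows and column names are stand-ins:

```scala
import com.databricks.spark.xml.functions.from_xml
import com.databricks.spark.xml.schema_of_xml
import spark.implicits._

// Stand-in for rows fetched from the database, each carrying XML text.
val recordsDf = Seq(("rec1", "<row><id>1</id></row>")).toDF("RecID", "payload")

// Infer a schema from the strings themselves, then parse the column directly.
val payloadSchema = schema_of_xml(recordsDf.select("payload").as[String])
val parsed = recordsDf.withColumn("parsed", from_xml($"payload", payloadSchema))
```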
0 votes, 1 answer

Reading an XML tag value using spark-xml: I want to get the value, but it gives me a list

KH0013001 -2271164.00 9 65395 1 KHR TR 6-71-10-1-001-030 1
tree em • 20,379 • 30 • 92 • 130
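
A sketch of the likely explanation: when a tag repeats (or spark-xml infers it as repeatable), the column is an array, so a single value has to be picked out explicitly. "AccountNumber" below is a hypothetical element name:

```scala
import org.apache.spark.sql.functions.{col, explode}
import spark.implicits._

// Stand-in for the parsed XML: the tag repeated, so spark-xml made an array.
val df = Seq(Tuple1(Seq("KH0013001", "KH0013002"))).toDF("AccountNumber")

// Pick the first occurrence only...
val firstOnly = df.select(col("AccountNumber").getItem(0).as("AccountNumber"))

// ...or keep every occurrence, one row each.
val allValues = df.select(explode(col("AccountNumber")).as("AccountNumber"))
```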
0 votes, 1 answer

Spark JavaRdd / DataFrame / DataSet to XML

I want to convert a Spark JavaRdd / Dataframe / Dataset to XML. I have analyzed spark-xml from Databricks; this repo was last released in Nov 2016 (version 0.4.1) and I doubt its compatibility with new versions of DSE and Spark. Is there any alternative of…
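
Worth noting that spark-xml gained many releases after 0.4.1 and supports writing, so the usual route is its built-in writer. A sketch with stand-in data and placeholder tag names:

```scala
import spark.implicits._

// Stand-in data; in the question this would come from the RDD/Dataset
// (a JavaRDD would be converted to a DataFrame first).
val df = Seq((1, "first"), (2, "second")).toDF("id", "name")

df.write
  .format("com.databricks.spark.xml")
  .option("rootTag", "records")
  .option("rowTag", "record")
  .save("/path/to/output")
```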
0 votes, 1 answer

XML parsing using spark

I have a table in Hive with two columns, id (int) and xml_column (string). xml_column is actually XML, but it is stored as a string.
+------+--------------------+
|  id  |     xml_column     |
+------+--------------------+
| 6723 |
xhang • 33 • 5
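
A sketch for parsing that string column with spark-xml's from_xml, assuming the XML has a known shape; the table name and schema fields below are hypothetical:

```scala
import com.databricks.spark.xml.functions.from_xml
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

// Placeholder for the real Hive table with columns id and xml_column.
val hiveDf = spark.table("my_db.my_table")

// The field names must mirror the actual XML elements.
val xmlSchema = StructType(Seq(
  StructField("name", StringType),
  StructField("amount", DoubleType)
))

val parsed = hiveDf.withColumn("parsed", from_xml(col("xml_column"), xmlSchema))
```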
0 votes, 2 answers

NotNull condition is not working for a withColumn condition in a Spark data frame (Scala)

So I am trying to add a column when I find it, but I do not want to add it when the column is not present in the XML schema. This is what I am doing; I guess I am doing something wrong in checking the condition. val temp = tempNew1 …
Atharv Thakur • 671 • 3 • 21 • 39
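
A sketch of the usual pattern: withColumn cannot itself test for a column's presence, so guard on df.columns (the top-level fields of the schema spark-xml inferred) and only reference the column when it exists. The helper and "_flag" suffix are hypothetical:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit}

// Add a derived column only when the source field exists in the schema;
// otherwise fall back to a literal so the output shape stays stable.
def addIfPresent(df: DataFrame, name: String): DataFrame =
  if (df.columns.contains(name)) df.withColumn(name + "_flag", col(name).isNotNull)
  else df.withColumn(name + "_flag", lit(false))
```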
0 votes, 1 answer

Cannot resolve explode due to type mismatch in Spark while parsing an XML file

I have a data frame with the below schema:
root
 |-- DataPartition: long (nullable = true)
 |-- TimeStamp: string (nullable = true)
 |-- _organizationId: long (nullable = true)
 |-- _segmentId: long (nullable = true)
 |-- seg:BusinessSegments: struct…
Atharv Thakur • 671 • 3 • 21 • 39
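
A sketch of the likely fix, assuming df is the frame with the schema above: explode only accepts array (or map) columns, and seg:BusinessSegments is a struct. Drill into it with dot notation and explode the inner element, which spark-xml turns into an array when it repeats; the inner name below is hypothetical, and the backticks keep the colons from confusing the column parser:

```scala
import org.apache.spark.sql.functions.{col, explode}

// Explode the repeated child inside the struct, not the struct itself.
val segments = df.select(
  col("_organizationId"),
  explode(col("`seg:BusinessSegments`.`seg:BusinessSegment`")).as("segment")
)
```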