Questions tagged [apache-spark-xml]

81 questions
0 votes · 1 answer

spark-xml problem with encoding windows-1251

I have a problem with parsing an XML document in pyspark using the spark-xml API (pyspark 2.4.0). I have a file with Cyrillic content with the following opening tag: So when I try to open it with some text…
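A minimal sketch of the usual fix, assuming the file's prolog declares windows-1251: spark-xml decodes UTF-8 by default, so the codepage has to be passed explicitly through the charset option (the rowTag and path below are placeholders).

df = (
    spark.read.format("com.databricks.spark.xml")
    .option("rowTag", "record")            # placeholder row element
    .option("charset", "windows-1251")     # match the declared encoding
    .load("/data/cyrillic.xml")
)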
0 votes · 1 answer

Load XML file to dataframe in PySpark using 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12)

from pyspark.sql.column import Column, _to_java_column
from pyspark.sql.types import _parse_datatype_json_string

def ext_from_xml(xml_column, schema, options={}):
    java_column = _to_java_column(xml_column.cast('string'))
    java_schema =…
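The snippet above appears to be the PySpark interop wrapper circulated in the spark-xml README. A completed version, assuming the spark-xml JAR is already attached to the cluster and spark is the active session:

from pyspark.sql.column import Column, _to_java_column
from pyspark.sql.types import _parse_datatype_json_string  # used by the README's companion schema helper

def ext_from_xml(xml_column, schema, options={}):
    # Cast to string, hand the column to the JVM, and call spark-xml's from_xml.
    java_column = _to_java_column(xml_column.cast('string'))
    java_schema = spark._jsparkSession.parseDataType(schema.json())
    scala_map = spark._jvm.org.apache.spark.api.python.PythonUtils.toScalaMap(options)
    jc = spark._jvm.com.databricks.spark.xml.functions.from_xml(
        java_column, java_schema, scala_map)
    return Column(jc)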
0 votes · 0 answers

How to split an XML file into multiple XML files based on a tag

I have one large XML file that looks like the following. I would like to split this large XML file into multiple XML files/chunks based on a tag, with each output file holding 1,000 PRVDR elements. What is the best way to do this in pyspark? So,…
AJR · 569 · 3 · 12 · 30
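A single-machine sketch for the split (not a distributed job): stream the file with the standard library's iterparse and start a new output file every 1,000 PRVDR elements. The wrapper root tag and output naming are assumptions.

import xml.etree.ElementTree as ET

def split_xml(path, row_tag="PRVDR", chunk_size=1000, out_prefix="chunk"):
    buffer, file_no = [], 0
    for event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == row_tag:
            buffer.append(ET.tostring(elem, encoding="unicode"))
            elem.clear()  # release the element so the whole file never sits in memory
            if len(buffer) == chunk_size:
                write_chunk(buffer, out_prefix, file_no)
                buffer, file_no = [], file_no + 1
    if buffer:
        write_chunk(buffer, out_prefix, file_no)

def write_chunk(rows, prefix, n):
    with open(f"{prefix}_{n:05d}.xml", "w", encoding="utf-8") as f:
        f.write("<root>\n" + "\n".join(rows) + "\n</root>")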
0 votes · 1 answer

pyspark: org.xml.sax.SAXParseException Current config of the parser doesn't allow a maxOccurs attribute value to be set greater than the value 5,000

I am trying to parse XML files with an XSD using the spark-xml library in pyspark. Below is the code:

xml_df = spark.read.format("com.databricks.spark.xml") \
    .option("rootTag", "Document") \
    .option("rowTag", "row01") \
    …
newbee123 · 21 · 1 · 2
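The 5,000 cap comes from the JDK's JAXP secure-processing limit rather than from spark-xml itself. A hedged sketch of one workaround, assuming the standard jdk.xml.maxOccurLimit property (a value of 0 or less disables the limit) and that it must reach both the driver and executor JVMs:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.driver.extraJavaOptions", "-Djdk.xml.maxOccurLimit=0")
    .config("spark.executor.extraJavaOptions", "-Djdk.xml.maxOccurLimit=0")
    .getOrCreate()
)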
0 votes · 1 answer

spark-xml: Crashing out of memory trying to parse single large XML file

I'm attempting to process bz2 compressed XML files with a nested XML schema into normalized tables where each level of the schema is stored as a row, and any child elements are stored as rows in a separate table with a foreign key relating back to…
Rimer · 2,054 · 6 · 28 · 43
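One hedged observation for this scenario: bzip2 is a splittable codec, but every element matched by rowTag still has to fit in a single task's memory, so the usual first fix is pointing rowTag at a deeper, smaller element and spreading the parsed rows before any heavy transformation. The tag and path names below are placeholders.

df = (
    spark.read.format("com.databricks.spark.xml")
    .option("rowTag", "childRecord")   # a smaller unit than the document root
    .load("/data/big_file.xml.bz2")
    .repartition(200)                  # fan out before the normalization joins
)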
0 votes · 1 answer

XML Parsing with Spark-XML

I have an XML like this:
Ryan · 33 · 4
0 votes · 2 answers

How to install spark-xml library using dbx

I am trying to install the library spark-xml_2.12-0.15.0 using dbx. The documentation I found says to include it in the conf/deployment.yml file like:

custom:
  basic-cluster-props: &basic-cluster-props
    spark_version: "10.4.x-cpu-ml-scala2.12"
    …
jalazbe · 1,801 · 3 · 19 · 40
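A hedged sketch of the usual placement, assuming dbx's Jobs-API-style deployment file: Maven coordinates are attached under the workflow's libraries key rather than the shared cluster props. The environment and job names below are placeholders.

environments:
  default:
    workflows:
      - name: "my-xml-job"
        libraries:
          - maven:
              coordinates: "com.databricks:spark-xml_2.12:0.15.0"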
0 votes · 1 answer

Spark xpath function to return null if no value present for an attribute

I am using Spark's xpath function to get attribute values from an XML string. The xpath function returns an array of values from the XML tag. If there are multiple rows present in a tag, with one of the rows having a null attribute, the xpath function is…
shaz_nwaz · 11 · 2
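A hedged illustration of why the positions shift: xpath() silently drops non-matching nodes, so the returned array is shorter than the number of rows. The element and attribute names here are invented for the example; the positional workaround assumes Spark's built-in xpath_int/xpath_string functions.

from pyspark.sql import functions as F

df = spark.createDataFrame(
    [('<root><row a="1"/><row/><row a="3"/></root>',)], ["xml"])

# Returns ["1", "3"]: the <row> without @a disappears, misaligning positions.
df.select(F.expr("xpath(xml, '/root/row/@a')")).show(truncate=False)

# Workaround: address each row by index so a missing attribute comes back
# empty rather than vanishing, then null it out with nullif.
df.select(F.expr(
    "transform(sequence(1, xpath_int(xml, 'count(/root/row)')), "
    "i -> nullif(xpath_string(xml, concat('/root/row[', i, ']/@a')), ''))"
)).show(truncate=False)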
0 votes · 1 answer

How To Read XML File from Azure Data Lake In Synapse Notebook without Using Spark

I have an XML file stored in Azure Data Lake which I need to read from a Synapse notebook. But when I read it using the spark-xml library, I get this error: org.apache.spark.sql.AnalysisException: Found duplicate column(s) in the data schema:…
oneDerer · 287 · 3 · 10
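A hedged sketch of the Spark-free route the title asks for: pull the bytes with the azure-storage-file-datalake SDK and parse them with the standard library, sidestepping spark-xml's duplicate-column check entirely. The account URL, container, and path are placeholders, and the notebook's identity needs read access.

import xml.etree.ElementTree as ET
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<account>.dfs.core.windows.net",
    credential=DefaultAzureCredential())
file_client = (service.get_file_system_client("container")
                      .get_file_client("path/to/file.xml"))
root = ET.fromstring(file_client.download_file().readall())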
0 votes · 1 answer

Getting empty dataframe on parsing XML with XSD using spark-xml package

I am trying to parse a simple XML file by supplying an XSD schema, using the approach given here: https://github.com/databricks/spark-xml#xsd-support. XML is here: aa bb cc dd XSD is…
Keds · 1 · 1
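One hedged point worth checking here: per the linked README, the XSD only derives or validates the schema; it does not choose the row element. An empty DataFrame usually means rowTag does not match anything in the file. A sketch with placeholder names:

df = (
    spark.read.format("com.databricks.spark.xml")
    .option("rowTag", "record")  # must name an element that actually occurs in the XML
    .option("rowValidationXSDPath", "/path/schema.xsd")
    .load("/path/data.xml")
)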
0 votes · 1 answer

Migrating Apache Spark XML from 2.11 to 2.12 gives the below warning. How to use the XmlReader directly?

Code:

val xmlDf: DataFrame = spark.read
  .format("xml")
  .option("nullValue", "")
  .xml(df.select("payload").map(x => x.getString(0)))

warning: method xml in class XmlDataFrameReader is deprecated (since 0.13.0): Use XmlReader…
0 votes · 0 answers

Explode simple XML file in pyspark (Not using databricks)

I have an XML file which is given below: Cake 0.55 Regular
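A minimal sketch without the Databricks package, assuming the file is small enough to parse on the driver: read it with the standard library and build the DataFrame by hand. The tag names (name/price/type) are guesses from the values visible in the question.

import xml.etree.ElementTree as ET

root = ET.parse("/path/menu.xml").getroot()
rows = [(i.findtext("name"), i.findtext("price"), i.findtext("type"))
        for i in root]
df = spark.createDataFrame(rows, ["name", "price", "type"])
df.show()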
0 votes · 0 answers

How to write Spark XML output while preserving orderBy?

I am trying to write an XML file from my dataframe like below:

myDf.orderBy("name")
  .repartition(1).write
  .format("com.databricks.spark.xml")
  .option("rootTag", "colname")
  .option("rowTag", "colname2")
  .save("filename")

This is writing a file but not…
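The likely culprit: repartition(1) shuffles and throws away the orderBy. A hedged sketch of one fix, sorting again inside the single partition so spark-xml serializes the rows in order:

(myDf.repartition(1)
     .sortWithinPartitions("name")
     .write
     .format("com.databricks.spark.xml")
     .option("rootTag", "colname")
     .option("rowTag", "colname2")
     .save("filename"))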
0 votes · 0 answers

Unable to insert the parsed XML data into Delta tables in Spark with a changing input schema

I am trying to insert data from a dataframe into a Delta table. Initially, I am parsing an XML file based on a target schema and saving the result into a dataframe. Below is the code used for parsing:

def parseAsset(nodeSeqXml: scala.xml.NodeSeq):…
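For the changing-schema part, a hedged sketch (shown in PySpark rather than the question's Scala): Delta can evolve the table's columns on write with the mergeSchema option. The dataframe name and table path are placeholders.

(parsed_df.write
    .format("delta")
    .mode("append")
    .option("mergeSchema", "true")   # add any new columns the latest XML introduced
    .save("/delta/assets"))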
0 votes · 1 answer

Exploding multiple array columns in spark for a changing input schema

Below is my sample schema:
 |-- provider: string (nullable = true)
 |-- product: string (nullable = true)
 |-- asset_name: string (nullable = true)
 |-- description: string (nullable = true)
 |-- creation_date: string (nullable = true)
 |--…
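A hedged sketch for a schema that changes between loads: find the array columns from the DataFrame's own schema and explode each one, rather than hard-coding names. explode_outer keeps rows whose array is null or empty; note that exploding several arrays in one row multiplies the row count.

from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType

def explode_all_arrays(df):
    array_cols = [f.name for f in df.schema.fields
                  if isinstance(f.dataType, ArrayType)]
    for c in array_cols:
        df = df.withColumn(c, F.explode_outer(c))
    return df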