I'm new to Spark and I'm trying to figure out if there is a way to save complex (nested) objects or complex JSON as Parquet in Spark. I'm aware of the Kite SDK, but I understand it uses Map/Reduce.
I looked around but I was unable to find a…
Hi, there is a topic about writing text data into multiple output directories in one Spark job using MultipleTextOutputFormat:
Write to multiple outputs by key Spark - one Spark job
I would like to ask if there is a similar way to write Avro data to…
Update: the spark-avro package was updated to support this scenario. https://github.com/databricks/spark-avro/releases/tag/v3.1.0
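For the record, a minimal sketch of the approach this enables, assuming a `key` column to split on (the column name, sample data, and output path below are all made-up placeholders); `partitionBy` produces one subdirectory of Avro files per distinct key value:

```python
# Sketch only: requires a running SparkSession with the spark-avro package
# (e.g. com.databricks:spark-avro >= 3.1.0 for Spark 2.x) on the classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (1, "b"), (2, "c")], ["key", "value"])

# One output directory per distinct value of "key", e.g. .../key=1/, .../key=2/
(df.write
   .partitionBy("key")                      # hypothetical partition column
   .format("com.databricks.spark.avro")
   .save("/tmp/avro-by-key"))               # hypothetical output path
```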
I have an Avro file that was created by a third party outside my control, which I need to process using Spark.
The AVRO…
I have a question: I want to sequentially write many DataFrames in Avro format, and I use the code below in a for loop.
df
.repartition()
.write
.mode()
.avro()
The problem is that when I run my Spark job…
Although PySpark has Avro support, it does not have the SchemaConverters method. I may be able to use Py4J to accomplish this, but I have never used a Java package within Python.
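One commonly suggested workaround is exactly the Py4J route: PySpark already exposes the JVM through `spark._jvm`, so the Scala `SchemaConverters` can be called from Python. A sketch, assuming Spark 2.4+ (where the class lives under `org.apache.spark.sql.avro`) and the spark-avro jar on the driver classpath; the schema string is a made-up example:

```python
# Sketch only: needs pyspark plus the spark-avro package on the classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

avro_schema_json = '{"type":"record","name":"r","fields":[{"name":"x","type":"long"}]}'

# Parse the Avro schema on the JVM side, then convert it to a Catalyst type.
jvm = spark._jvm
avro_schema = jvm.org.apache.avro.Schema.Parser().parse(avro_schema_json)
sql_type = jvm.org.apache.spark.sql.avro.SchemaConverters.toSqlType(avro_schema)
print(sql_type.dataType())  # Catalyst StructType mirroring the Avro record
```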
This is the code I am using
# Import SparkSession
from pyspark.sql…
I'm trying to load an Avro file using PySpark running in a Dataproc job:
spark_session.read.format("avro").load("/path/to/avro")
I'm getting the following error:
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 166, in…
I want to write a DataFrame in Avro format using a provided Avro schema rather than Spark's auto-generated schema. How can I tell Spark to use my custom schema on write?
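A sketch of one common way to do this with the spark-avro data source (Spark 2.4+), using its `avroSchema` write option; the schema string, record name, and output path below are made-up placeholders, and the field names/types must line up with the DataFrame's columns:

```python
# Sketch only: requires a SparkSession with the Avro data source available,
# e.g. launched with --packages org.apache.spark:spark-avro_2.12:<spark-version>.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a")], ["id", "name"])

# Hand-written Avro schema used instead of the one Spark would derive.
custom_schema = """{
  "type": "record",
  "name": "MyRecord",
  "namespace": "com.example",
  "fields": [
    {"name": "id",   "type": "long"},
    {"name": "name", "type": ["null", "string"], "default": null}
  ]
}"""

(df.write
   .format("avro")
   .option("avroSchema", custom_schema)   # tell spark-avro to use this schema
   .save("/tmp/out-avro"))                # hypothetical path
```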
I've got a problem with a simple Spark task, which reads an Avro file and then saves it as a Hive Parquet table.
I've got two types of files; in general they are the same, but the key struct is slightly different in its field names.
Type 1
root
|-- pk: struct…
I'm trying to set up a structured stream from a directory of Avro files. We already have some non-streaming code to deal with exactly the same data, so the least-effort step toward streaming would be to re-use that code.
To move to…
I am looking to build a Spark Streaming application using the DataFrames API on Spark 1.6. Before I get too far down the rabbit hole, I was hoping someone could help me understand how DataFrames deal with data that has a differing schema.
The idea…
I'm trying to read Avro files in PySpark.
Found out from How to read Avro file in PySpark that spark-avro is the best way to do that, but I can't figure out how to install it from their GitHub repo. There's no downloadable jar; do I build it…
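For what it's worth, building a jar is usually unnecessary: the package is published to Maven Central, and Spark can fetch it at launch with `--packages`. The coordinates below are examples only; pick the artifact matching your Spark and Scala versions:

```shell
# Spark 2.x with the Databricks package (Scala 2.11 build shown):
pyspark --packages com.databricks:spark-avro_2.11:4.0.0

# Spark 2.4+ ships an official module instead (version matches Spark):
pyspark --packages org.apache.spark:spark-avro_2.12:2.4.8
```

The same flag works with `spark-submit` and `spark-shell`.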
I tried to run my Spark/Scala 2.3.0 code on a Cloud Dataproc 1.4 cluster, where Spark 2.4.8 is installed. I ran into an error concerning the reading of Avro files. Here's my code…
I want to generate an Avro file from a PySpark DataFrame, and currently I am doing a coalesce as below:
df = df.coalesce(1)
df.write.format('avro').save('file:///mypath')
But this is leading to memory issues now as all the data will be fetched to…
I would like to cross-check my understanding of the differences between file formats like Apache Avro and Apache Parquet in terms of schema evolution. Looking at various blogs and SO answers gives me the following understanding. I need to verify if my…
I have Avro data with a single timestamp column, and now I am trying to create an external Hive table on top of the Avro files. The data gets saved in Avro as a long, and I expect the Avro logical type to handle the conversion back to timestamp…
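As background for what the logical type is supposed to do: `timestamp-millis` merely annotates an Avro `long` holding milliseconds since the Unix epoch, and a reader that honors the annotation converts it back. A minimal pure-Python illustration of that round trip (the stored value is a made-up example):

```python
from datetime import datetime, timezone

# An Avro "long" annotated with logicalType timestamp-millis is simply
# milliseconds since 1970-01-01T00:00:00Z; converting back is arithmetic.
raw_long = 1_500_000_000_000  # example stored value

ts = datetime.fromtimestamp(raw_long / 1000, tz=timezone.utc)
print(ts.isoformat())  # 2017-07-14T02:40:00+00:00

# The reverse direction, as a writer would produce it:
back = int(ts.timestamp() * 1000)
assert back == raw_long
```

A reader that ignores the annotation (which is what a mismatched Hive/Avro setup does) just surfaces the raw `long`.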