Questions tagged [spark-avro]

A library for reading and writing Avro data from Spark SQL.

The library is hosted on GitHub.

227 questions
2 votes, 1 answer

Missing Avro Custom Header when using Spark SQL Streaming

Before sending an Avro GenericRecord to Kafka, a header is inserted like so: ProducerRecord record = new ProducerRecord<>(topicName, key, message); record.headers().add("schema", schema); Consuming the record: when using Spark…
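The excerpt above attaches the Avro schema as a Kafka record header on the producer side. A minimal plain-Python sketch of the same idea (no broker or client library; `attach_schema_header` and the `Event` schema are hypothetical, used only to illustrate that Kafka headers are name/bytes pairs):

```python
import json

def attach_schema_header(topic, key, value, schema_json):
    """Bundle a record with a 'schema' header, mirroring the Java
    record.headers().add("schema", schema) pattern from the question.
    Kafka represents headers as (name, bytes) pairs."""
    headers = [("schema", json.dumps(schema_json).encode("utf-8"))]
    return {"topic": topic, "key": key, "value": value, "headers": headers}

record = attach_schema_header(
    "events", b"k1", b"avro-payload",
    {"type": "record", "name": "Event", "fields": []},
)
```

Worth noting for the "missing header" symptom: as far as I can tell, Spark's Kafka source only started surfacing record headers in Spark 3.0 (via the `includeHeaders` read option), so a Spark 2.x structured-streaming consumer would silently drop them.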
2 votes, 2 answers

FileNotFoundException: Spark save fails. Cannot clear cache from Dataset[T] avro

I get the following error when saving a DataFrame in Avro for a second time. If I delete sub_folder/part-00000-XXX-c000.avro after saving, and then try to save the same dataset, I get the following: FileNotFoundException: File…
2 votes, 1 answer

org.apache.avro.AvroTypeException: Expected record-start. Got VALUE_STRING

I am doing a simple JSON to Avro record conversion, but I am getting this issue. I have tried many approaches, applying more than 15 solutions from Stack Overflow and elsewhere. My file looks like this: { "namespace": "test", "type": "record", "name":…
Sun
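The "Expected record-start. Got VALUE_STRING" error typically means Avro's JSON decoder expected a JSON object (the start of a record) but found a bare string instead, e.g. a quoted payload, or a string where the schema declares a nested record. A stdlib-only sketch of the distinction (the `Person` schema and `matches_record` helper are hypothetical):

```python
import json

# A record schema expects a JSON *object*; a bare JSON string is what
# produces "Expected record-start. Got VALUE_STRING".
schema = {
    "namespace": "test",
    "type": "record",
    "name": "Person",
    "fields": [{"name": "name", "type": "string"}],
}

good_payload = json.loads('{"name": "Ada"}')  # object -> record-start
bad_payload = json.loads('"Ada"')             # bare string -> VALUE_STRING

def matches_record(payload, schema):
    """Cheap shape check: a record payload must be a JSON object
    containing every declared field name."""
    return isinstance(payload, dict) and all(
        f["name"] in payload for f in schema["fields"]
    )
```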
2 votes, 1 answer

DataFrameReader throwing "Unsupported type NULL" while reading avro file

I am trying to read an Avro file with DataFrameReader, but I keep getting: org.apache.spark.sql.avro.IncompatibleSchemaException: Unsupported type NULL Since I am going to deploy it on Dataproc I am using Spark 2.4.0, but the same happened when I tried…
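"Unsupported type NULL" usually points at a field whose Avro schema is the bare "null" type (some writers emit this for columns that contained no values), which spark-avro cannot map to a Spark type. The common fix is to widen such fields to a nullable union like ["null", "string"]. A stdlib sketch of that rewrite (the `Example` schema and `widen_null_fields` helper are hypothetical):

```python
# Hypothetical schema exhibiting the problem: a bare "null" field type.
schema = {
    "type": "record",
    "name": "Example",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "note", "type": "null"},  # bare NULL -> IncompatibleSchemaException
    ],
}

def widen_null_fields(schema, fallback="string"):
    """Replace bare 'null' field types with a nullable union so a
    reader such as spark-avro can map them to a concrete type."""
    for field in schema["fields"]:
        if field["type"] == "null":
            field["type"] = ["null", fallback]
            field.setdefault("default", None)
    return schema

patched = widen_null_fields(schema)
```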
2 votes, 0 answers

How to make streaming query with select over avro struct faster?

I'm using Spark structured streaming with a Kafka streaming source and Avro format, and the creation of the DataFrame is very slow! To measure the streaming query I have to add an action in order to evaluate the DAG and time it. If I…
ggeop
2 votes, 0 answers

Default schema value conversion fails in to_avro() while publishing data to Kafka using databricks spark-avro

I am trying to publish data to a Kafka topic using the Confluent Schema Registry. The following is my schema registry registration: schemaRegistryClient.register("primitive_type_str_avsc", new Schema.Parser().parse( s""" |{ | "type": "record", | "name":…
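A frequent cause of default-value failures in schemas like the one above is the Avro rule that a union field's default is validated against the *first* branch of the union: a null default therefore requires "null" to be listed first. A stdlib sketch of that check (the `PrimitiveTypeStr` schema here is a hypothetical stand-in for the truncated one in the question):

```python
import json

# Hypothetical record schema with a defaulted, nullable field.
schema = json.loads("""
{
  "type": "record",
  "name": "PrimitiveTypeStr",
  "fields": [
    {"name": "value", "type": ["null", "string"], "default": null}
  ]
}
""")

def null_default_is_legal(field):
    """Avro resolves a field default against the FIRST branch of a
    union, so a literal null default demands ["null", ...] ordering."""
    if field.get("default", object()) is None:  # default is literally null
        return isinstance(field["type"], list) and field["type"][0] == "null"
    return True
```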
2 votes, 1 answer

Hive External table on AVRO file producing only NULL data for all columns

I am trying to create a Hive external table on top of some Avro files generated using Spark/Scala. I am using CDH 5.16, which has Hive 1.1 and Spark 1.6. Creating the Hive external table ran successfully, but when I query the data I am…
Vaishak
2 votes, 0 answers

How to include external avro packages for Spark 2.4 into Junit?

I could have asked: how can I avoid the error "Avro is built-in but external data source module since Spark 2.4"? I have been using the following approach to bootstrap my session in JUnit (this approach works for all my other tests). sparkSession =…
hba
2 votes, 0 answers

Block size invalid or too large - Failed to read Avro files

I'm using Spark and Scala, trying to read Avro folders using com.databricks:spark-avro_2.11. All the folders were read successfully except for one, which failed with the following exception (attached). I checked the files manually,…
Ben Haim Shani
2 votes, 1 answer

How to write a pyspark-dataframe to redshift?

I am trying to write a pyspark DataFrame to Redshift, but it fails with the error: java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated Caused…
murtaza1983
2 votes, 1 answer

How can I set a logicalType in a spark-avro 2.4 schema?

We read timestamp information from avro files in our application. I am in the process of testing an upgrade from Spark 2.3.1 to Spark 2.4 which includes the newly built-in spark-avro integration. However, I cannot figure out how to tell the avro…
Matt Ford
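For context on the logicalType question: in the Avro specification, a logical type is an annotation layered on top of an underlying primitive, so timestamp-millis rides on a long. A stdlib sketch of what such a schema looks like as JSON (the `Event` record name and `ts` field are hypothetical; whether spark-avro 2.4 honors the annotation on write is exactly the question being asked):

```python
import json

# Avro schema with a logical type: timestamp-millis annotates a long.
schema = {
    "type": "record",
    "name": "Event",
    "fields": [
        {
            "name": "ts",
            "type": {"type": "long", "logicalType": "timestamp-millis"},
        }
    ],
}

# Round-trip through JSON, as the schema would be written to an .avsc file.
parsed = json.loads(json.dumps(schema))
```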
2 votes, 3 answers

Spark reading Avro file

I'm using com.databricks.spark.avro. When I run it from spark-shell like so: spark-shell --jar spark-avro_2.11-4.0.0.jar, I am able to read the file by doing this: import org.apache.spark.sql.SQLContext val sqlContext = new SQLContext(sc) val…
covfefe
2 votes, 1 answer

Converting StructType to Avro Schema, returns type as Union when using databricks spark-avro

I am using databricks spark-avro to convert a DataFrame schema into an Avro schema. The returned Avro schema fails to have a default value. This is causing issues when I am trying to create a GenericRecord out of the schema. Can anyone help with the…
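What the question describes is the usual shape of the conversion: SchemaConverters maps a nullable StructType column to a ["null", T] union without a default, and GenericRecordBuilder then complains about unset fields. Since Avro validates a default against the first union branch, the only legal default for such a field is null. A stdlib sketch of patching that in (the `city` field and `add_null_default` helper are hypothetical):

```python
# What a nullable column typically converts to: a union, but no default.
field = {"name": "city", "type": ["null", "string"]}

def add_null_default(field):
    """For a ["null", T] union the only legal default is null, because
    Avro checks the default against the first branch of the union."""
    if isinstance(field["type"], list) and field["type"][0] == "null":
        field.setdefault("default", None)
    return field

patched = add_null_default(dict(field))
```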
2 votes, 2 answers

Schema in Avro message

I see that Avro messages have the schema embedded, and then the data in binary format. If multiple messages are sent and a new Avro file is created for every message, isn't the schema embedding an overhead? So does that mean it is always…
Roshan Fernando
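On the overhead concern above: an Avro Object Container File stores the schema once per file, amortized over all records in it, so embedding only hurts when files hold a single message. Per-message pipelines instead avoid re-sending the schema by prefixing each message with a 5-byte reference, the Confluent wire format: one magic byte (0) plus a 4-byte big-endian schema-registry id. A stdlib sketch of that prefix:

```python
import struct

def wire_format_prefix(schema_id):
    """Confluent wire format: 1 magic byte (0) + 4-byte big-endian
    schema id.  Each message carries this 5-byte reference instead
    of the full schema text."""
    return struct.pack(">bI", 0, schema_id)

prefix = wire_format_prefix(42)
```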
2 votes, 0 answers

Read/access primitive double array from parquet using Spark using Java api

I have a Parquet file generated using the parquet-avro library, where one of the fields holds a primitive double array, created using the following schema type: Schema.createArray(Schema.create(Schema.Type.DOUBLE)) I read this parquet data from Spark…
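For reference, the Java builder call in the question corresponds to a plain JSON array schema. A stdlib sketch showing that equivalence (the surrounding `Measurements` record and `values` field are hypothetical, added only to show the array embedded in a record):

```python
import json

# JSON equivalent of Schema.createArray(Schema.create(Schema.Type.DOUBLE))
array_schema = {"type": "array", "items": "double"}

# Embedded in a hypothetical record field, as it would appear in an .avsc:
record_schema = {
    "type": "record",
    "name": "Measurements",
    "fields": [{"name": "values", "type": array_schema}],
}

parsed = json.loads(json.dumps(record_schema))
```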