Questions tagged [spark-avro]

A library for reading and writing Avro data from Spark SQL.

227 questions
1 vote, 1 answer

How to read an Avro file using PySpark

I am trying to read an Avro file in a Jupyter notebook but am facing this issue: Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.avro.AvroFileFormat.DefaultSource. I can't seem to figure out how to get this dependency…
user1298426
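
One possible cause of the `ClassNotFoundException` in the question above is that the external `spark-avro` module is not on the classpath; since Spark 2.4 it is shipped separately and is usually pulled in with `--packages`. A minimal sketch of building the package coordinate for a notebook session follows. The Scala and Spark versions are assumptions (the question does not state them), and the helper names are hypothetical.

```python
def spark_avro_coordinate(scala_version: str, spark_version: str) -> str:
    """Build the Maven coordinate of the external spark-avro module."""
    return f"org.apache.spark:spark-avro_{scala_version}:{spark_version}"


def pyspark_submit_args(scala_version: str, spark_version: str) -> str:
    """Build a value for the PYSPARK_SUBMIT_ARGS environment variable.

    The trailing "pyspark-shell" token is required when the variable
    is set by hand for a notebook session.
    """
    coordinate = spark_avro_coordinate(scala_version, spark_version)
    return f"--packages {coordinate} pyspark-shell"


# Assumed versions; they must match the Spark build actually installed.
args = pyspark_submit_args("2.11", "2.4.4")
print(args)
```

The resulting string would be exported as `PYSPARK_SUBMIT_ARGS` before the first `SparkSession` is created; after that, `spark.read.format("avro").load(path)` should be able to resolve the data source.
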
1 vote, 0 answers

Force schema using spark write

I have encrypted data in Avro format with the following schema: {"type":"record","name":"ProtectionWrapper","namespace":"com.security","fields": [{"name":"protectionInfo","type":["null",{"type":"record","name":"ProtectionInfo","fields":…
1 vote, 1 answer

How do I read an Avro file as a list of objects in Java Spark

I have an Avro file which I want to read and operate on after converting it to its representative object. I've tried loading it using RDD and Dataset in Java Spark, but in both cases I'm unable to convert it to the required object. As…
DanMatlin
1 vote, 0 answers

java.lang.IllegalAccessError: tried to access method org.apache.avro.specific.SpecificData.<init>()V

AvroPlanCompleteTrigger is a Java POJO class generated from an Avro schema. The code works when we run it locally. Avro version: 1.9.1, spark-core: 2.4.0, spark-streaming_2.11: 2.4.0. Can someone please help? Exception in thread "streaming-job-executor-0"…
amitwdh
1 vote, 2 answers

Pyspark 2.4.3, Read Avro format message from Kafka - Pyspark Structured streaming

I am trying to read Avro messages from Kafka using PySpark 2.4.3. Based on the Stack Overflow link below, I am able to convert to Avro format (to_avro) and the code works as expected, but from_avro is not working and I get the issue below. Are there…
1 vote, 2 answers

How to read Avro Binary(Base64) Encoded data in Spark Scala

I am trying to read an Avro file which is encoded in Binary (Base64) and Snappy compressed. Hadoop cat on the Avro file looks like: Objavro.schema? {"type":"record","name":"ConnectDefault","namespace":"xyz.connect.avro","fields":…
Vicky
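
The `Objavro.schema` prefix in the excerpt above resembles the start of a standard Avro object container file: the magic bytes `Obj` plus `0x01`, followed by a metadata map that normally holds `avro.schema` and `avro.codec`. As a rough sketch, assuming the file really is a plain container file (the question does not confirm this), the header metadata could be inspected with the standard library only. All function names here are hypothetical.

```python
import io

AVRO_MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def read_long(stream) -> int:
    """Decode one Avro long (zigzag-encoded, variable-length)."""
    shift = 0
    accum = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("truncated Avro long")
        accum |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)


def read_bytes(stream) -> bytes:
    """Read a length-prefixed byte string."""
    return stream.read(read_long(stream))


def read_header_metadata(stream) -> dict:
    """Return the file metadata map (e.g. avro.schema, avro.codec)."""
    if stream.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count terminates the map
            return meta
        if count < 0:  # negative count: a block byte size follows
            read_long(stream)
            count = -count
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)


# Hand-built header for illustration only, not taken from the question.
sample = (b"Obj\x01\x04"
          b"\x16avro.schema" b"\x10" b'"string"'
          b"\x14avro.codec" b"\x0csnappy"
          b"\x00")
print(read_header_metadata(io.BytesIO(sample)))
```

If `avro.codec` reports `snappy`, the file is block-compressed inside the container, which Avro readers normally handle without a separate Base64 or decompression step.
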
1 vote, 1 answer

Avro schema for record type with empty object

I am trying to create an Avro schema for the JSON below: { "id": "TEST", "status": "status", "timestamp": "2019-01-01T00:00:22-03:00", "comment": "add comments or replace it with adSummary data", "error": { "code": "ER1212132", …
merla
1 vote, 0 answers

Spark Avro record namespace generation for nested structures

I'd like to write Avro records with Spark 2.2.0 where the schema has a namespace and some nested records inside. { "type": "record", "name": "userInfo", "namespace": "my.example", "fields": [ { "name":…
1 vote, 1 answer

Produce Avro topic to Kafka using Apache Spark

I have installed Kafka locally (no cluster or schema registry for now) and am trying to produce an Avro topic. Below is the schema associated with that topic: { "type" : "record", "name" : "Customer", "namespace" : "com.example.Customer", "doc"…
1 vote, 1 answer

Avro write java.sql.Timestamp conversion error

I need to write a timestamp to a Kafka partition and then read it back. I have defined an Avro schema for that: { "namespace":"sample", "type":"record", "name":"TestData", "fields":[ {"name": "update_database_time", "type": "long",…
Cassie
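
When a timestamp is stored in an Avro `long` field, as in the schema excerpt above, a common convention is to store milliseconds since the Unix epoch (Avro's `timestamp-millis` logical type). The excerpt is truncated, so whether this schema uses that logical type is an assumption. A minimal sketch of the round trip, with hypothetical helper names:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(ts: datetime) -> int:
    """Convert a timezone-aware datetime to milliseconds since the epoch."""
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Integer division of timedeltas avoids floating-point rounding.
    return (ts - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(millis: int) -> datetime:
    """Convert milliseconds since the epoch back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


original = datetime(2019, 1, 1, tzinfo=timezone.utc)
stored = to_epoch_millis(original)      # value written to the long field
restored = from_epoch_millis(stored)    # value read back
print(stored, restored.isoformat())
```

Sub-millisecond precision is dropped by this conversion, so a value read back may differ from the original in its microseconds.
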
1 vote, 0 answers

Avro data is not converted in Spark

I have written one of the Spark data frame columns into Kafka in Avro format. Then I try to read the data from this topic and convert from Avro to the data frame column. The type of the data is a timestamp and instead of the timestamps from the…
Cassie
1 vote, 0 answers

Serialize Avro into Kafka using Schema Registry and Spark

I want to serialize Avro data into Kafka using Schema Registry, Spark SQL, Kafka and Avro. I tried to use the to_avro method, which accepts only the column parameter. I want to use the Schema Registry to write Avro data into Kafka. Schema…
mham
1 vote, 1 answer

Spark 2.4 com.databricks.spark.avro troubleshooting

I have a Spark job that I usually submit to a Hadoop cluster from a local machine. When I submit it with Spark 2.2.0 it works fine, but it fails to start when I submit it with version 2.4.0. Only SPARK_HOME makes the difference. drwxr-xr-x 18…
Antalagor
1 vote, 1 answer

Spark 2.4.0 to_avro / from_avro deserialization not working with Seq().toDF()

I'm testing the new from_avro and to_avro functions in Spark 2.4.0. I create a dataframe with just one column and three rows, serialize it with Avro, and deserialize it back from Avro. If the input dataset is created as val input1 = Seq("foo", "bar",…
redsk
1 vote, 1 answer

How to read a decimal logical type into a Spark dataframe

I have an Avro file containing a decimal logicalType as follows: "type":["null",{"type":"bytes","logicalType":"decimal","precision":19,"scale":2}]. When I try to read the file with the Scala Spark library, the df schema is MyField: binary (nullable =…
Mauro Midolo
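
For context on the decimal question above: in the Avro specification, a `decimal` stored as `bytes` holds the two's-complement, big-endian representation of the unscaled integer, and the scale comes from the schema. If a reader surfaces the column as raw binary, the value could in principle be recovered as sketched below. The function name is hypothetical, and this is an illustration of the encoding rather than the fix for the Spark read itself.

```python
from decimal import Decimal


def decode_avro_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode an Avro bytes-backed decimal.

    raw   -- two's-complement big-endian bytes of the unscaled value
    scale -- number of fractional digits, taken from the schema
    """
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


# Example with scale 2, matching the schema in the question.
raw_value = (12345).to_bytes(2, byteorder="big", signed=True)
print(decode_avro_decimal(raw_value, 2))
```
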