Questions tagged [spark-avro]

A library for reading and writing Avro data from Spark SQL.

The project is hosted on GitHub.

227 questions
0 votes, 0 answers

How to read an Avro file in spark-shell in Spark 2.4.8?

I would appreciate input from several people. We are facing a problem while reading an Avro file in spark2-shell on Spark 2.4, and the cause of the error could not be found. Any pointers would be a great help. $ spark-shell --jars…
jsk
0 votes, 0 answers

Unable to deploy Avro Spark. Access denied

I am learning to work with Spark. Parquet and CSV work correctly in Jupyter, but when I started trying the Avro format, this error appeared (Screen 1). The Apache Avro Data Source Guide offers a solution: ./bin/spark-submit --packages…
Decurro
0 votes, 0 answers

Spark-avro Cannot grow BufferHolder because the size is negative - where to look for the cause?

Environment: Scala 2.11, Spark 2.4, Hortonworks SchemaRegistry, Kafka messages with embedded schema information. Context: As stated above, I am aware of how Hortonworks SchemaRegistry information is embedded in the Kafka message. First 13 bytes of…
0 votes, 1 answer

Installing Apache Spark packages to run locally

I am looking for a clear guide or steps for installing Spark packages (specifically spark-avro) to run locally and for using them correctly with the spark-submit command. I've spent a lot of time reading many posts and guides, but am still not able to get…
bda
0 votes, 1 answer

How to deserialize an Avro response from a DataStream in Scala + Apache Flink

I am getting an Avro response from a Confluent Kafka topic and am facing issues when I try to deserialize it. I do not understand the syntax for defining the Avro deserializer and using it in my Kafka source while reading. Sharing the…
0 votes, 1 answer

Avro file not read fully by Spark

I am reading an Avro file stored on ADLS Gen2 using Spark as follows: import dbutils as dbutils from pyspark.conf import SparkConf from pyspark.sql import…
RRM
0 votes, 0 answers

Cannot read Avro data from a Kafka stream in a Spark Scala app

I have a Kafka topic with simple Avro-serialized data in it, and I am trying to read this data in my Spark app, which is written in Scala. When I print the Spark DataFrame to the console, I can see that there are issues with deserializing (or something else) because my…
0 votes, 1 answer

How to use spark_read_avro from sparklyr R package?

I'm using R version 4.1.1 and sparklyr version ‘1.7.2’. I'm connected to my Databricks cluster with databricks-connect and am trying to read an Avro file using the following code: library(sparklyr) library(dplyr) sc <- spark_connect( method =…
Anci
0 votes, 1 answer

Spark Batch Avro Deserialization: Malformed data. Length is negative

I am doing some batch processing on Kafka through Spark. The records are serialized as Avro. I am trying to deserialize the value using the exact schema in the message itself, but I am getting a malformed record exception. Here's my code: …
Prashant Pandey
0 votes, 1 answer

Importing Spark Avro packages into a dockerized Python project to read Avro files in S3

I am trying to read some Avro files stored in an S3 bucket with the following code. The Spark version is 2.4.7. from pyspark.sql import SparkSession spark = SparkSession.builder.appName('Statistics').getOrCreate() sc = spark.sparkContext df =…
tharindu
0 votes, 0 answers

Fetching Avro data from Kafka using Spark

I tried to publish records from a DataFrame built from an Avro file, while it is built from a CSV file using a DataFrame. I published the data into a Kafka topic in Avro format using to_avro(struct(*)) from the DataFrame, and I was able to view the binary…
0 votes, 1 answer

Create AVRO File AWS Glue Dynamic Frame One to Many Join

Is the following behavior possible in AWS Glue? I am trying to create a single AVRO file by joining two DynamicFrames in a one-to-many fashion. For example, I have a DyF with many Teacher types: teacher_id, teacher_name and a DyF with many Student…
0 votes, 1 answer

Convert Dataset to DataFrame from an Avro file

I wrote a Scala script to load an Avro file and to work with the generated data (to retrieve top contributors). The problem is that loading the file gives a Dataset that I cannot convert to a DataFrame because it contains some complex types: …
Issibra
0 votes, 1 answer

Avro deserialization fails when deriving a column using sum but succeeds when the same column is derived using count. Serialized data is in Kafka

Here is my SQL, which works: select hostnetworkid,roamertype,carrierid, total_failure,total_count,date_format(timestamp(unix_timestamp(window.start)),\"yyyyMMdd\") as eventdate, date_format(timestamp(unix_timestamp(window.start)),\"HH:mm\") as…
kushagra deep
0 votes, 1 answer

Avro schema (.avsc) enforcement in PySpark

Can anyone help me with reading an Avro schema (.avsc) through PySpark and enforcing it while writing the DataFrame to a target storage? All my target table schemas are provided as .avsc files, and I need to provide this custom schema while saving…
ASHISH M.G