Questions tagged [spark-avro]

A library for reading and writing Avro data from Spark SQL.

The GitHub page is here.

227 questions
0
votes
1 answer

How to write Avro objects to Parquet with partitions in Java? How to append data to the same Parquet file?

I am using Confluent's KafkaAvroDeserializer to deserialize Avro objects sent over Kafka. I want to write the received data to a Parquet file. I want to be able to append data to the same Parquet file and to create a Parquet file with partitions. I managed…
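
A minimal sketch (not the asker's code) of the usual Spark-side approach, assuming the deserialized Kafka records have already been turned into a DataFrame; the column name "event_date" and the output path are hypothetical:

    import org.apache.spark.sql.{DataFrame, SaveMode}

    def writePartitioned(df: DataFrame): Unit = {
      df.write
        .mode(SaveMode.Append)          // append to the existing Parquet dataset instead of overwriting it
        .partitionBy("event_date")      // creates event_date=.../ subdirectories under the target path
        .parquet("hdfs:///data/events") // same target path on every batch
    }
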
0
votes
1 answer

Port data from HDFS/S3 to local FS and load in Java

I have a Spark job running on an EMR cluster that writes out a DataFrame to HDFS (which is then s3-dist-cp-ed to S3). The data size isn't big (2 GB when saved as Parquet). The data in S3 is then copied to a local filesystem (EC2 instance running…
Nik
  • 5,515
  • 14
  • 49
  • 75
0
votes
1 answer

How to read Avro schema-typed events from Kafka and store them in a Hive table

My idea is to use Spark Streaming + Kafka to get the events from the Kafka bus. After retrieving a batch of Avro-encoded events I would like to transform them with Spark Avro into Spark SQL DataFrames and then write the DataFrames to a Hive table. Is…
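
A minimal sketch of the last step (not the asker's code), assuming eventsDf is a DataFrame already decoded from the Avro-encoded events of one batch; the table name "events" is hypothetical:

    import org.apache.spark.sql.{DataFrame, SaveMode}

    def writeToHive(eventsDf: DataFrame): Unit = {
      eventsDf.write
        .mode(SaveMode.Append)
        .saveAsTable("events")   // requires a Hive-enabled SparkSession / HiveContext
    }
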
0
votes
0 answers

Zero-byte Avro file exception

I am currently using Avro 1.8.2 to write log events. In certain very rare cases my DataFileWriter writes out a 0-byte file. As far as I understand, a valid Avro file should always have a header. The code snippet looks like…
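
For context, a minimal sketch of the usual DataFileWriter pattern, assuming schema, records and an output file are provided by the caller; the header and data blocks only reach disk once create()/flush()/close() have run, so a writer that is never closed (for example after a crash) can leave a 0-byte file behind:

    import java.io.File
    import org.apache.avro.Schema
    import org.apache.avro.file.DataFileWriter
    import org.apache.avro.generic.{GenericDatumWriter, GenericRecord}

    def writeAvro(schema: Schema, records: Seq[GenericRecord], out: File): Unit = {
      val writer = new DataFileWriter[GenericRecord](new GenericDatumWriter[GenericRecord](schema))
      try {
        writer.create(schema, out)              // writes the container header
        records.foreach(r => writer.append(r))  // appends data blocks
      } finally {
        writer.close()                          // flushes buffered blocks
      }
    }
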
0
votes
1 answer

Issue while loading an Avro dataset into Teradata with Spark Streaming

I am trying to load a dataset of Avro files into a Teradata table through Spark Streaming (JDBC). The configuration is set properly and the load succeeds to a certain extent (I can verify that rows of data have been inserted into the table), but halfway…
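
A minimal sketch of the shape of a DataFrame-to-JDBC write (not the asker's configuration); the connection details, driver class and table name below are placeholders:

    import java.util.Properties
    import org.apache.spark.sql.{DataFrame, SaveMode}

    def writeToTeradata(df: DataFrame): Unit = {
      val props = new Properties()
      props.setProperty("user", "dbuser")       // placeholder credentials
      props.setProperty("password", "secret")
      props.setProperty("driver", "com.teradata.jdbc.TeraDriver")
      df.write
        .mode(SaveMode.Append)
        .jdbc("jdbc:teradata://dbhost/DATABASE=mydb", "target_table", props)
    }
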
0
votes
0 answers

BigQuery load from Avro gives "cannot convert from long to int"

I am trying to load an Avro file from Google Cloud Storage into BigQuery tables but faced this issue. The steps I have followed are as below: create a DataFrame in Spark; store the data by writing it out as Avro with dataframe.write.avro("path"). Loaded these…
whoisthis
  • 33
  • 8
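
One way this kind of long/int mismatch is often avoided, sketched under the assumption that the offending column is called "id" (hypothetical) and that the databricks spark-avro implicits are on the classpath: cast the Spark Int column to Long before writing the Avro output, so the Avro type matches the 64-bit integer type of the target table.

    import com.databricks.spark.avro._
    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.col

    def writeAsLong(dataframe: DataFrame): Unit = {
      dataframe
        .withColumn("id", col("id").cast("long"))  // widen Int -> Long before serializing
        .write
        .avro("gs://my-bucket/avro-out")           // hypothetical output path
    }
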
0
votes
0 answers

error: not found: value SchemaConverters

I am using Databricks for my use case, where I have to convert an Avro schema to a StructType. When I searched, it said spark-avro has SchemaConverters to do that. However, I am using the spark-avro_2.11-4.0 library, and when I use SchemaConverters, I get…
NNN
  • 11
  • 1
  • 3
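
A hedged sketch of the usual shape of this conversion, assuming SchemaConverters is public in the spark-avro build on the classpath (its visibility has differed between releases), with a hypothetical helper name:

    import com.databricks.spark.avro.SchemaConverters
    import org.apache.avro.Schema
    import org.apache.spark.sql.types.StructType

    def toStructType(avroSchemaJson: String): StructType = {
      val avroSchema = new Schema.Parser().parse(avroSchemaJson)
      // toSqlType wraps the resulting Spark type together with its nullability
      SchemaConverters.toSqlType(avroSchema).dataType.asInstanceOf[StructType]
    }
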
0
votes
1 answer

Spark Avro throws: Caused by: java.lang.IllegalArgumentException: object is not an instance of declaring class

I am trying to create a DataFrame and write the result in Avro format. This gives the IllegalArgumentException mentioned in the subject. It works correctly if I save it as a text file, but fails while writing Avro. Using…
Tirthankar
  • 75
  • 1
  • 9
0
votes
2 answers

Saving data to Elasticsearch in a Spark task

While processing a stream of Avro messages through Kafka and Spark, I am saving the processed data as documents in an Elasticsearch index. Here's the code (simplified): directKafkaStream.foreachRDD(rdd -> { rdd.foreach(avroRecord -> { …
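
A minimal sketch of a commonly used alternative to opening a client per record (not the asker's code): hand each partition to the elasticsearch-hadoop connector and let it bulk-index the documents. The connector, the toFieldMap helper and the "processed-events/doc" target are all assumptions here:

    import org.apache.spark.streaming.dstream.DStream
    import org.elasticsearch.spark._   // elasticsearch-hadoop connector, assumed to be on the classpath

    def indexStream[A](stream: DStream[A], toFieldMap: A => Map[String, Any]): Unit =
      stream.foreachRDD { rdd =>
        // saveToEs serializes each Map as a JSON document and bulk-writes it per partition
        rdd.map(toFieldMap).saveToEs("processed-events/doc")
      }
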
0
votes
1 answer

Avro schema update with two schemas in one Avro file

I have one Avro file written with a first schema; I then updated the schema and appended to the same file, so now I have two schemas for one file. How does Avro handle this scenario? Will the new fields be added in the file, or will I lose any data while…
buckeyeosu
  • 45
  • 8
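
A small sketch to illustrate the underlying point rather than the asker's code: an Avro container file stores exactly one writer schema in its header, and appending with DataFileWriter.appendTo reuses that stored schema, so a single .avro file does not end up holding two schemas; a changed schema normally means a new file (or reader-schema resolution at read time).

    import java.io.File
    import org.apache.avro.file.DataFileWriter
    import org.apache.avro.generic.{GenericDatumWriter, GenericRecord}

    def appendRecords(existing: File, records: Seq[GenericRecord]): Unit = {
      val writer = new DataFileWriter[GenericRecord](new GenericDatumWriter[GenericRecord]())
      try {
        writer.appendTo(existing)               // picks up the schema already stored in the file header
        records.foreach(r => writer.append(r))  // appended records must match that schema
      } finally {
        writer.close()
      }
    }
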
0
votes
1 answer

How to convert a DataFrame to Avro using a schema?

How to convert a DataFrame into Avro format using a user-specified schema?
user3699367
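
A hedged sketch using the Avro data source that ships with Spark 2.4+ (not the external databricks library), where a user-specified writer schema can be passed as JSON through the "avroSchema" option; the output path is hypothetical:

    import org.apache.spark.sql.DataFrame

    def writeWithSchema(df: DataFrame, avroSchemaJson: String): Unit = {
      df.write
        .format("avro")
        .option("avroSchema", avroSchemaJson)  // user-specified Avro schema as a JSON string
        .save("/tmp/out-avro")
    }
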
0
votes
2 answers

Failure reading Avro from S3 using Spark on EMR

When executing my Spark job on AWS EMR, I got this error when trying to read an Avro file from an S3 bucket. It happens with versions emr-5.5.0 and emr-5.9.0. This is the code: val files = 0 until numOfDaysToFetch map { i => …
0
votes
0 answers

Spark read of Avro output from a previous write fails with "Not an avro data file" due to _SUCCESS file

I'm using the great Databricks connector to read/write Avro files. I have the following code: df.write.mode(SaveMode.Overwrite).avro(someDirectory). The problem is that when I try to read this directory using sqlContext.read.avro(someDirectory), it…
Hagai
  • 275
  • 3
  • 13
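
A hedged sketch of one commonly suggested workaround, assuming the spark-avro reader honors the standard Avro/Hadoop flag below: tell it to skip inputs that do not end in .avro, so the empty _SUCCESS marker written next to the data is ignored.

    import com.databricks.spark.avro._
    import org.apache.spark.sql.{DataFrame, SQLContext}

    def readAvroDir(sqlContext: SQLContext, someDirectory: String): DataFrame = {
      sqlContext.sparkContext.hadoopConfiguration
        .set("avro.mapred.ignore.inputs.without.extension", "true")  // skip _SUCCESS and other non-.avro files
      sqlContext.read.avro(someDirectory)
    }
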
0
votes
1 answer

Spark SQL: Handling schema evolution

I want to read two Avro files of the same data set but with schema evolution. The first Avro file's schema: {String, String, Int}. The second Avro file's schema after evolution: {String, String, Long} (the Int field has evolved to Long). I want to read these two…
jshweta14
  • 23
  • 4
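
A minimal sketch of one way to reconcile the two files on the Spark side (not necessarily the intended solution): read each file separately, cast the old Int column up to Long, and union the results. The column name "count" and the paths are hypothetical:

    import com.databricks.spark.avro._
    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.{DataFrame, SparkSession}

    def readEvolved(spark: SparkSession): DataFrame = {
      val oldDf = spark.read.avro("/data/v1").withColumn("count", col("count").cast("long"))
      val newDf = spark.read.avro("/data/v2")
      oldDf.union(newDf)   // schemas now line up: {String, String, Long}
    }
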
0
votes
1 answer

Databricks Avro schema cannot be converted to a Spark SQL StructType

We have the Kafka HDFS connector writing into HDFS in the default Avro format. A sample output: Obj^A^B^Vavro.schema"["null","string"]^@$ͳø{<9d>¾Ã^X:<8d>uV^K^H5^F°^F^B<8a>^B{"severity":"notice","message":"Test…
user2286963
  • 125
  • 2
  • 11