Questions tagged [spark-avro]

A library for reading and writing Avro data from Spark SQL.

The project is hosted on GitHub.

227 questions
0 votes
1 answer

Failed to load avro package in R

I have Avro files on my local drive and I want to analyse them in R. However, the package is not getting installed. It is not available on CRAN, so I had to download it via GitHub. Here is the…
0 votes
1 answer

Iceberg is not working when writing AVRO from spark

We are encountering the following error when appending Avro files from GCS to a table. The Avro files are valid, but we use deflate-compressed Avro; is that a concern? Exception in thread "streaming-job-executor-0" java.lang.NoClassDefFoundError:…
0 votes
1 answer

Apache Hudi example from spark-shell throws error for Spark 2.3.0

I am trying to run this example (https://hudi.apache.org/docs/quick-start-guide.html) using spark-shell. The Apache Hudi documentation says "Hudi works with Spark-2.x versions". The environment details are: Platform: HDP 2.6.5.0-292; Spark version:…
Joyan
0 votes
1 answer

How to encode structs into Avro record in Spark?

I'm trying to use the to_avro() function to create Avro records. However, I'm not able to encode multiple columns, as some columns are simply lost after encoding. A simple example to recreate the problem: val schema = StructType(List( …
Gorionovic
0 votes
1 answer

Spark can not process recursive avro data

I have an avsc schema like the one below: { "name": "address", "type": [ "null", { "type":"record", "name":"Address", "namespace":"com.data", "fields":[ { …
0 votes
1 answer

Conditional loading of partitions from file-system

I am aware that there have been questions regarding wildcards in PySpark's .load() function, like here or here. However, none of the questions/answers I found dealt with my variation of it. Context: In PySpark I want to load files directly from HDFS…
Markus
0 votes
2 answers

How do I access the data in an Avro.snz file with C#

I have an Avro.snz file whose avro.codecs is snappy. This can be opened with com.databricks.avro in Spark, but it seems snappy is unsupported by Apache.Avro and Confluent.Avro, which only have deflate and null. Although they can get me the Schema, I…
Ranald Fong
0 votes
1 answer

Not in union ["null","int"] Avro Format org.apache.avro.UnresolvedUnionException

I have a Java program that writes data from an Oracle DB in Avro format. I am getting this exception on a date column while writing: org.apache.avro.file.DataFileWriter$AppendWriteException: org.apache.avro.UnresolvedUnionException: Not in union…
ankit
0 votes
1 answer

Write from Spark to Kafka in avro format using defined schema?

I have a DataFrame that I need to write to Kafka. I have the Avro schema defined, similar to this: { "namespace": "my.name.space", "type": "record", "name": "MyClass", "fields": [ {"name": "id", "type": "string"}, …
Mahmoud Hanafy
0 votes
1 answer

Apache Beam AvroIO read large file OOM

Problem: I am writing an Apache Beam pipeline to convert an Avro file to a Parquet file (with the Spark runner). Everything works well until I start to convert a large Avro file (15G). The code used to read the Avro file to create a PCollection: …
fuyi
0 votes
1 answer

how to force avro writer to write timestamp in UTC in spark scala dataframe

I need to write a Timestamp field to Avro and ensure the data is saved in UTC. Currently Avro converts it to a long (timestamp-millis) in the local timezone of the server, which causes issues if the server reading it back is in a different timezone. I…
Ajith Kannan
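
For context on the question above: the Avro specification defines the timestamp-millis logical type as a long counting milliseconds from the Unix epoch, 1 January 1970 00:00:00.000 UTC, so the stored value itself carries no timezone. The following is a minimal plain-Python sketch of that encoding, not Spark code and not the asker's solution; the helper names `to_timestamp_millis` and `from_timestamp_millis` are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Reference instant used by Avro's timestamp-millis logical type.
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_timestamp_millis(dt: datetime) -> int:
    """Encode a timezone-aware datetime as milliseconds since the epoch (UTC).

    Hypothetical helper: naive datetimes are rejected so that the server's
    local timezone can never leak into the encoded value.
    """
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone before encoding")
    # Subtracting aware datetimes compares the underlying instants.
    return (dt - EPOCH_UTC) // timedelta(milliseconds=1)


def from_timestamp_millis(millis: int) -> datetime:
    """Decode milliseconds since the epoch into an aware UTC datetime."""
    return EPOCH_UTC + timedelta(milliseconds=millis)


# The same instant written from two different zones encodes identically.
noon_utc = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
same_instant_plus5 = datetime(2020, 1, 1, 17, 0, tzinfo=timezone(timedelta(hours=5)))

millis_a = to_timestamp_millis(noon_utc)
millis_b = to_timestamp_millis(same_instant_plus5)
```

Under these assumptions both values are equal, and decoding either one yields the original UTC instant, so a reader in another timezone sees the same point in time.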
0 votes
1 answer

Writing dataframe to kafka topic in an avro format for spark < 2.4?

Q1. Given a DataFrame df and a schema myschema, how do I write the DataFrame to a Kafka topic in Avro format? Q2. Is there an optimized way that does not involve a UDF? Most of the available solutions are for Spark > 2.4…
supernatural
0 votes
1 answer

Spark not reading all the records from binary file

I am trying to read Avro files from S3 and, as shown in this Spark documentation, I am able to read them fine. My files are like below; each file consists of 5000 records.…
Explorer
0 votes
1 answer

Generate schema less avro using Spark

Is there a way to generate schemaless Avro from Apache Spark? I can see a way to generate it through Java/Scala using the Apache Avro library and through Confluent Avro. When I write Avro from Spark in the way below, it creates Avro files with a schema. I want…
Explorer
0 votes
1 answer

Copying avro jars into docker jars directory

I'm learning Spark and I'd like to use an Avro data file. As Avro is external to Spark, I've downloaded the jar. But my problem is how to copy it into that specific place, the 'jars dir', in my container. I've read a related post here but I do not…
abdoulsn