Highest Voted 'spark-avro' Questions

0 votes, 1 answer

Failed to load avro package in R

I have Avro files on my local drive that I want to analyse in R. However, the package will not install. It is not available on CRAN, so I had to download it from GitHub. Here is the…

r shared-libraries avro spark-avro

asked Mar 18 '21 at 04:29

tailchopper

15
4

0 votes, 1 answer

Iceberg is not working when writing AVRO from spark

We are encountering the following error when appending Avro files from GCS to a table. The Avro files are valid, but we use deflated Avro; is that a concern? Exception in thread "streaming-job-executor-0" java.lang.NoClassDefFoundError:…

apache-spark google-cloud-storage spark-avro iceberg

asked Jan 28 '21 at 07:03

coderatcloud9

85
1
1
7

0 votes, 1 answer

Apache Hudi example from spark-shell throws error for Spark 2.3.0

I am trying to run this example (https://hudi.apache.org/docs/quick-start-guide.html) using spark-shell. The Apache Hudi documentation says "Hudi works with Spark-2.x versions". The environment details are: Platform: HDP 2.6.5.0-292 Spark version:…

apache-spark avro spark-avro spark-shell apache-hudi

asked Dec 27 '20 at 08:36

Joyan

41
1
7

0 votes, 1 answer

How to encode structs into Avro record in Spark?

I'm trying to use the to_avro() function to create Avro records. However, I'm not able to encode multiple columns, as some columns are simply lost after encoding. A simple example to recreate the problem: val schema = StructType(List( …

apache-spark spark-avro

asked Dec 08 '20 at 12:16

Gorionovic

185
2
9

0 votes, 1 answer

Spark can not process recursive avro data

I have an avsc schema like the one below: { "name": "address", "type": [ "null", { "type":"record", "name":"Address", "namespace":"com.data", "fields":[ { …

apache-spark pyspark avro recursive-datastructures spark-avro

asked Nov 14 '20 at 00:04

gorrch

521
3
16

0 votes, 1 answer

Conditional loading of partitions from file-system

I am aware that there have been questions regarding wildcards in PySpark's .load() function, like here or here. However, none of the questions/answers I found dealt with my variation of it. Context: In PySpark I want to load files directly from HDFS…

apache-spark pyspark avro spark-avro

asked Aug 03 '20 at 17:28

Markus

2,265
5
28
54

0 votes, 2 answers

How do I access the data in an Avro.snz file with C#

I have an Avro.snz file whose avro.codecs is snappy. This can be opened with com.databricks.avro in Spark, but it seems snappy is unsupported by Apache.Avro and Confluent.Avro; they only have deflate and null. Although they can get me the Schema, I…

c# avro snappy spark-avro

asked Jun 24 '20 at 01:06

Ranald Fong

401
3
12

0 votes, 1 answer

Not in union ["null","int"] Avro Format org.apache.avro.UnresolvedUnionException

I have a Java program which writes data from an Oracle DB in Avro format. I am getting this exception on a date column while writing: org.apache.avro.file.DataFileWriter$AppendWriteException: org.apache.avro.UnresolvedUnionException: Not in union…

java oracle avro spark-avro jackson-dataformat-avro

asked Jun 15 '20 at 16:29

ankit

1
1
2

0 votes, 1 answer

Write from Spark to Kafka in avro format using defined schema?

I have a dataframe that I need to write to Kafka. I have the Avro schema defined, similar to this: { "namespace": "my.name.space", "type": "record", "name": "MyClass", "fields": [ {"name": "id", "type": "string"}, …

apache-spark apache-kafka avro spark-avro

asked Jun 15 '20 at 11:01

Mahmoud Hanafy

1,861
3
24
33

0 votes, 1 answer

Apache Beam AvroIO read large file OOM

Problem: I am writing an Apache Beam pipeline to convert an Avro file to a Parquet file (with the Spark runner). Everything works well until I start to convert a large Avro file (15G). The code used to read the Avro file to create a PCollection: …

apache-spark apache-beam parquet spark-avro avroio

asked May 27 '20 at 20:16

fuyi

2,573
4
23
46

0 votes, 1 answer

how to force avro writer to write timestamp in UTC in spark scala dataframe

I need to write a Timestamp field to Avro and ensure the data is saved in UTC. Currently Avro converts it to a long (timestamp millis) in the local timezone of the server, which causes issues if the server reading it back is in a different timezone. I…

apache-spark apache-spark-sql avro spark-avro

asked May 23 '20 at 18:46

Ajith Kannan

812
1
8
30

0 votes, 1 answer

Writing dataframe to kafka topic in an avro format for spark < 2.4?

Q1. Considering I have a dataframe df and a schema myschema, how do I write the dataframe to a Kafka topic in Avro format? Q2. Is there any optimized way if we do not consider a UDF? Most of the available solutions are for Spark > 2.4…

apache-spark apache-kafka avro spark-avro

asked May 19 '20 at 10:09

supernatural

1,107
11
34

0 votes, 1 answer

Spark not reading all the records from binary file

I am trying to read Avro files from S3 and, as shown in this Spark documentation, I am able to read them fine. My files are like the ones below; each file consists of 5000 records.…

apache-spark deserialization avro binaryfiles spark-avro

asked May 12 '20 at 22:46

Explorer

1,491
4
26
67

0 votes, 1 answer

Generate schema less avro using Spark

Is there a way to generate schema-less Avro from Apache Spark? I can see a way to generate it through Java/Scala using the Apache Avro library and through Confluent Avro. When I write Avro from Spark in the way below, it creates Avro files with a schema. I want…

apache-spark apache-spark-sql avro spark-avro avro-tools

asked Apr 21 '20 at 13:11

Explorer

1,491
4
26
67

0 votes, 1 answer

Copying avro jars into docker jars directory

I'm learning Spark, and I'd like to use an Avro data file. As Avro is external to Spark, I've downloaded the jar. But my problem is how to copy it into that specific place, the 'jars dir', in my container. I've read a related post here but I do not…

docker apache-spark copy-paste avro spark-avro

asked Apr 17 '20 at 22:57

abdoulsn

842
2
16
32
