I have two Avro schemas, V1 and V2, which are read in Spark as below:
import java.nio.file.{Files, Paths}
import org.apache.spark.sql.avro.functions._
val jsonFormatSchema = new String(Files.readAllBytes(Paths.get("./examples/src/main/resources/V1.avsc")))
val df = spark
  .readStream
  …
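A minimal sketch of how such a stream is typically completed, assuming the source is Kafka and the value column carries Avro-encoded bytes (the broker and topic names here are hypothetical):

import org.apache.spark.sql.functions.col

val df = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // hypothetical broker
  .option("subscribe", "events")                       // hypothetical topic
  .load()
  // decode the Avro payload with the V1 reader schema loaded above
  .select(from_avro(col("value"), jsonFormatSchema).as("record"))

Swapping V2.avsc in as the reader schema against data written with V1 is the usual way to exercise schema evolution in this setup.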
If I have a 100 GB Avro dataset and the same dataset in ORC at 10 GB, and I read the ORC data in Spark, does it consume less memory than the Avro dataset?
I was thinking that since all the data gets loaded into memory and deserialized, maybe…
I am trying to read an Avro file. The following is the sample data source that I found online to test my code:
https://github.com/Teradata/kylo/blob/master/samples/sample-data/avro/userdata1.avro
The following is my code (please assume…
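For reference, a minimal way to load that sample file with the built-in Avro source, assuming it has been downloaded to the working directory:

val df = spark.read
  .format("avro") // built-in in Spark 2.4+; "com.databricks.spark.avro" on older versions
  .load("userdata1.avro")
df.printSchema()
df.show(5)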
I am new to Avro schemas. I created the following schema based upon a reference JSON, but I am not able to create a serializer for it.
{
  "name": "Name",
  "type": "record",
  "namespace": "NameSpace",
  "fields": [
    {
      "name":…
When saving an RDD to S3 in Avro format, I get the following warning in the console:
Using standard FileOutputCommitter to commit work. This is slow and potentially unsafe.
I haven't been able to find a simple implicit such as saveAsAvroFile and…
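There is no saveAsAvroFile implicit on RDDs in spark-avro itself; the usual workaround is to go through a DataFrame. A sketch, assuming an RDD of a case class and a hypothetical bucket name:

import org.apache.spark.rdd.RDD
import spark.implicits._

case class Event(id: String, ts: Long) // hypothetical record type
val rdd: RDD[Event] = ???              // the RDD being saved

rdd.toDF()
  .write
  .format("avro")                        // "com.databricks.spark.avro" on Spark < 2.4
  .save("s3a://my-bucket/events-avro/")  // hypothetical bucket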
I am trying to read data stored in a Hive table in S3, convert it to Avro format, and then consume the Avro records to build the final object and push it to a Kafka topic. In the object I am trying to publish, I have a nested object that has fields…
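A rough sketch of that flow with to_avro, assuming Spark 3.x and hypothetical table and topic names; struct(...) packs all columns, including nested ones, into a single Avro payload:

import org.apache.spark.sql.functions.struct
import org.apache.spark.sql.avro.functions.to_avro

val src = spark.table("mydb.events") // hypothetical Hive table backed by S3

src
  .select(to_avro(struct(src.columns.map(src(_)): _*)).as("value"))
  .write
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // hypothetical broker
  .option("topic", "events-avro")                      // hypothetical topic
  .save()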
I want to persist a very wide Spark DataFrame (>100,000 columns) that is sparsely populated (>99% of values are null), while keeping only non-null values (to avoid storage cost):
What is the best format for such a use case (HBase, Avro, Parquet, ...)…
I am trying to read a JSON file and write it to Avro. I used the PySpark StructType & StructField classes to programmatically specify the schema of the DataFrame. I want to write it to Avro format with the logicalType set to …
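In the Scala API the same flow looks roughly like the sketch below (file and column names are hypothetical); spark-avro maps TimestampType columns to an Avro long with the timestamp-micros logical type on write:

import org.apache.spark.sql.types._

// Hypothetical schema for the incoming JSON
val schema = StructType(Seq(
  StructField("id", StringType),
  StructField("created_at", TimestampType) // written to Avro as long + timestamp-micros
))

spark.read.schema(schema).json("input.json")
  .write
  .format("avro")
  .save("output-avro/")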
I have a daily job which converts Avro to Parquet.
The Avro data is 20 GB per hour and is partitioned by year, month, day and hour.
When I read the Avro files as below,
spark.read.format("com.databricks.spark.avro").load(basePath).where($"year" === 2020…
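A sketch of such a conversion job with explicit partition filters and partitioned Parquet output (the filter values and output location are placeholders):

import spark.implicits._

spark.read
  .format("avro") // or "com.databricks.spark.avro" on older Spark
  .load(basePath)
  .where($"year" === 2020 && $"month" === 1 && $"day" === 1) // hypothetical filter; prunes partitions
  .write
  .partitionBy("year", "month", "day", "hour")
  .parquet("s3a://my-bucket/parquet/") // hypothetical output path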
I'm running a test where I create a DataFrame, encode a field using to_avro() and then decode it using from_avro().
The initial (unencoded) DataFrame's schema contains the StructField instance for the field I encode in Avro.
My goal is to use the…
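For reference, a minimal round-trip of the kind described, with a hypothetical two-column frame; the schema string passed to from_avro has to match what to_avro produced:

import spark.implicits._
import org.apache.spark.sql.functions.struct
import org.apache.spark.sql.avro.functions.{from_avro, to_avro}

val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name") // hypothetical data

val encoded = df.select($"id", to_avro(struct($"name")).as("payload"))

// Hand-written to match the schema Spark's converter generates for struct($"name")
val avroSchema =
  """{"type":"record","name":"topLevelRecord","fields":
    |[{"name":"name","type":["string","null"]}]}""".stripMargin

val decoded = encoded.select($"id", from_avro($"payload", avroSchema).as("rec"))
// decoded.select("rec.name") recovers the original column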
I have a dataset A with schema A, and a dataset B with schema B. The two datasets are mostly similar (they have the same columns, but the data types differ for a few of them), with only minor differences. One example: a column in dataset A has date…
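One common way to reconcile such differences is to cast B's columns to the types A uses before comparing or unioning; a sketch with hypothetical frames dfA and dfB that share column names:

import org.apache.spark.sql.functions.col

// Cast each of B's columns to the type the same column has in A's schema
val alignedB = dfB.select(dfA.schema.fields.map(f => col(f.name).cast(f.dataType)): _*)
val combined = dfA.unionByName(alignedB)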
I'm working on a project where every day I need to deal with tons of Avro files. To extract the data from Avro I use Spark SQL. To achieve this, I first need to call printSchema and then select the fields to see the data. I want to automate this…
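The schema is available programmatically, so the printSchema-then-select step can be scripted. A sketch, with a hypothetical path and a hypothetical rule keeping only top-level string fields:

import org.apache.spark.sql.types.StringType

val df = spark.read.format("avro").load("input-avro/") // hypothetical path

// Derive the projection from the schema instead of reading printSchema output by hand
val wanted = df.schema.fields.collect {
  case f if f.dataType == StringType => f.name // hypothetical selection rule
}
df.select(wanted.map(df(_)): _*).show()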
I have a DataFrame with a similar schema, and I need to append its data to an existing Avro file. I don't want the Avro output written into a folder as part files; for your information, my existing Avro file is a single file, not a part file inside a folder. Can you please help me solve the…
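For what it's worth, Spark's DataFrame writer can only append by adding new part files under a directory; it cannot append to a single standalone .avro file. A sketch of that append mode, with a hypothetical output directory:

df.write
  .mode("append")       // adds new part files under the directory on each write
  .format("avro")
  .save("events-avro/") // hypothetical directory, not a single .avro file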
We are using Avro for our schema definition. Is it possible to add a description for each of the fields in Avro? I know we can add 'doc' at the record level; we want to add a description at the field level.
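Avro does support a doc attribute on individual fields as well as on the record, for example:

{
  "name": "Name",
  "type": "record",
  "doc": "Record-level description",
  "fields": [
    {"name": "id", "type": "string", "doc": "Field-level description for id"}
  ]
}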
I am trying to use the Confluent Schema Registry with Spark's from_avro function, as per this doc.
I have the below imports:
"io.confluent" % "kafka-schema-registry-client" % "5.4.1",
"io.confluent" % "kafka-avro-serializer" % "5.4.1",
"org.apache.spark"…