Highest Voted 'spark-avro' Questions

1 vote, 1 answer

How to convert bytes column (with logicaltype as decimal) in Avro to decimal?

I have a decimal column "TOT_AMT" defined as type "bytes" with logical type "decimal" in my Avro schema. After creating the data frame in Spark using Databricks spark-avro, when I tried to sum the TOT_AMT column using the sum function, it throws…

asked Mar 06 '17 at 13:07

Anand B
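
For readers unfamiliar with the encoding this question refers to: per the Avro specification, a "decimal" logical type over "bytes" stores the unscaled value as a big-endian two's-complement integer, with the scale taken from the schema. The following is only a minimal sketch in plain Python of that decoding, outside of Spark; the function name and the sample byte values are hypothetical, and it does not show how spark-avro itself handles the column.

```python
from decimal import Decimal


def avro_bytes_to_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode an Avro decimal stored as bytes (hypothetical helper).

    raw   -- big-endian two's-complement unscaled integer
    scale -- number of digits after the decimal point, from the schema
    """
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    # Build the Decimal from (sign, digits, exponent) so the result is
    # exact and does not depend on the current decimal context precision.
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    return Decimal((sign, digits, -scale))


# Example with made-up values: unscaled 12345 at scale 2.
amount = avro_bytes_to_decimal(b"\x30\x39", 2)
```

Values decoded this way are ordinary Decimal objects, so they can be summed with the built-in sum function.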


1 vote, 1 answer

Enabling Compression on Avro via PySpark

Using PySpark I'm trying to save an Avro file with compression (preferably snappy). This line of code successfully saves a 264MB file: df.write.mode('overwrite').format('com.databricks.spark.avro').save('s3n://%s:%s@%s/%s' % (access_key,…

compression pyspark avro snappy spark-avro

asked Feb 28 '17 at 14:08

Frank B.


1 vote, 0 answers

StackOverflowError while loading Avro file to create a Dataframe

I am running into this error when trying to load an Avro file (size 134 KB). My pom dependencies are below. I am creating this Avro file from a protobuf message, which works fine. pom dependencies…

apache-spark-sql avro spark-avro

asked Jan 31 '17 at 08:06

Nitin Kumar


1 vote, 0 answers

How to Read a large avro file

I am trying to read a large Avro file (2 GB) using spark-shell, but I am getting a StackOverflowError. val newDataDF = spark.read.format("com.databricks.spark.avro").load("abc.avro") java.lang.StackOverflowError at…

scala hadoop apache-spark avro spark-avro

asked Dec 29 '16 at 00:11

PrinceChamp


1 vote, 0 answers

Not able to convert the byte[] to string in scala

I'm trying to stream data from Kafka and convert it into a data frame, following this link. But when I run both the producer and consumer applications, this is the output on my console: (0,[B@370ed56a) (1,[B@2edd3e63) (2,[B@3ba2944d)…

spark-streaming apache-spark-sql kafka-consumer-api kafka-producer-api spark-avro

asked Dec 20 '16 at 11:52

jack AKA karthik


1 vote, 0 answers

Avro - code-generation approach vs non-code generation approach

I'm new to Avro. The official documentation indicates that there are two possible approaches to using Avro. With code generation: here classes are auto-generated from Avro schema files by the Avro compiler. These classes are then used in the…

java scala serialization avro spark-avro

asked Dec 09 '16 at 21:53

jithinpt


1 vote, 0 answers

read bq table by AvroBigQueryInputFormat from spark give unexpected behavior (using java)

A sample skeleton of the code is as follows, where I am basically reading an RDD from BigQuery and selecting all data points where the my_field_name value is null: JavaPairRDD input = sc …

apache-spark google-bigquery rdd avro spark-avro

asked Dec 08 '16 at 16:51

Xinwei Liu


1 vote, 1 answer

Pyspark + Hive avro table

I created a Hive Avro table and am trying to read it from PySpark. Basically, I am trying to run a basic query over this Hive Avro table in PySpark in order to do some analysis. from pyspark import SparkContext from pyspark.sql import HiveContext hive_context…

apache-spark pyspark apache-spark-sql spark-avro

asked Dec 06 '16 at 22:05

SuWon


1 vote, 1 answer

Read avro data using spark dataset in java

I am a newbie to Spark and am trying to load Avro data into a Spark 'dataset' (Spark 1.6) using Java. I see some examples in Scala but not in Java. Any pointers to examples in Java would be helpful. I tried to create a JavaRDD and then convert it to…

apache-spark apache-spark-dataset spark-avro

asked Aug 22 '16 at 00:08

Pradeep


1 vote, 2 answers

Bootstrapping spark-avro jar to Amazon EMR cluster

I want to read Avro files located in Amazon S3 from a Zeppelin notebook. I understand Databricks has a wonderful package for this, spark-avro. What are the steps that I need to take in order to bootstrap this jar file to my cluster and make it…

amazon-web-services amazon-emr spark-avro

asked Aug 01 '16 at 16:22

van_d39


1 vote, 1 answer

Spark changes the schema when writing to Avro

I have a Spark job (in CDH 5.5.1) that loads two Avro files (both with the same schema), combines them to make a DataFrame (also with the same schema) then writes them back out to Avro. The job explicitly compares the two input schemas to ensure…

apache-spark avro cloudera-cdh spark-avro

asked Jul 26 '16 at 07:52

DNA


1 vote, 2 answers

NoClassDefFoundError when using avro in spark-shell

I keep getting java.lang.NoClassDefFoundError: org/apache/avro/mapred/AvroWrapper when calling show() on a DataFrame object. I'm attempting to do this through the shell (spark-shell --master yarn). I can see that the shell recognizes the schema…

apache-spark apache-spark-sql spark-avro

asked Jun 10 '16 at 18:50

Pudge


1 vote, 2 answers

How to serialize the data to AVRO schema in Spark (with Java)?

I have defined an Avro schema and generated some classes for it with avro-tools. Now I want to serialize the data to disk. I found some answers about Scala for this, but not for Java. The class Article is generated with avro-tools, and is…

java apache-spark hdfs avro spark-avro

asked Apr 11 '16 at 11:07

Belphegor


1 vote, 1 answer

java.lang.NoClassDefFoundError: com/databricks/spark/avro/package$

I am using Spark 1.3.0 and spark-avro 1.0.0. My build.sbt file looks like this: libraryDependencies ++=Seq( "org.apache.spark" % "spark-core_2.10" % "1.3.0" % "provided", "org.apache.spark" % "spark-sql_2.10" % "1.5.2" % "provided", "com.databricks"…

apache-spark avro spark-avro

asked Mar 18 '16 at 19:26

Knows Not Much


0 votes, 0 answers

Use Avro model from different package in different repository

I have an uncommon problem. I have repository X, which contains an Avro model called Person. In my repository Y, I would like to create a new model with a property of type Person from repository X. Is it even possible? I have imported the X artifact into Y but it…

java maven avro spark-avro avro-tools

asked Aug 19 '23 at 11:50

Mati

