I have a decimal column "TOT_AMT" defined as type "bytes" with logical type "decimal" in my Avro schema.
After creating the DataFrame in Spark using the Databricks spark-avro package, when I try to sum the TOT_AMT column using the sum function it throws…
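Older spark-avro releases typically did not interpret Avro logical types, so a decimal column arrives as raw binary rather than a numeric type and aggregations like sum fail on it. As a sanity check, those bytes can be decoded by hand; this is a minimal stdlib sketch (the scale comes from your schema; 2 here is only an example value):

```python
from decimal import Decimal

def decode_avro_decimal(raw: bytes, scale: int) -> Decimal:
    # Avro stores a decimal as the two's-complement, big-endian
    # bytes of the unscaled integer; the scale is declared in the schema
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)

# Example: unscaled 1234 with scale 2 decodes to 12.34
print(decode_avro_decimal((1234).to_bytes(2, byteorder="big", signed=True), 2))
```

Decoding a few values this way confirms whether the binary column really holds well-formed decimals before blaming the reader.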
Using PySpark I'm trying to save an Avro file with compression (preferably snappy).
This line of code successfully saves a 264MB file:
df.write.mode('overwrite').format('com.databricks.spark.avro').save('s3n://%s:%s@%s/%s' % (access_key,…
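With the Databricks spark-avro package, the output codec is controlled through a Spark SQL conf rather than a write option. A sketch, set before the `df.write` call (the conf key is the one documented by spark-avro; `sqlContext` is assumed to be the session's existing SQLContext):

```python
# Switch the Avro output codec to snappy before writing;
# spark-avro reads this conf key at write time
sqlContext.setConf("spark.sql.avro.compression.codec", "snappy")
```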
I am running into this error when trying to load an Avro file (size 134 KB). My pom dependencies are below. I am creating this Avro file from a protobuf message, which works fine.
pom dependencies…
I am trying to read a large Avro file (2 GB) using spark-shell, but I am getting a StackOverflowError.
val newDataDF = spark.read.format("com.databricks.spark.avro").load("abc.avro")
java.lang.StackOverflowError
at…
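A StackOverflowError during a load often points at deep recursion on the driver; one blunt but common mitigation is to enlarge the JVM thread stack via spark-shell's `--driver-java-options` flag. The 4m size below is an arbitrary assumption to tune, not a recommendation:

```shell
# Raise the driver JVM's thread-stack size before retrying the load
spark-shell --driver-java-options "-Xss4m"
```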
I'm trying to stream the data from Kafka and convert it into a data frame. I followed this link.
But when I'm running both the producer and consumer applications, this is the output on my console:
(0,[B@370ed56a) (1,[B@2edd3e63) (2,[B@3ba2944d)…
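Output like `[B@370ed56a` is Java's default `toString` for a `byte[]`: the Kafka payload arrived intact but was printed as the raw array object instead of being deserialized. The fix is to decode the bytes explicitly before printing; a minimal sketch of that step (UTF-8 assumed, as with Kafka's common string serializer):

```python
def render_record(key: bytes, value: bytes) -> tuple:
    # Kafka hands consumers raw bytes; decode them before printing,
    # otherwise you only see the array object's identity hash
    return key.decode("utf-8"), value.decode("utf-8")

# Example: a key/value pair decoded to readable strings
print(render_record(b"0", b"hello"))
```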
I'm new to Avro. The official documentation indicates that there are two possible approaches to using Avro:
With code generation - here classes are auto-generated from avro schema files by the avro compiler. These classes are then used in the…
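The contrast between the two approaches can be sketched without the Avro library itself: the generic API works with schema-driven, map-like records, while code generation gives you a concrete typed class per schema. The dataclass below is only a hypothetical stand-in for what the Avro compiler would emit:

```python
from dataclasses import dataclass

# Generic approach: a record is an untyped, schema-driven mapping
generic_user = {"name": "Ada", "favorite_number": 7}

# Code-generation approach: the compiler emits a typed class per schema;
# this dataclass merely stands in for such a generated class
@dataclass
class User:
    name: str
    favorite_number: int

specific_user = User(name="Ada", favorite_number=7)
```

The generic form needs no build step but gives up compile-time field checking; the generated form is the reverse trade.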
A sample skeleton is roughly as follows, where I am basically reading an RDD from BigQuery and selecting all data points where the my_field_name value is null.
JavaPairRDD input = sc
…
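Stripped of the BigQuery and Spark plumbing, the selection step reduces to a null filter over records; a minimal stdlib sketch of that predicate (field name taken from the question, records modeled as dicts):

```python
def select_null_field(records, field="my_field_name"):
    # Keep only the data points whose my_field_name value is null/missing,
    # mirroring the filter applied to the RDD in the question
    return [r for r in records if r.get(field) is None]

rows = [{"my_field_name": None}, {"my_field_name": 5}, {}]
print(select_null_field(rows))  # the first and last records survive
```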
I created a Hive Avro table and am trying to read it from PySpark. Basically, I am trying to run a basic query over this Hive Avro table in PySpark in order to do some analysis.
from pyspark import SparkContext
from pyspark.sql import HiveContext
hive_context…
I am a newbie to Spark and am trying to load Avro data into a Spark 'dataset' (Spark 1.6) using Java. I see some examples in Scala but not in Java.
Any pointers to examples in Java would be helpful. I tried to create a JavaRDD and then convert it to…
I want to read Avro files located in Amazon S3 from a Zeppelin notebook. I understand Databricks has a wonderful package for this, spark-avro. What are the steps I need to take in order to bootstrap this jar file to my cluster and make it…
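The usual route is to hand Spark the package's Maven coordinate and let it fetch and distribute the jar itself; a sketch using the real spark-avro coordinate (the version and Scala suffix are assumptions that must match your cluster):

```shell
# Let Spark resolve spark-avro from Maven Central and ship it to executors
spark-shell --packages com.databricks:spark-avro_2.11:4.0.0
```

In Zeppelin, the same coordinate can instead be registered through the notebook's dependency interpreter before the first Spark paragraph runs.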
I have a Spark job (in CDH 5.5.1) that loads two Avro files (both with the same schema), combines them to make a DataFrame (also with the same schema) then writes them back out to Avro.
The job explicitly compares the two input schemas to ensure…
I keep getting
java.lang.NoClassDefFoundError: org/apache/avro/mapred/AvroWrapper
when calling show() on a DataFrame object. I'm attempting to do this through the shell (spark-shell --master yarn). I can see that the shell recognizes the schema…
I have defined an Avro schema and generated some classes with avro-tools for the schemas. Now I want to serialize the data to disk. I found some answers about Scala for this, but not for Java. The class Article is generated with avro-tools, and is…
I am using Spark 1.3.0 and spark-avro 1.0.0. My build.sbt file looks like:
libraryDependencies ++=Seq(
"org.apache.spark" % "spark-core_2.10" % "1.3.0" % "provided",
"org.apache.spark" % "spark-sql_2.10" % "1.5.2" % "provided",
"com.databricks"…
I have an uncommon problem.
I have repository X, which contains an Avro model called Person. In my repository Y, I would like to create a new model with a property of type Person from repository X. Is that even possible?
I have imported the X artifact into Y, but it…