Questions tagged [spark-avro]

A library for reading and writing Avro data from Spark SQL.

The project is hosted on GitHub.

227 questions
1 vote, 1 answer

Creating objects from Primitive avro schema

Suppose I have an Avro schema like this: { "type" : "string" } How should I create an object from this schema in Java?
1 vote, 0 answers

JsonDecoder parsing failing in spark streaming

I am trying to decode a message that arrives as part of an Avro message in my Spark 2.2 streaming job. I have a schema defined for this JSON, and whenever a JSON message arrives that does not conform to the schema, my JsonDecoder fails with the error below: Caused by:…
D P
1 vote, 1 answer

Schema Registry with Kafka: how can I integrate it into a Java project?

After going through several lectures on Schema Registry and looking into how it works, I am more confused than before. I would like to understand how I can include a schema registry in my Kafka project, where locally we have some producers and some…
1 vote, 0 answers

Spark 1.6 - Overwrite directory with avro files failing using dataframes

I have a directory in HDFS that contains Avro files. When I try to overwrite the directory with a DataFrame, it fails. Syntax: avroData_df.write.mode(SaveMode.Overwrite).format("com.databricks.spark.avro").save("") The error is: Caused by:…
Mnav505
1 vote, 0 answers

How to convert spark streaming Dataset[String] to DataFrame[Row]

I have Kafka messages in a non-standard format, so the code looks like the following: val df: Dataset[String] = spark.readStream.format("kafka").option("subscribe", topic).options(kafkaParams).load().select($"value".as[Array[Byte]]) …
Julias
1 vote, 1 answer

Writing an array of multiple different Records to Avro format, into the same file

We have a legacy file format that I need to migrate to Avro storage. The tricky part is that the records have some common fields, a discriminator field, and some unique fields specific to the type selected by the…
Peter G. Horvath
1 vote, 1 answer

How to get the avro schema from StructType

I have a DataFrame: Dataset dataset = getSparkInstance().createDataFrame(newRDD, struct); Calling dataset.schema() returns a StructType, but I want the actual schema so that I can store it in a sample.avsc file. Basically, I want to convert StructType to Avro…
Sumit G
1 vote, 2 answers

avro json additional field

I have the following Avro schema: { "type":"record", "name":"test", "namespace":"test.name", "fields":[ {"name":"items","type": {"type":"array", "items": …
ASe
1 vote, 1 answer

How to read Avro Encoded kafka message in scala without knowing avro schema?

I need to write a Scala or Java client to read Kafka messages from a topic whose messages are Avro-encoded and whose schema changes dynamically. Please suggest a way to read these messages without writing them out as an Avro file.
Nagaraj Vittal
1 vote, 1 answer

Hive on spark. Reading parquet file

I'm trying to read a Parquet file into Hive on Spark. I've found that I should do something like this: CREATE TABLE avro_test ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS AVRO TBLPROPERTIES…
Marcel Mars
1 vote, 3 answers

Convert org.apache.avro.generic.GenericRecord to org.apache.spark.sql.Row

I have a list of org.apache.avro.generic.GenericRecord and an Avro schema. Using these, we need to create a DataFrame with the SQLContext API; creating the DataFrame requires an RDD of org.apache.spark.sql.Row and the Avro schema. A prerequisite for creating the DF is that we…
Sagar balai
1 vote, 1 answer

Spark CodeGenerator failed to compile, got NPE, infrequently

I'm doing a simple Spark aggregation: reading data from an Avro file as a DataFrame, mapping the rows to case classes with the rdd.map method, and then running some aggregations such as count. Most of the time it works just fine. But…
Zer001
1 vote, 0 answers

AvroTypeException: When writing in python3

My avsc file is as follows: {"type":"record", "namespace":"testing.avro", "name":"product", "aliases":["items","services","plans","deliverables"], "fields": [ {"name":"id", "type":"string"…
1 vote, 1 answer

IncompatibleSchemaException: Unexpected type VectorUDT when serializing in Avro format

I am using Spark MLlib to generate predictions for my data and then store them in HDFS in Avro format: val dataPredictions = myModel.transform(myData) val output = dataPredictions.select("is", "probability",…
Marsellus Wallace
1 vote, 1 answer

Avro tojson date format

I imported a table with selected columns into the Avro file format using Sqoop. When I run avro-tools tojson, the dates appear in a strange format (negative). How can I decode the dates? {"first_name":{"string":"Mary"},"last_name": …
moron