Questions tagged [avro]

Apache Avro is a data serialization framework primarily used in Apache Hadoop.

Apache Avro is a data serialization system.

Features:

  • Rich data structures.
  • A compact, fast, binary data format.
  • A container file, to store persistent data.
  • Remote procedure call (RPC).
  • Simple integration with dynamic languages. Code generation is not required to read or write data files nor to use or implement RPC protocols. Code generation is an optional optimization, only worth implementing for statically typed languages.

Schemas:

Avro relies on schemas. When Avro data is read, the schema used when writing it is always present. This permits each datum to be written with no per-value overheads, making serialization both fast and small. This also facilitates use with dynamic, scripting languages, since data, together with its schema, is fully self-describing.

When Avro data is stored in a file, its schema is stored with it, so that files may be processed later by any program. If the program reading the data expects a different schema this can be easily resolved, since both schemas are present.
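As a sketch of such schema resolution, suppose data was written with a first schema version and is later read with a second version that adds a field. Because both schemas are available, the reader fills the new field from its declared default (record and field names here are illustrative, not from any real project):

```json
{"type": "record", "name": "User", "fields": [
    {"name": "name", "type": "string"}
]}

{"type": "record", "name": "User", "fields": [
    {"name": "name", "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null}
]}
```

Fields present in the writer's schema but absent from the reader's are simply skipped, and fields new to the reader's schema take their defaults.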

When Avro is used in RPC, the client and server exchange schemas in the connection handshake. (This can be optimized so that, for most calls, no schemas are actually transmitted.) Since client and server both have the other's full schema, correspondence between same-named fields, missing fields, extra fields, etc. can all be easily resolved.
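An Avro RPC service is declared as a protocol, itself a JSON document listing named types and messages. A minimal sketch, loosely modeled on the greeting example in the Avro specification (the protocol, namespace, and message names are illustrative):

```json
{
  "protocol": "HelloWorld",
  "namespace": "org.example",
  "types": [
    {"type": "record", "name": "Greeting",
     "fields": [{"name": "message", "type": "string"}]}
  ],
  "messages": {
    "hello": {
      "request": [{"name": "greeting", "type": "Greeting"}],
      "response": "Greeting"
    }
  }
}
```

It is this protocol document (or its hash, once cached) that the two sides exchange in the handshake.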

Avro schemas are defined with JSON. This facilitates implementation in languages that already have JSON libraries.
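For example, a record schema with one required field and one optional field (expressed as a union with "null") looks like the following; all names are illustrative:

```json
{
  "type": "record",
  "name": "User",
  "namespace": "example.avro",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["null", "int"], "default": null}
  ]
}
```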

Comparison with other systems:

Avro provides functionality similar to systems such as Thrift, Protocol Buffers, etc. Avro differs from these systems in the following fundamental aspects.

  • Dynamic typing: Avro does not require that code be generated. Data is always accompanied by a schema that permits full processing of that data without code generation, static datatypes, etc. This facilitates construction of generic data-processing systems and languages.
  • Untagged data: Since the schema is present when data is read, considerably less type information need be encoded with data, resulting in smaller serialization size.
  • No manually-assigned field IDs: When a schema changes, both the old and new schema are always present when processing data, so differences may be resolved symbolically, using field names.
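To illustrate how compact the untagged encoding is: per the Avro specification, int and long values are zigzag-coded and then written as variable-length (varint) bytes, so small magnitudes take a single byte and no field tag is written at all. A minimal pure-Python sketch of that encoding (not the official library):

```python
def encode_long(n: int) -> bytes:
    """Encode an integer the way Avro encodes int/long values:
    zigzag coding (maps small positive and negative numbers to
    small unsigned numbers), then 7-bit varint bytes."""
    zz = (n << 1) ^ (n >> 63)  # zigzag: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4
    zz &= (1 << 64) - 1        # keep the value in unsigned 64-bit range
    out = bytearray()
    while True:
        byte = zz & 0x7F
        zz >>= 7
        if zz:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

# Small magnitudes fit in one byte; no per-field tag is emitted.
print(encode_long(0).hex())    # 00
print(encode_long(-1).hex())   # 01
print(encode_long(1).hex())    # 02
print(encode_long(64).hex())   # 8001
```

Contrast this with tagged formats, where every field carries an identifying tag byte (or more) in addition to its value.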

Official Website: http://avro.apache.org/

3646 questions
15 votes · 7 answers

JsonMappingException when serializing avro generated object to json

I used avro-tools to generate java classes from avsc files, using: java.exe -jar avro-tools-1.7.7.jar compile -string schema myfile.avsc Then I tried to serialize such objects to json by ObjectMapper, but always got a JsonMappingException saying…
Amir · 391
15 votes · 1 answer

Avro Schema. How to set type to "record" and "null" at once

I need to mix "record" type with null type in Schema. "name":"specShape", "type":{ "type":"record", "name":"noSpecShape", "fields":[ { "name":"bpSsc", …
Nadir Novruzov · 467
15 votes · 2 answers

How to convert from GenericRecord to SpecificRecord in Avro for compatible schemas

Is the Avro SpecificRecord (i.e. the generated java classes) compatible with schema evolution? I.e. if I have a source of Avro messages (in my case, kafka) and I want to deserialize those messages to a specificrecord, is it possible to do…
Mark D · 5,368
15 votes · 6 answers

Apache Avro: map uses CharSequence as key

I am using Apache Avro. My schema has map type: {"name": "MyData", "type" : {"type": "map", "values":{ "type": "record", "name": "Person", "fields":[ …
Mellon · 37,586
14 votes · 1 answer

Avro ENUM field

I am trying to create Union field in Avro schema and send corresponding JSON message with it but to have one of the fields - null. https://avro.apache.org/docs/1.8.2/spec.html#Unions What is example of simplest UNION type (avro schema) with…
user9750148 · 375
14 votes · 1 answer

using AWS Glue with Apache Avro on schema changes

I am new to AWS Glue and am having difficulty fully understanding the AWS docs, but am struggling through the following use case: We have an s3 bucket with a number of Avro files. We have decided to use Avro due to having extensive support for data…
CharStar · 427
14 votes · 3 answers

Where is an Avro schema stored when I create a hive table with 'STORED AS AVRO' clause?

There are at least two different ways of creating a hive table backed with Avro data: Creating a table based on an Avro schema (in this example, stored in hdfs): CREATE TABLE users_from_avro_schema ROW FORMAT SERDE…
tomek · 771
14 votes · 4 answers

Use schema to convert AVRO messages with Spark to DataFrame

Is there a way to use a schema to convert avro messages from kafka with spark to dataframe? The schema file for user records: { "fields": [ { "name": "firstName", "type": "string" }, { "name": "lastName", "type": "string" } ], "name":…
Sascha Vetter · 2,466
14 votes · 1 answer

Avro: deserialize json - schema with optional fields

There are a lot of questions and answers on stackoverflow on the subject, but no one that helps. I have a schema with optional value: { "type" : "record", "name" : "UserSessionEvent", "namespace" : "events", "fields" : [ { "name" :…
Pavel Bernshtam · 4,232
14 votes · 3 answers

How to read and write Map from/to parquet file in Java or Scala?

Looking for a concise example on how to read and write Map from/to parquet file in Java or Scala? Here is expected structure, using com.fasterxml.jackson.databind.ObjectMapper as serializer in Java (i.e. looking for equivalent using…
okigan · 1,559
14 votes · 3 answers

Does Avro schema evolution require access to both old and new schemas?

If I serialize an object using a schema version 1, and later update the schema to version 2 (say by adding a field) - am I required to use schema version 1 when later deserializing the object? Ideally I would like to just use schema version 2 and…
bils · 143
13 votes · 3 answers

Avro Java API Timestamp Logical Type?

With the Avro Java API, I can make a simple record schema like: Schema schemaWithTimestamp = SchemaBuilder .record("MyRecord").namespace("org.demo") .fields() .name("timestamp").type().longType().noDefault() …
clay · 18,138
13 votes · 6 answers

Start Confluent Schema Registry in windows

I have windows environment and my own set of kafka and zookeeper running. To use custom objects, I started to use Avro. But I needed to get the registry started. Downloaded Confluent platform and ran this: $ ./bin/schema-registry-start…
13 votes · 3 answers

Avro schema doesn't honor backward compatibility

I have this avro schema { "namespace": "xx.xxxx.xxxxx.xxxxx", "type": "record", "name": "MyPayLoad", "fields": [ {"name": "filed1", "type": "string"}, {"name": "filed2", "type": "long"}, {"name": "filed3", "type":…
Raghvendra Singh · 1,775
13 votes · 2 answers

How to generate schema-less avro files using apache avro?

I am using Apache avro for data serialization. Since, the data has a fixed schema I do not want the schema to be a part of serialized data. In the following example, schema is a part of the avro file "users.avro". User user1 = new…
mintra · 317