Questions tagged [avro]

Apache Avro is a data serialization system, used primarily in the Apache Hadoop ecosystem.

Features:

  • Rich data structures.
  • A compact, fast, binary data format.
  • A container file, to store persistent data.
  • Remote procedure call (RPC).
  • Simple integration with dynamic languages. Code generation is not required to read or write data files, nor to use or implement RPC protocols. Code generation is an optional optimization, only worth implementing for statically typed languages (a minimal sketch of the no-codegen workflow follows this list).
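
To make the no-codegen point concrete, here is a minimal sketch in Python using the official avro package (the file name is an assumption); records come back as plain dicts, decoded with the schema embedded in the file:

    from avro.datafile import DataFileReader
    from avro.io import DatumReader

    # No generated classes: each record is returned as a plain Python dict,
    # decoded using the writer's schema stored in the file itself.
    with open("events.avro", "rb") as f:
        reader = DataFileReader(f, DatumReader())
        for record in reader:
            print(record)
        reader.close()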

Schemas:

Avro relies on schemas. When Avro data is read, the schema used when writing it is always present. This permits each datum to be written with no per-value overheads, making serialization both fast and small. This also facilitates use with dynamic, scripting languages, since data, together with its schema, is fully self-describing.

When Avro data is stored in a file, its schema is stored with it, so that files may be processed later by any program. If the program reading the data expects a different schema, this can be easily resolved, since both schemas are present.
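
As a sketch of that resolution in Python (the record and the added field are illustrative): data written with an old schema is read back with a newer one that supplies a default for the new field.

    import io
    import avro.schema
    from avro.io import BinaryDecoder, BinaryEncoder, DatumReader, DatumWriter

    # Note: some older avro releases spell this avro.schema.Parse.
    old_schema = avro.schema.parse(
        '{"type": "record", "name": "User", "fields":'
        ' [{"name": "name", "type": "string"}]}')
    new_schema = avro.schema.parse(
        '{"type": "record", "name": "User", "fields":'
        ' [{"name": "name", "type": "string"},'
        '  {"name": "age", "type": "int", "default": -1}]}')

    # Write a datum with the old (writer's) schema...
    buf = io.BytesIO()
    DatumWriter(old_schema).write({"name": "ada"}, BinaryEncoder(buf))

    # ...and read it back with the new (reader's) schema; the missing
    # "age" field is filled in from its default during resolution.
    buf.seek(0)
    print(DatumReader(old_schema, new_schema).read(BinaryDecoder(buf)))
    # -> {'name': 'ada', 'age': -1}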

When Avro is used in RPC, the client and server exchange schemas in the connection handshake. (This can be optimized so that, for most calls, no schemas are actually transmitted.) Since client and server both have the other's full schema, correspondence between same-named fields, missing fields, extra fields, etc. can all be easily resolved.

Avro schemas are defined with JSON. This facilitates implementation in languages that already have JSON libraries.
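
For example, a minimal record schema (names illustrative):

    {
      "type": "record",
      "name": "User",
      "namespace": "example.avro",
      "fields": [
        {"name": "name", "type": "string"},
        {"name": "favorite_number", "type": ["int", "null"]}
      ]
    }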

Comparison with other systems:

Avro provides functionality similar to systems such as Thrift, Protocol Buffers, etc. Avro differs from these systems in the following fundamental aspects.

  • Dynamic typing: Avro does not require that code be generated. Data is always accompanied by a schema that permits full processing of that data without code generation, static datatypes, etc. This facilitates construction of generic data-processing systems and languages.
  • Untagged data: Since the schema is present when data is read, considerably less type information need be encoded with data, resulting in smaller serialization size.
  • No manually-assigned field IDs: When a schema changes, both the old and new schema are always present when processing data, so differences may be resolved symbolically, using field names.

Official Website: http://avro.apache.org/

3646 questions
21 votes · 6 answers

Generic conversion from POJO to Avro Record

I'm looking for a way to convert a POJO to an avro object in a generic way. The implementation should be robust to any changes of the POJO class. I have achieved it by filling the avro record explicitly (see example below). Is there a way to get…
Fabian Braun
20 votes · 1 answer

How to pass parameters for a specific Schema registry when using Kafka Avro Console Consumer?

I am trying to use the Confluent kafka-avro-console-consumer, but how do I pass parameters for the Schema Registry to it?
Joe
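
One commonly cited approach for this question (the broker, topic, and registry addresses below are placeholders) is to pass the registry location via --property:

    kafka-avro-console-consumer \
      --bootstrap-server localhost:9092 \
      --topic my-topic \
      --from-beginning \
      --property schema.registry.url=http://localhost:8081
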
20 votes · 3 answers

How to extract schema for avro file in python

I am trying to use the Python Avro library (https://pypi.python.org/pypi/avro) to read an Avro file generated by Java. Since the schema is already embedded in the avro file, why do I need to specify a schema file? Is there a way to extract it…
ljxue
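
A sketch of one way to pull the embedded writer's schema out of a container file with the avro package (the file name is an assumption, and attribute names vary slightly across library versions):

    from avro.datafile import DataFileReader
    from avro.io import DatumReader

    with open("data.avro", "rb") as f:
        reader = DataFileReader(f, DatumReader())
        # The writer's schema travels in the container file header,
        # so no external .avsc file is needed; str() gives its JSON form.
        print(reader.datum_reader.writers_schema)
        reader.close()
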
19 votes · 3 answers

"The $changeStream stage is only supported on replica sets" error while using mongodb-source-connect

I get an error when running kafka-mongodb-source-connect. I was trying to run connect-standalone with connect-avro-standalone.properties and MongoSourceConnector.properties so that Connect writes data from MongoDB to a Kafka topic. This…
Jaeho Lee
19 votes · 2 answers

Avro multiple record of same type in single schema

I'd like to use the same record type in an Avro schema multiple times. Consider this schema definition: { "type": "record", "name": "OrderBook", "namespace": "my.types", "doc": "Test order update", "fields": [ { …
Daniel
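
A sketch of the usual answer, with outer names following the excerpt above (the inner fields are illustrative): define the named record once, then refer to it by name for later fields:

    {
      "type": "record",
      "name": "OrderBook",
      "namespace": "my.types",
      "doc": "Test order update",
      "fields": [
        {"name": "bids", "type": {"type": "array", "items": {
          "type": "record",
          "name": "OrderBookVolume",
          "fields": [
            {"name": "price", "type": "double"},
            {"name": "volume", "type": "double"}
          ]
        }}},
        {"name": "asks", "type": {"type": "array", "items": "OrderBookVolume"}}
      ]
    }
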
19 votes · 1 answer

How to mix record with map in Avro?

I'm dealing with server logs which are in JSON format, and I want to store my logs on AWS S3 in Parquet format (and Parquet requires an Avro schema). First, all logs have a common set of fields; second, all logs have a lot of optional fields which are…
soulmachine
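
One common shape for this, sketched with illustrative field names: keep the shared fields as ordinary record fields and put the open-ended optional ones in a map:

    {
      "type": "record",
      "name": "LogEvent",
      "fields": [
        {"name": "timestamp", "type": "long"},
        {"name": "level", "type": "string"},
        {"name": "extras", "type": {"type": "map", "values": "string"}}
      ]
    }
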
18 votes · 1 answer

optional array in avro schema

I'm wondering whether or not it is possible to have an optional array. Let's assume a schema like this: { "type": "record", "name": "test_avro", "fields" : [ {"name": "test_field_1", "type": "long"}, {"name":…
Philipp Pahl
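
The usual pattern for an optional field is a union with null plus a null default; a sketch continuing the excerpt's names (the array's item type is an assumption):

    {
      "type": "record",
      "name": "test_avro",
      "fields": [
        {"name": "test_field_1", "type": "long"},
        {"name": "opt_array",
         "type": ["null", {"type": "array", "items": "string"}],
         "default": null}
      ]
    }
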
18 votes · 4 answers

How to decode/deserialize Avro with Python from Kafka

I am receiving Kafka Avro messages from a remote server in Python (using the consumer from the Confluent Kafka Python library); they represent clickstream data as JSON dictionaries with fields like user agent, location, URL, etc. Here is what a message…
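
If the messages were produced with Confluent's Avro serializer, a sketch of manual decoding (obtaining the writer's schema is left as an assumption; Confluent's wire format prefixes each message with a magic byte and a 4-byte schema ID):

    import io
    import struct
    import avro.io

    def decode(raw_bytes, writer_schema):
        # Confluent wire format: 1 magic byte (0) + 4-byte big-endian
        # schema ID, followed by the ordinary Avro-encoded body.
        magic, schema_id = struct.unpack(">bI", raw_bytes[:5])
        decoder = avro.io.BinaryDecoder(io.BytesIO(raw_bytes[5:]))
        return avro.io.DatumReader(writer_schema).read(decoder)
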
18 votes · 6 answers

Deserialize an Avro file with C#

I can't find a way to deserialize an Apache Avro file with C#. The Avro file was generated by the Archive feature in Microsoft Azure Event Hubs. With Java I can use Avro Tools from Apache to convert the file to JSON: java -jar…
Kristoffer Jälén
18 votes · 3 answers

python Spark avro

When attempting to write avro, I get the following error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 35.0 failed 1 times, most recent failure: Lost task 7.0 in stage 35.0 (TID 110, localhost):…
Rolando
18 votes · 3 answers

How to serialize a Date using AVRO in Java

I'm trying to serialize objects containing dates with Avro, and the deserialized date doesn't match the expected value (tested with Avro 1.7.2 and 1.7.1). Here's the class I'm serializing: import java.text.SimpleDateFormat; import…
Miguel L.
17 votes · 2 answers

Nesting Avro schemas

According to this question on nesting Avro schemas, the right way to nest a record schema is as follows: { "name": "person", "type": "record", "fields": [ {"name": "firstname", "type": "string"}, {"name": "lastname",…
Tianxiang Xiong
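
A complete version of that nesting pattern, with the inner record defined inline as the field's type (the address fields are illustrative):

    {
      "name": "person",
      "type": "record",
      "fields": [
        {"name": "firstname", "type": "string"},
        {"name": "lastname", "type": "string"},
        {"name": "address", "type": {
          "type": "record",
          "name": "AddressRecord",
          "fields": [
            {"name": "streetaddress", "type": "string"},
            {"name": "city", "type": "string"}
          ]
        }}
      ]
    }
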
17 votes · 1 answer

Why we need Avro schema evolution

I am new to Hadoop and programming, and I am a little confused about Avro schema evolution. I will explain what I understand about Avro so far. Avro is a serialization tool that stores binary data with its JSON schema at the top. The schema looks…
Anaadih.pradeep
17 votes · 1 answer

how to mark avro field deprecated in JSON/avsc?

I was looking for a method to mark an Avro field deprecated in such a way that the generated Java code (getters and setters for the field) is marked with a @Deprecated annotation. Putting @Deprecated into the "doc" field doesn't work, because the generator puts it into…
tworec
17 votes · 2 answers

Encode an object with Avro to a byte array in Python

In Python 2.7, using Avro, I'd like to encode an object to a byte array. All examples I've found write to a file. I've tried using io.BytesIO() but this gives: AttributeError: '_io.BytesIO' object has no attribute 'write_long' Sample using…
Grant Overby
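
The error suggests the raw buffer was handed to the writer where an encoder belongs; a minimal sketch (the schema and datum are placeholders) wraps the BytesIO in avro.io.BinaryEncoder:

    import io
    import avro.io
    import avro.schema

    # Placeholder schema; the real one comes from the application.
    schema = avro.schema.parse(
        '{"type": "record", "name": "Msg",'
        ' "fields": [{"name": "body", "type": "string"}]}')

    buf = io.BytesIO()
    # Wrap the buffer: BinaryEncoder, not BytesIO, provides write_long etc.
    encoder = avro.io.BinaryEncoder(buf)
    avro.io.DatumWriter(schema).write({"body": "hello"}, encoder)
    raw_bytes = buf.getvalue()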