Questions tagged [avro]

Apache Avro is a data serialization framework primarily used in Apache Hadoop.

Apache Avro is a data serialization system.

Features:

  • Rich data structures.
  • A compact, fast, binary data format.
  • A container file, to store persistent data.
  • Remote procedure call (RPC).
  • Simple integration with dynamic languages. Code generation is not required to read or write data files nor to use or implement RPC protocols. Code generation is an optional optimization, only worth implementing for statically typed languages.

Schemas:

Avro relies on schemas. When Avro data is read, the schema used when writing it is always present. This permits each datum to be written with no per-value overheads, making serialization both fast and small. This also facilitates use with dynamic, scripting languages, since data, together with its schema, is fully self-describing.

When Avro data is stored in a file, its schema is stored with it, so that files may be processed later by any program. If the program reading the data expects a different schema this can be easily resolved, since both schemas are present.
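
As a concrete illustration of the self-describing container format, here is a minimal sketch using the Avro Java API (the org.apache.avro:avro dependency is assumed; the `User` schema and file name are invented for the example). The schema is written once into the file header, and the reader recovers it without being told anything:

```java
import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class ContainerFileDemo {
    public static void main(String[] args) throws Exception {
        // An invented example schema; any record schema works the same way.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\"}]}");

        File file = new File("users.avro");

        // Write: the schema is embedded in the file header once,
        // so each record carries no per-value type information.
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "alice");
        user.put("age", 30);
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, file);
            writer.append(user);
        }

        // Read: no schema needs to be supplied; it is recovered from the file.
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
            System.out.println(reader.getSchema().getName());
            for (GenericRecord r : reader) {
                System.out.println(r.get("name") + " " + r.get("age"));
            }
        }
    }
}
```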

When Avro is used in RPC, the client and server exchange schemas in the connection handshake. (This can be optimized so that, for most calls, no schemas are actually transmitted.) Since client and server both have the other's full schema, correspondence between same-named fields, missing fields, extra fields, etc. can all be easily resolved.

Avro schemas are defined with JSON. This facilitates implementation in languages that already have JSON libraries.
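
For instance, a minimal record schema (an invented example) looks like this:

```json
{
  "type": "record",
  "name": "User",
  "namespace": "com.example",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
```

The `["null", "string"]` union is the idiomatic way to express an optional field.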

Comparison with other systems:

Avro provides functionality similar to systems such as Thrift, Protocol Buffers, etc. Avro differs from these systems in the following fundamental aspects.

  • Dynamic typing: Avro does not require that code be generated. Data is always accompanied by a schema that permits full processing of that data without code generation, static datatypes, etc. This facilitates construction of generic data-processing systems and languages.
  • Untagged data: Since the schema is present when data is read, considerably less type information need be encoded with data, resulting in smaller serialization size.
  • No manually-assigned field IDs: When a schema changes, both the old and new schema are always present when processing data, so differences may be resolved symbolically, using field names.
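
The field-name-based resolution in the last point can be sketched with the Avro Java API (the org.apache.avro:avro dependency is assumed; the schemas are invented for the example): a record serialized with an old writer schema is read with a newer reader schema that adds a defaulted field, and Avro reconciles the two by name:

```java
import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class SchemaEvolutionDemo {
    public static void main(String[] args) throws Exception {
        Schema writerSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"}]}");
        // The reader's schema adds an "age" field with a default value.
        Schema readerSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\",\"default\":-1}]}");

        // Serialize with the writer's (old) schema.
        GenericRecord user = new GenericData.Record(writerSchema);
        user.put("name", "alice");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(writerSchema).write(user, encoder);
        encoder.flush();

        // Deserialize with both schemas: fields are matched by name,
        // and the missing "age" field takes its default.
        GenericDatumReader<GenericRecord> reader =
            new GenericDatumReader<>(writerSchema, readerSchema);
        GenericRecord result = reader.read(null,
            DecoderFactory.get().binaryDecoder(out.toByteArray(), null));
        System.out.println(result.get("name") + " " + result.get("age"));
    }
}
```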


Official Website: http://avro.apache.org/


3646 questions
32 votes, 6 answers

Polymorphism and inheritance in Avro schemas

Is it possible to write an Avro schema/IDL that will generate a Java class that either extends a base class or implements an interface? It seems like the generated Java class extends the org.apache.avro.specific.SpecificRecordBase. So, the…
asked by bsam
31 votes, 11 answers

Integrating Spark Structured Streaming with the Confluent Schema Registry

I'm using a Kafka Source in Spark Structured Streaming to receive Confluent-encoded Avro records. I intend to use the Confluent Schema Registry, but integration with Spark Structured Streaming seems to be impossible. I have seen this question, but…
31 votes, 3 answers

What is the advantage of storing schema in avro?

We need to serialize some data for putting into Solr as well as Hadoop. I am evaluating serialization tools for the same. The top two on my list are Gson and Avro. As far as I understand, Avro = Gson + Schema-In-JSON. If that is correct, I do not see…
asked by user2250246
30 votes, 5 answers

How to encode/decode Kafka messages using Avro binary encoder?

I'm trying to use Avro for messages being read from/written to Kafka. Does anyone have an example of using the Avro binary encoder to encode/decode data that will be put on a message queue? I need the Avro part more than the Kafka part. Or, perhaps…
asked by blockcipher
29 votes, 5 answers

KafkaAvroSerializer for serializing Avro without schema.registry.url

I'm a noob to Kafka and Avro, so I have been trying to get the Producer/Consumer running. So far I have been able to produce and consume simple Bytes and Strings, using the following: Configuration for the Producer: Properties props = new…
asked by scissorHands
27 votes, 3 answers

How to extract schema from an avro file in Java

How do you extract first the schema and then the data from an avro file in Java? Identical to this question except in java. I've seen examples of how to get the schema from an avsc file but not an avro file. What direction should I be looking…
asked by mba12
27 votes, 1 answer

Using apache avro reflect

Avro serialization is popular with Hadoop users but examples are so hard to find. Can anyone help me with this sample code? I'm mostly interested in using the Reflect API to read/write into files and to use the Union and Null annotations. public…
asked by fodon
25 votes, 2 answers

Google Dataflow job cost optimization

I have run the below code on 522 gzip files of size 100 GB; after decompressing, it is around 320 GB of data in protobuf format, and the output is written to GCS. I have used n1-standard machines and the region for input, output all taken care…
25 votes, 1 answer

Apache Kafka with Avro and Schema Repo - where in the message does the schema Id go?

I want to use Avro to serialize the data for my Kafka messages and would like to use it with an Avro schema repository so I don't have to include the schema with every message. Using Avro with Kafka seems like a popular thing to do, and lots of…
asked by jheppinstall
24 votes, 2 answers

Does binary encoding of AVRO compress data?

In one of our projects we are using Kafka with AVRO to transfer data across applications. Data is added to an AVRO object, and the object is binary-encoded to write to Kafka. We use binary encoding as it is generally mentioned as a minimal representation…
asked by Pal
23 votes, 6 answers

Avro with Java 8 dates as logical type

The latest Avro compiler (1.8.2) generates Java sources for date logical types with Joda-Time-based implementations. How can I configure the Avro compiler to produce sources that use the Java 8 date-time API?
asked by injecto
23 votes, 3 answers

generating an AVRO schema from a JSON document

Is there any tool able to create an AVRO schema from a 'typical' JSON document? For example: { "records":[{"name":"X1","age":2},{"name":"X2","age":4}] } I found http://jsonschema.net/reboot/#/ which generates a 'json-schema' { "$schema":…
asked by Pierre
22 votes, 6 answers

Json String to Java Object Avro

I am trying to convert a Json string into a generic Java Object, with an Avro Schema. Below is my code. String json = "{\"foo\": 30.1, \"bar\": 60.2}"; String schemaLines =…
asked by Princey James
21 votes, 3 answers

In Java, how can I create an equivalent of an Apache Avro container file without being forced to use a File as a medium?

This is somewhat of a shot in the dark in case anyone savvy with the Java implementation of Apache Avro is reading this. My high-level objective is to have some way to transmit some series of avro data over the network (let's just say HTTP for…
asked by omnilinguist
21 votes, 1 answer

What's the reason behind ZigZag encoding in Protocol Buffers and Avro?

ZigZag requires a lot of overhead to write/read numbers. Actually I was stunned to see that it doesn't just write int/long values as they are, but does a lot of additional scrambling. There's even a loop…
asked by Endrju
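
For readers curious about the ZigZag question above: ZigZag maps signed integers to unsigned ones so that values of small magnitude, positive or negative, become small numbers and therefore occupy few bytes in the subsequent varint encoding. A minimal sketch in plain Java (the class name is invented; this mirrors the transform Avro and Protocol Buffers specify, not any library's internal code):

```java
// ZigZag encoding: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ...
// Small-magnitude values map to small unsigned values, which the
// varint layer then stores in few bytes.
public class ZigZag {
    static long encode(long n) {
        // Arithmetic shift copies the sign bit into all positions,
        // so the XOR flips the bits of negative numbers.
        return (n << 1) ^ (n >> 63);
    }

    static long decode(long z) {
        // Logical shift recovers the magnitude; -(z & 1) undoes the flip.
        return (z >>> 1) ^ -(z & 1);
    }

    public static void main(String[] args) {
        System.out.println(encode(0));   // 0
        System.out.println(encode(-1));  // 1
        System.out.println(encode(1));   // 2
        System.out.println(encode(-2));  // 3
    }
}
```

Without this step, a plain two's-complement negative number has its high bits set and would always cost the maximum number of varint bytes.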