Questions tagged [avro]

Apache Avro is a data serialization framework primarily used in Apache Hadoop.

Apache Avro is a data serialization system.

Features:

  • Rich data structures.
  • A compact, fast, binary data format.
  • A container file, to store persistent data.
  • Remote procedure call (RPC).
  • Simple integration with dynamic languages. Code generation is not required to read or write data files, nor to use or implement RPC protocols. Code generation is an optional optimization, only worth implementing for statically typed languages.

Schemas:

Avro relies on schemas. When Avro data is read, the schema used when writing it is always present. This permits each datum to be written with no per-value overheads, making serialization both fast and small. This also facilitates use with dynamic, scripting languages, since data, together with its schema, is fully self-describing.

When Avro data is stored in a file, its schema is stored with it, so that files may be processed later by any program. If the program reading the data expects a different schema this can be easily resolved, since both schemas are present.

When Avro is used in RPC, the client and server exchange schemas in the connection handshake. (This can be optimized so that, for most calls, no schemas are actually transmitted.) Since client and server each have the other's full schema, correspondence between same-named fields, missing fields, extra fields, etc. can all be easily resolved.

Avro schemas are defined with JSON. This facilitates implementation in languages that already have JSON libraries.
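A minimal example (a hypothetical `User` record, not taken from the Avro documentation) shows the shape of such a schema:

```json
{
  "type": "record",
  "name": "User",
  "namespace": "example.avro",
  "fields": [
    {"name": "name",  "type": "string"},
    {"name": "age",   "type": "long"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
```

The union type `["null", "string"]` is Avro's idiom for an optional field.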

Comparison with other systems:

Avro provides functionality similar to systems such as Thrift, Protocol Buffers, etc. Avro differs from these systems in the following fundamental aspects.

  • Dynamic typing: Avro does not require that code be generated. Data is always accompanied by a schema that permits full processing of that data without code generation, static datatypes, etc. This facilitates construction of generic data-processing systems and languages.
  • Untagged data: Since the schema is present when data is read, considerably less type information need be encoded with data, resulting in smaller serialization size.
  • No manually-assigned field IDs: When a schema changes, both the old and new schema are always present when processing data, so differences may be resolved symbolically, using field names.
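The "untagged data" point can be made concrete with a toy sketch (plain Python, not the real Avro library; the `User` schema and helper names here are hypothetical): because the reader is assumed to have the schema, the writer emits only values, in schema order, with no field names or type tags in the payload.

```python
import json

def zigzag_varint(n: int) -> bytes:
    """Encode a signed int the way Avro does: zigzag, then base-128 varint.

    Assumes n fits in a 64-bit long, as Avro's `long` type does.
    """
    n = (n << 1) ^ (n >> 63)  # zigzag: interleaves negatives with positives
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)  # high bit set: more bytes follow
        else:
            out.append(b)
            return bytes(out)

# Hypothetical schema, defined in JSON as Avro schemas are.
SCHEMA = json.loads("""
{"type": "record", "name": "User",
 "fields": [{"name": "name", "type": "string"},
            {"name": "age",  "type": "long"}]}
""")

def encode_user(record: dict) -> bytes:
    """Write fields in schema order: no tags or field names in the payload."""
    payload = b""
    for field in SCHEMA["fields"]:
        value = record[field["name"]]
        if field["type"] == "string":
            raw = value.encode("utf-8")
            payload += zigzag_varint(len(raw)) + raw  # length-prefixed bytes
        elif field["type"] == "long":
            payload += zigzag_varint(value)
    return payload

encoded = encode_user({"name": "Ada", "age": 36})
```

For the record `{"name": "Ada", "age": 36}` the payload is just five bytes: a one-byte length, the three UTF-8 bytes of "Ada", and a one-byte zigzag varint for 36. Compare this with JSON, which repeats every field name in every record.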


Official Website: http://avro.apache.org/


3646 questions
1
vote
1 answer

How do I move avro files to kafka via a kafka connector?

I found the CSV Source Connector, which can monitor a directory for files and read them as CSVs. Is there an Avro Source Connector for Avro files? If not, is there any recommendation for reading Avro files into Kafka via Kafka Connect?
han
  • 103
  • 7
1
vote
0 answers

Apache Kafka schema registry throws RestClientException

I have two almost identical Kafka applications. They both listen to a binlog for changes to two tables. My problem is that one of them works fine, but when I try to launch the second one I receive the following…
Hessam
  • 1,377
  • 1
  • 23
  • 45
1
vote
1 answer

How to use Avro serialization with Spring-Kafka

I am trying to learn Kafka and now Avro. To keep consistency between the sender object and receiver object, we keep a JSON schema (.avsc), but I am not able to find any simple example of how to use it. Some examples use Confluent (is confluent…
shrikant.sharma
  • 203
  • 1
  • 17
  • 37
1
vote
0 answers

Use of thrift/avro for a hadoop job to communicate between Java and C++

Right now we have a Hadoop job in Java that works with some C++ binaries. We write files to NFS, and C++ and Java read them; that is our form of communication, which prevents us from scaling. I'm looking into Protocol Buffers, Thrift, and Avro to…
Meg
  • 131
  • 10
1
vote
0 answers

Can I configure the Avro schema subject name strategy for Kafka Streams internal topics?

I am developing a Kafka Streams application using stateful operations, on Kafka topics with Avro types. Kafka Streams internal topics are created while my application runs, and Schema Registry also gets schemas for those internal topics. (class…
1
vote
1 answer

Convert JSON to parquet in Java

I am trying to convert JSON to parquet format in Java, but I am getting an exception. Input JSON: {"list": [ {"mainBearingX": 0.178334, "gearBoxZ": 0.03885, "_t": 1560305236290000, "mainBearingZ": 0.034438, …
raj03
  • 445
  • 1
  • 6
  • 19
1
vote
1 answer

Spring Cloud Stream JSON Dead Letter Queue with Avro messages

I am using Spring Cloud Stream with Avro and Confluent Schema Registry. I am using a single DLQ topic for all services, so messages with different schema may land in this topic. I have disabled the dynamic schema registration to ensure an incorrect…
Ali
  • 1,759
  • 2
  • 32
  • 69
1
vote
1 answer

Hive external table from avro files

Is it possible to create an external table on Hive 1.2 from an Avro file without specifying the schema, and make Hive extract it from the data? I've found this solution but I'm wondering if Hive can extract the schema itself. Thanks
error
  • 926
  • 3
  • 10
  • 19
1
vote
1 answer

Replay messages from dead letter queue in Spring Cloud Stream with Kafka binder

We are using Spring Cloud Stream with Confluent Schema Registry, Avro and Kafka binder. We have configured all our services in the data processing pipeline to use a shared DLQ Kafka topic to simplify the process of exception handling and be able to…
Ali
  • 1,759
  • 2
  • 32
  • 69
1
vote
1 answer

AVRO union type weird serialization format

I've got a problem sending messages with the Spring-Kafka API to a topic on brokers running on the Confluent platform. I am using an Avro schema along with the Avro plugin for Maven to generate Java objects from the schema. The schema is being registered…
1
vote
0 answers

How to custom serialize/deserialize date field while dealing with AVRO format?

I'm facing a very weird problem with date fields when serializing and deserializing data in Avro format. We have a JPA entity defined as follows: @Entity public class Person implements Serializable { @Column(name = "DOB") …
1
vote
0 answers

Converting a String to a Apache Avro GenericRecord

Can someone give advice on how to convert a String into a GenericRecord? I would like to convert my records to Avro and put them into a Kafka topic. I used to have it as a String (I replaced it with GenericRecord). def producerMethod(socket:…
Nika
  • 145
  • 1
  • 13
1
vote
2 answers

How to traverse all Fields in all nested Records in an Avro file and check a certain property in their Types?

I have an Avro file which has records; in their fields (which have union types) there are other records, which also have fields with union types, and some types have a certain property connect.name which I need to check if it equals…
Alexey Chibisov
  • 188
  • 1
  • 10
1
vote
1 answer

How to deserialize Avro messages in Kafka using C#

Hi, I am working with Confluent Kafka. I have a consumer which returns a GenericRecord, and I want to deserialize it, but I didn't find any way. I can do each field manually, like object options = ((GenericRecord)response.Message.Value["Product"])["Options"]; I…
1
vote
1 answer

NiFi Avro Kafka message nano-timestamp (19 digits) cast to timestamp with milliseconds

I'm now facing an issue converting a Kafka message record of type long holding nanoseconds (19 digits) to a string timestamp with milliseconds. The messages are coming in Avro format and contain different schemas (so we can't statically define one…
Alexey Chibisov
  • 188
  • 1
  • 10