I have one Avro file written with a first schema; I then updated the schema and appended to the same file, so the file now contains data written with two schemas. How does Avro handle this scenario? Will the new fields be added to the file, or will I lose any data when reading it? This is a real-time streaming application that writes data to HDFS. My upstream system might update the schema while the HDFS writer is still on the old schema, so the Avro file on HDFS will contain two schemas until I update the writer to handle the newer one.

Note: I don't have a schema registry, and I create one Avro file per day. So if the schema is updated in the middle of the day, I will have one Avro file with two schemas.

buckeyeosu

1 Answer

Unlike Thrift, Avro does not tag each record with meta information about its schema (no field IDs or type markers), so the data can only be decoded with the schema that was used to write it.

  1. Avro requires a schema to be present at both write time and read time.
  2. The assumption is that schema evolution is compatible, so reading data written with an older schema using a newer schema will not throw exceptions; fields added in the newer schema are filled from their declared defaults (for example, null).
  3. Your evolving schema needs to be backward compatible. Avro provides a utility to check schema compatibility.
  4. Your file may contain data written with two different schema versions, but at read time you supply a single version, so the data is deserialized to the version you provide.
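
To make points 2 to 4 concrete, here is a minimal sketch in plain Python of how a reader schema is applied to records written with a different schema. It is a toy model and not the Avro library: the names `resolve`, `can_read`, `SCHEMA_V1` and `SCHEMA_V2` are hypothetical, the `Event` record is invented for illustration, and type promotion, aliases and nested records are deliberately ignored.

```python
# Toy model of Avro-style record schema resolution (NOT the Avro library).
# Schemas are plain dicts in the shape of Avro's JSON schema format.

SCHEMA_V1 = {  # hypothetical "old" schema
    "type": "record",
    "name": "Event",
    "fields": [{"name": "id", "type": "long"}],
}

SCHEMA_V2 = {  # hypothetical "new" schema: adds an optional field with a default
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "source", "type": ["null", "string"], "default": None},
    ],
}


def can_read(reader_schema, writer_schema):
    """True if every reader field exists in the writer schema or has a default."""
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    return all(
        f["name"] in writer_fields or "default" in f
        for f in reader_schema["fields"]
    )


def resolve(record, writer_schema, reader_schema):
    """Project a record written with writer_schema onto reader_schema."""
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            # Field known to both schemas: keep the written value.
            resolved[name] = record[name]
        elif "default" in field:
            # Field only in the reader schema: fill in its declared default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"reader field {name!r} has no default")
    # Fields that exist only in the writer schema are dropped.
    return resolved


if __name__ == "__main__":
    print(resolve({"id": 1}, SCHEMA_V1, SCHEMA_V2))
    print(resolve({"id": 2, "source": "kafka"}, SCHEMA_V2, SCHEMA_V1))
```

Under this model, old records read with the new schema gain the new field with its default value, and new records read with the old schema lose the extra field. That is why the new field needs a default for the change to stay backward compatible.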
Glorfindel
KrazyGautam