
I'm trying to write a script to deserialize some Avro messages that come from Kafka.

The messages have a format like:

{
  "value": {
    "value1": {
      "string": "AAAA"
    }
  }
}

and I need them to look like this:

{
  "value": {
    "value1":  "AAAA"
  }
}

Basically, I want to remove that "string" key and keep only its value.

I have schemas for both of them.

I need to convert a message that was serialized with one schema into a message that can be deserialized with another schema.

I tried to do something with Python avro/fastavro, but I didn't succeed.

I cannot simply delete the "string" key and reformat by hand, because the Avro messages I need to convert are much more complex. So I need something that will reformat them based on my schemas.

OneCricketeer
Mister
  • Please clarify why you "need" the second format. You might get deserializer errors if the second object is read by a union schema. You're not just "deleting a string", you're changing a JSON object structure into a string, so depending on the parsing logic, that may not be expected. You "tried to do something"? Please clarify this too – OneCricketeer Sep 07 '22 at 13:23

1 Answer


I can't tell from the question if you are trying to convert between schemas or just remove that string hint. If you are just trying to remove the hint, you can do this:

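from fastavro import json_reader, json_writer

# The Avro schema that describes the messages (left elided here).
schema = {...}

# Read the Avro-JSON records; union values carry a type hint such as {"string": "AAAA"}.
with open('some-file', 'r') as fo:
    avro_reader = json_reader(fo, schema)
    records = list(avro_reader)

# Write the records back out without the union type hints.
with open('some-other-file', 'w') as out:
    json_writer(out, schema, records, write_union_type=False)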
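To make it clearer what "removing the hint" means for nested data, here is a minimal, dependency-free sketch that walks a message alongside its schema and unwraps every union value. It is not how fastavro implements `write_union_type=False`; the helper names `branch_name` and `unwrap` and the example schema are hypothetical, and the sketch assumes named types are defined inline and have no namespaces.

```python
import json


def branch_name(schema):
    """Key the Avro JSON encoding uses for a union branch (namespaces ignored)."""
    if isinstance(schema, str):
        return schema  # primitive such as "string" or "null"
    if schema["type"] in ("record", "enum", "fixed"):
        return schema["name"]  # named types are keyed by their name
    return schema["type"]  # "array", "map", or {"type": "string"}


def unwrap(datum, schema):
    """Recursively drop union type wrappers from Avro-JSON data, guided by schema."""
    if isinstance(schema, list):  # a union is written as a JSON list of branches
        if datum is None:  # the null branch is encoded as a bare null
            return None
        ((name, inner),) = datum.items()  # exactly one {"type_name": value} pair
        for branch in schema:
            if branch_name(branch) == name:
                return unwrap(inner, branch)
        raise ValueError(f"no union branch named {name!r}")
    if isinstance(schema, str):  # primitive; references to named types are not resolved
        return datum
    kind = schema["type"]
    if kind == "record":
        return {f["name"]: unwrap(datum[f["name"]], f["type"]) for f in schema["fields"]}
    if kind == "array":
        return [unwrap(item, schema["items"]) for item in datum]
    if kind == "map":
        return {key: unwrap(value, schema["values"]) for key, value in datum.items()}
    if isinstance(kind, (dict, list)):  # nested form: {"type": {...}}
        return unwrap(datum, kind)
    return datum  # primitives, enum, fixed


# Hypothetical schema matching the shape of the message in the question.
schema = {
    "type": "record",
    "name": "Message",
    "fields": [
        {
            "name": "value",
            "type": {
                "type": "record",
                "name": "Inner",
                "fields": [{"name": "value1", "type": ["null", "string"]}],
            },
        }
    ],
}

wrapped = {"value": {"value1": {"string": "AAAA"}}}
print(json.dumps(unwrap(wrapped, schema)))  # prints {"value": {"value1": "AAAA"}}
```

Because the walk is driven by the schema rather than by the shape of the data, a real field that happens to be a one-key object (for example a map with a single entry) is not mistaken for a union wrapper.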
Scott