We changed a field to allow null, and now our previous JSON no longer works: decoding fails with an AvroTypeException: Unknown union branch.

Here are the previous (working) Avro schema and the JSON used for the test. myobject.avsc:

{
   "namespace":"my.model.kafka.test",
   "type":"record",
   "name":"MyObject",
   "fields":[
      {
         "name":"First_Level",
         "type":[
            "null",
            {
               "type":"record",
               "name":"FirstLevel",
               "fields":[
                  {
                     "name":"TheTimestamp",
                     "doc":"Timestamp",
                     "type":{
                        "type":"long",
                        "logicalType":"timestamp-micros"
                     }
                  },
                  {
                     "name":"CategoryCode",
                     "type":{
                        "type":"enum",
                        "name":"Code",
                        "symbols":[
                           "A",
                           "B"
                        ]
                     }
                  },
                  {
                     "name":"SecondLevel",
                     "type":{
                        "type":"record",
                        "name":"SecondLevel",
                        "fields":[
                           {
                              "name":"ThirdLevel",
                              "type":{
                                 "type":"array",
                                 "items":[
                                    {
                                       "type":"record",
                                       "name":"ThirdLevel",
                                       "fields":[
                                          {
                                             "name":"LocationCode",
                                             "type":"string"
                                          },
                                          {
                                             "name":"SomeCode",
                                             "type":"string"
                                          },
                                          {
                                             "name":"Cost",
                                             "type":"int"
                                          }
                                       ]
                                    }
                                 ]
                              }
                           }
                        ]
                     }
                  },
                  {
                     "name":"UID",
                     "type":[
                        "null",
                        "string"
                     ],
                     "default":null
                  }
               ]
            }
         ],
         "default":null
      }
   ]
}

Here is the JSON used for the test:

{
  "First_Level" : {
    "my.model.kafka.test.FirstLevel" : {
      "TheTimestamp" : 1648808100000000,
      "CategoryCode" : "A",
      "SecondLevel" : {
        "ThirdLevel" : [ {
          "my.model.kafka.test.ThirdLevel" : {
            "LocationCode" : "BBB",
            "SomeCode" : "AAA",
            "Cost" : 2
          }
        }, {
          "my.model.kafka.test.ThirdLevel" : {
            "LocationCode" : "CCC",
            "SomeCode" : "BBB",
            "Cost" : 2
          }
        } ]
      },
      "UID" : "123-9jh789-opi8p83h3"
    }
  }
}

Modification to allow null

Up to here everything works fine, but if we make SecondLevel nullable by changing the avsc file to the following, we get the AvroTypeException: Unknown union branch:

{
   "namespace":"my.model.kafka.test",
   "type":"record",
   "name":"MyObject",
   "fields":[
      {
         "name":"First_Level",
         "type":[
            "null",
            {
               "type":"record",
               "name":"FirstLevel",
               "fields":[
                  {
                     "name":"TheTimestamp",
                     "doc":"Timestamp",
                     "type":{
                        "type":"long",
                        "logicalType":"timestamp-micros"
                     }
                  },
                  {
                     "name":"CategoryCode",
                     "type":{
                        "type":"enum",
                        "name":"Code",
                        "symbols":[
                           "A",
                           "B"
                        ]
                     }
                  },
                  {
                     "name":"SecondLevel",
                     "type":[
                        "null",
                        {
                           "type":"record",
                           "name":"SecondLevel",
                           "fields":[
                              {
                                 "name":"ThirdLevel",
                                 "type":{
                                    "type":"array",
                                    "items":[
                                       {
                                          "type":"record",
                                          "name":"ThirdLevel",
                                          "fields":[
                                             {
                                                "name":"LocationCode",
                                                "type":"string"
                                             },
                                             {
                                                "name":"SomeCode",
                                                "type":"string"
                                             },
                                             {
                                                "name":"Cost",
                                                "type":"int"
                                             }
                                          ]
                                       }
                                    ]
                                 }
                              }
                           ],
                           "default":null
                        }
                     ]
                  },
                  {
                     "name":"UID",
                     "type":[
                        "null",
                        "string"
                     ],
                     "default":null
                  }
               ]
            }
         ],
         "default":null
      }
   ]
}

This gives:

org.apache.avro.AvroTypeException: Unknown union branch ThirdLevel

Even if I change the JSON to include the namespace before ThirdLevel, as in the other Stack Overflow answer, I get the same error:

org.apache.avro.AvroTypeException: Unknown union branch my.model.kafka.test.ThirdLevel

My question is twofold:

How can the avsc be modified so that the old JSON still works and new JSON, in which SecondLevel may be null, works too? We need to make this work, but ultimately we need to stay backward compatible, so changing names or the JSON should be avoided.

EDIT:

After running the edited avsc against the Kafka data directly, both the old and the new messages worked perfectly fine. We have a process that saves the messages to JSON files, and only the JSON produced by that process had the problem. Since backward compatibility was needed for the Kafka consumer only, these changes are actually fine.

For those who wonder, here is how the JSON should look after adding the null type to SecondLevel:

{
   "First_Level":{
      "my.model.kafka.test.FirstLevel":{
         "TheTimestamp":1648808100000000,
         "CategoryCode":"A",
         "SecondLevel":{
            "my.model.kafka.test.SecondLevel":{
               "ThirdLevel":[
                  {
                     "my.model.kafka.test.ThirdLevel":{
                        "LocationCode":"BBB",
                        "SomeCode":"AAA",
                        "Cost":2
                     }
                  },
                  {
                     "my.model.kafka.test.ThirdLevel":{
                        "LocationCode":"CCC",
                        "SomeCode":"BBB",
                        "Cost":2
                     }
                  }
               ]
            }
         },
         "UID":"123-9jh789-opi8p83h3"
      }
   }
}
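
For JSON files that were saved in the old layout, one option is to rewrite them into the new layout before decoding. The following is a minimal Python sketch of that idea, not part of the original setup: the function name `wrap_second_level` and the sample document are made up for illustration, and the branch names are assumed to be the namespace plus the record name from the schema above.

```python
import json

# Assumed union branch names: schema namespace + record name.
FIRST_LEVEL_BRANCH = "my.model.kafka.test.FirstLevel"
SECOND_LEVEL_BRANCH = "my.model.kafka.test.SecondLevel"


def wrap_second_level(document):
    """Return a copy of an old-layout document with SecondLevel wrapped
    in its union branch name (hypothetical helper, illustration only)."""
    migrated = json.loads(json.dumps(document))  # deep copy of JSON data
    first_level = migrated.get("First_Level")
    if first_level is None:
        return migrated  # First_Level is null, nothing to wrap
    record = first_level[FIRST_LEVEL_BRANCH]
    second_level = record.get("SecondLevel")
    # Leave null values and already-wrapped values untouched.
    if second_level is not None and SECOND_LEVEL_BRANCH not in second_level:
        record["SecondLevel"] = {SECOND_LEVEL_BRANCH: second_level}
    return migrated


# Shortened version of the old test JSON from the question.
old_document = {
    "First_Level": {
        FIRST_LEVEL_BRANCH: {
            "TheTimestamp": 1648808100000000,
            "CategoryCode": "A",
            "SecondLevel": {
                "ThirdLevel": [
                    {
                        "my.model.kafka.test.ThirdLevel": {
                            "LocationCode": "BBB",
                            "SomeCode": "AAA",
                            "Cost": 2,
                        }
                    }
                ]
            },
            "UID": "123-9jh789-opi8p83h3",
        }
    }
}

migrated = wrap_second_level(old_document)
print(json.dumps(migrated, indent=3))
```

The helper works on a copy and skips values that are already wrapped, so running it twice over the same file should be harmless.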
OneCricketeer
Chris
  • You don't have a way to make it backward compatible, since your field SecondLevel now takes two types, null and "my.model.kafka.test.SecondLevel", so your JSON has to state explicitly which type you are using. You are getting "Unknown union branch my.model.kafka.test.ThirdLevel" because the JsonDecoder expects the type to be named when decoding, which here is my.model.kafka.test.SecondLevel – voucher_wolves Jan 31 '23 at 14:11
  • What would the new JSON look like in that case? – Chris Jan 31 '23 at 18:06
  • OK, I think I got it, but this effectively breaks backward compatibility: SecondLevel needs to be on two levels. I'll update the question to show the resulting JSON. – Chris Jan 31 '23 at 18:12
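
The rule described in these comments can be illustrated with a small Python model. This is a simplified sketch of how a union value is resolved in Avro's JSON encoding, not Avro's actual code: `union_branch` is a hypothetical helper, and it assumes that a non-null union value must be a JSON object whose first key names the branch. Under that assumption, the old JSON fails because the first key of the SecondLevel object, "ThirdLevel", is read as a branch name.

```python
# Branches of the nullable SecondLevel field in the modified schema.
SECOND_LEVEL_UNION = ["null", "my.model.kafka.test.SecondLevel"]


def union_branch(value, branches):
    """Simplified model of union resolution in Avro's JSON encoding
    (illustration only, not the library's implementation)."""
    if value is None:
        label = "null"
    elif isinstance(value, dict) and value:
        label = next(iter(value))  # first key is taken as the branch name
    else:
        raise ValueError("a non-null union value must be a JSON object")
    if label not in branches:
        raise ValueError(f"Unknown union branch {label}")
    return label


# Old layout: the record's fields sit directly under SecondLevel.
old_value = {"ThirdLevel": []}
# New layout: the record is wrapped in its full type name.
new_value = {"my.model.kafka.test.SecondLevel": {"ThirdLevel": []}}

print(union_branch(None, SECOND_LEVEL_UNION))       # null
print(union_branch(new_value, SECOND_LEVEL_UNION))  # my.model.kafka.test.SecondLevel
try:
    union_branch(old_value, SECOND_LEVEL_UNION)
except ValueError as error:
    print(error)  # Unknown union branch ThirdLevel
```

The same model suggests why the array items are already wrapped in "my.model.kafka.test.ThirdLevel" in both versions: the `items` type is declared as a one-element list, which makes it a union as well.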
