
I am having a hard time implementing a feature where the Kafka message value is dynamic. I am using AvroProducer from confluent-kafka-python together with Schema Registry. The producer will send messages in a format like this:

{'id': 1, 'name': 'A', 'properties': {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}},
{'id': 2, 'name': 'X', 'properties': {'key1': 'value1'}}

The properties can vary between messages: some have more key-value pairs, some have fewer. I am trying to get these messages from Kafka into PostgreSQL using Kafka Connect, and I want properties to be stored as a json type in the PostgreSQL database.

How could this be achieved? Any pointers will be really appreciated. Thanks.

OneCricketeer
py_trainee

1 Answer


For your example, it appears that a single schema declaring properties as an array of key/value records would suffice. In Avro, the schema definition would look like:

{
    "name": "MyRecord",
    "type":"record",
    "fields":[
        {
            "name":"id",
            "type":"long"
        },
        {
            "name":"name",
            "type":"string"
        },
        {
            "name":"properties",
            "type":{
                "type":"array",
                "items":{
                    "name":"mykvprop",
                    "type":"record",
                    "fields":[
                        {
                          "name":"key",
                          "type":"string"
                        },
                        {
                          "name":"value",
                          "type":"string"
                        }
                    ]
                }
            }
        }
    ] 
}
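With a schema shaped like this, the producer can no longer send properties as a plain dict. Each message has to be reshaped into a list of key/value records first. Here is a minimal sketch of that step in plain Python. The helper names `properties_to_array` and `to_avro_value` are made up for illustration, and coercing values with `str()` is an assumption that only holds while every property value is meant to be a string:

```python
def properties_to_array(properties):
    """Turn a flat dict into a list of {"key", "value"} records."""
    # Values are coerced to str because the schema declares "value" as a string.
    return [{"key": str(k), "value": str(v)} for k, v in properties.items()]


def to_avro_value(message):
    """Return a copy of the message with 'properties' reshaped for the array schema."""
    reshaped = dict(message)  # shallow copy, the caller's dict is left untouched
    reshaped["properties"] = properties_to_array(message.get("properties", {}))
    return reshaped


messages = [
    {"id": 1, "name": "A", "properties": {"key1": "value1", "key2": "value2", "key3": "value3"}},
    {"id": 2, "name": "X", "properties": {"key1": "value1"}},
]

# Each reshaped dict is what you would hand to the producer as the message value.
avro_values = [to_avro_value(m) for m in messages]
```

Messages with more or fewer properties simply produce longer or shorter arrays, so one registered schema covers them all.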

If your messages varied by datatype, for example if the keys and values of the properties array were not all strings, then you'd need a more complex solution using Avro unions, multi-schema topics, or both.
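To make the union option concrete, here is a rough sketch of how the key/value record might look if "value" were a union rather than a plain string. The particular set of branches is an assumption, not something taken from the question, and `union_branch` is a hypothetical helper that only illustrates how Python values line up with Avro types. In practice the Avro serializer chooses the branch for you:

```python
import json

# Hypothetical variant of the "mykvprop" record: "value" is a union
# instead of a plain string, so one record type can carry several datatypes.
KV_UNION_SCHEMA = {
    "name": "mykvprop",
    "type": "record",
    "fields": [
        {"name": "key", "type": "string"},
        {"name": "value", "type": ["null", "string", "long", "double", "boolean"]},
    ],
}


def union_branch(value):
    """Name the Avro union branch a Python value would fall under."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"no union branch for {type(value).__name__}")


# JSON text of the record schema, ready to be embedded as the array's "items".
schema_json = json.dumps(KV_UNION_SCHEMA, indent=2)
```

Values outside the listed branches, such as nested dicts, raise a `TypeError` here. That is the point at which multi-schema topics or a different modelling approach would come in.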

These advanced cases get more complicated if you're using the Kafka Connect API instead of the Kafka Producer API, because the Connect API has its own internal schema model that appears to limit your options (no union?).

Ryan