
According to the Confluent Kafka documentation, the Consumer class is defined as follows:

Class Consumer<TKey, TValue> 

The consumer class above implements a high-level Apache Kafka consumer (with key and value deserialization).

I understand that TKey and TValue are used for deserializing the key and value sent by the producer. For example, sending a key from the producer would look like this:

var deliveryReport = producer.ProduceAsync(topicName, key, val);

Receiving the string key on the consumer end would look like this:

using (var consumer = new Consumer<Ignore, string>(constructConfig(brokerList, false), null, new StringDeserializer(Encoding.UTF8)))
{
    consumer.Subscribe(topics);

    Console.WriteLine($"Started consumer, Ctrl-C to stop consuming");

    var cancelled = false;
    Console.CancelKeyPress += (_, e) => {
        e.Cancel = true; // prevent the process from terminating.
        cancelled = true;
    };

    while (!cancelled)
    {
        Message<Ignore, string> msg;
        if (!consumer.Consume(out msg, TimeSpan.FromMilliseconds(100)))
        {
            continue;
        }

        Console.WriteLine($"Topic: {msg.Topic} Partition: {msg.Partition} Offset: {msg.Offset} {msg.Value}");
    }
}

Since we are passing in a key, the Consumer is initialized as

Consumer<Ignore, string>

and the message is initialized as

Message<Ignore, String>

After all that, my question is, what does deserialization of the key really mean? And why do we need to do that? Also, why do we need to pass in a key-value pair Ignore, String for performing deserialization?

OneCricketeer
tubby

1 Answer


why do we need to pass in a key-value pair Ignore, String for performing deserialization?

You don't need to pass those particular types. You need to match whatever serializers the producer used. If you're unsure what those were, you can use a byte array for both the key and the value and interpret the raw bytes yourself.
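To illustrate why the consumer has to match the producer, here is a minimal sketch that does not use the Kafka client at all. It assumes a hypothetical producer that writes an int key as 4 big-endian bytes; the class and method names are made up for this example.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

public static class MatchingSerializers
{
    // Hypothetical producer side: the key is an int written as 4 big-endian bytes.
    public static byte[] SerializeInt(int key)
    {
        var bytes = new byte[4];
        BinaryPrimitives.WriteInt32BigEndian(bytes, key);
        return bytes;
    }

    // Matching consumer side: reads the same 4 big-endian bytes back as an int.
    public static int DeserializeInt(byte[] bytes) =>
        BinaryPrimitives.ReadInt32BigEndian(bytes);

    public static void Main()
    {
        byte[] wire = SerializeInt(0x41424344);

        // Matching deserializer: recovers the original key.
        Console.WriteLine(DeserializeInt(wire));              // 1094861636

        // Mismatched deserializer: the same bytes read as UTF-8 text.
        Console.WriteLine(Encoding.UTF8.GetString(wire));     // ABCD

        // Raw bytes: always valid, interpretation is left to the caller.
        Console.WriteLine(BitConverter.ToString(wire));       // 41-42-43-44
    }
}
```

The mismatched read does not fail here; it silently produces a different value, which is why falling back to raw bytes is the safe choice when the producer's format is unknown.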

If the producer didn't send a key (that is, the key is null), there is nothing to deserialize. I assume that's what the Ignore class is for. Notice that you didn't provide a key deserializer, but did provide one for the value:

null, new StringDeserializer(Encoding.UTF8))

Kafka stores and transmits every message's key and value purely as bytes. Producers use serializers to turn objects into those bytes, and as a consumer you need to deserialize them again. Ideally, you deserialize messages into actual objects, such as strings, JSON objects, Avro or Protobuf records, and so on.
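As a rough sketch of that round trip, again without the Kafka client: the Order type and the method names below are hypothetical, and System.Text.Json is only one possible choice of JSON library.

```csharp
using System;
using System.Text;
using System.Text.Json;

// Hypothetical payload type, used only for this illustration.
public class Order
{
    public string Id { get; set; } = "";
    public int Quantity { get; set; }
}

public static class WireFormatSketch
{
    // Roughly what a UTF-8 string serializer/deserializer pair does.
    public static byte[] SerializeString(string value) => Encoding.UTF8.GetBytes(value);
    public static string DeserializeString(byte[] data) => Encoding.UTF8.GetString(data);

    // The same idea for a JSON object: object -> bytes -> object.
    public static byte[] SerializeOrder(Order order) => JsonSerializer.SerializeToUtf8Bytes(order);
    public static Order DeserializeOrder(byte[] data) =>
        JsonSerializer.Deserialize<Order>(new ReadOnlySpan<byte>(data));

    public static void Main()
    {
        byte[] valueBytes = SerializeString("hello");
        Console.WriteLine(BitConverter.ToString(valueBytes)); // 68-65-6C-6C-6F
        Console.WriteLine(DeserializeString(valueBytes));     // hello

        byte[] orderBytes = SerializeOrder(new Order { Id = "A-1", Quantity = 3 });
        Order order = DeserializeOrder(orderBytes);
        Console.WriteLine($"{order.Id} x {order.Quantity}");  // A-1 x 3
    }
}
```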

By default, the producer's partitioner uses the key to decide which partition of the topic each message is written to, so the key determines which partition you end up consuming it from. Messages with a null key are spread evenly across the topic's partitions. Alternatively, the producer application can define its own partitioner and send data wherever its logic decides.
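The idea behind key-based partitioning can be sketched as follows. This is an illustration only: the FNV-1a hash and the simple rotation for null keys are assumptions chosen for brevity, not the algorithms any Kafka client actually uses, and SketchPartitioner is a made-up name.

```csharp
using System;
using System.Text;

public sealed class SketchPartitioner
{
    private readonly int _partitionCount;
    private int _nextForNullKey;

    public SketchPartitioner(int partitionCount) => _partitionCount = partitionCount;

    // Illustrative FNV-1a hash over the serialized key bytes (not Kafka's real hash).
    private static uint Fnv1a(byte[] data)
    {
        uint hash = 2166136261;
        foreach (byte b in data)
        {
            hash ^= b;
            hash *= 16777619; // wraps on overflow in the default unchecked context
        }
        return hash;
    }

    public int PartitionFor(byte[] keyBytes)
    {
        if (keyBytes == null)
        {
            // No key: rotate through the partitions so messages spread out.
            int partition = _nextForNullKey;
            _nextForNullKey = (_nextForNullKey + 1) % _partitionCount;
            return partition;
        }

        // Same key bytes always hash to the same partition.
        return (int)(Fnv1a(keyBytes) % (uint)_partitionCount);
    }

    public static void Main()
    {
        var partitioner = new SketchPartitioner(3);
        byte[] key = Encoding.UTF8.GetBytes("user-42");

        // Both calls print the same partition number.
        Console.WriteLine(partitioner.PartitionFor(key));
        Console.WriteLine(partitioner.PartitionFor(key));

        // Null keys rotate: 0, 1, 2, 0.
        for (int i = 0; i < 4; i++)
        {
            Console.WriteLine(partitioner.PartitionFor(null));
        }
    }
}
```

Because the hash is computed over the serialized key bytes, all messages with the same key land in the same partition and keep their relative order there.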

OneCricketeer
  • That's a bit inaccurate. The key doesn't necessarily determine the partition. Kafka producer's default partitioner uses the key for this purpose, but it's completely up to the producer to choose the partition for each message, which could completely ignore the key. – lfk Jun 13 '19 at 02:09
  • When a null key is used to produce a message, does it even make sense to set a deserializer while consuming? I'm getting an exception without explicitly setting a deserializer. So the answer is yes, but it makes no sense since the key was null. – Boss Man Sep 07 '22 at 19:14
  • @BossMan It's a required value. The deserializer source code may or may not check `bytes == null`, then has the opportunity to return a null `ConsumerRecord` / `Object` or something more null-safe like `Optional` in case of Java. – OneCricketeer Sep 07 '22 at 19:29
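
The null check mentioned in the last comment might look like this in C#. It is a minimal standalone sketch with a hypothetical helper name, not the source of any real deserializer.

```csharp
#nullable enable
using System;
using System.Text;

public static class NullSafeDeserializer
{
    // Returns null when the message carried no bytes for the key (or value),
    // instead of failing inside the decoder.
    public static string? DeserializeOrNull(byte[]? data) =>
        data == null ? null : Encoding.UTF8.GetString(data);

    public static void Main()
    {
        Console.WriteLine(DeserializeOrNull(null) ?? "(null key)");              // (null key)
        Console.WriteLine(DeserializeOrNull(Encoding.UTF8.GetBytes("user-42"))); // user-42
    }
}
```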