
I'm new to Kafka and a little confused about the Kafka message format. I tested a KafkaJS consumer:

const run = async () => {
    await kafkaClient.consumer.connect()
    await kafkaClient.consumer.subscribe({ topic: 'mytopic', fromBeginning: true })
    await kafkaClient.consumer.run({
        eachBatchAutoResolve: false,
        eachBatch: async ({ batch, resolveOffset, heartbeat, isRunning, isStale }) => {
            for (const message of batch.messages) {
                if (!isRunning() || isStale()) break
                processMessage(message)
                resolveOffset(message.offset)
                await heartbeat()
            }
        },
    })
}

I used console.log(message) to inspect the message format, and it looks like this:

{
  magicByte: 2,
  attributes: 0,
  timestamp: '540669',
  offset: '601953',
  key: <Buffer 39 63 37 23>,
  value: <Buffer 7b 65 32 65 37 38 ... 555 more bytes>,
  headers: {
    'myheader': <Buffer 61 6f>,
  },
  ...
}

I also tried a consumer built with Spring Boot on localhost. As there is no producer, I used Postman to send messages to Kafka. The message received by the Spring Boot consumer looks like this:

{
   body: 'this is body',
   clientIp: 'this is IP'
}

'this is body' is the content I sent from Postman. The value of clientIp is my IP.

I noticed this is the content returned by message.value.toString() in KafkaJS. Why are they different? Will consumers built with different frameworks get different messages if they connect to the same Kafka topic?

What should I try if I want to build a Java consumer that receives and consumes the same message format as a KafkaJS consumer?

1 Answer


console.log(message) is showing you the whole deserialized message read from Kafka, which includes the key, the value, and other metadata. How this data is structured in this object is language-dependent (e.g. Java or Go will have their own classes or structs that contain mostly the same data, but not necessarily structured in the same way).
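To illustrate the point about the value itself, here is a minimal, Kafka-free Java sketch. The value KafkaJS prints as `value: <Buffer ...>` is just raw bytes; decoding those same bytes in Java (as Kafka's StringDeserializer does) yields the same text that message.value.toString() gives you in KafkaJS. The example payload is an assumption for illustration:

```java
import java.nio.charset.StandardCharsets;

public class SameBytesDemo {
    public static void main(String[] args) {
        // The raw bytes a producer wrote for the value -- what KafkaJS
        // prints as `value: <Buffer ...>`. Every consumer, in every
        // language, receives exactly these bytes from the broker.
        byte[] valueBytes = "this is body".getBytes(StandardCharsets.UTF_8);

        // What KafkaJS's message.value.toString() and Kafka's Java
        // StringDeserializer both do: decode the bytes as UTF-8 text.
        String decoded = new String(valueBytes, StandardCharsets.UTF_8);
        System.out.println(decoded); // prints: this is body
    }
}
```

So the difference you saw is not in the bytes Kafka delivers, only in how each client library (or a framework layer on top of it, such as a Spring message converter) presents them.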

Even though each language has a different object to represent a deserialized Kafka message, the metadata (headers, timestamp...), the message key, and the message value are always there. They may be structured differently, but their values should always be available through some method or attribute. And when serialized, a message always has the same byte representation, as defined by the Kafka protocol.
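For the Java side of your question, here is a sketch of a plain (non-Spring) consumer, assuming a local broker at localhost:9092, your topic name 'mytopic', and the official kafka-clients library on the classpath. ConsumerRecord exposes the same fields KafkaJS shows you: offset, timestamp, key, value, and headers:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.header.Header;
import org.apache.kafka.common.serialization.StringDeserializer;

public class PlainConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
        props.put("group.id", "my-group");                // assumption: any group id
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("auto.offset.reset", "earliest");       // same idea as fromBeginning: true

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("mytopic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // The same fields KafkaJS logs: offset, timestamp, key, value
                    System.out.printf("offset=%d timestamp=%d key=%s value=%s%n",
                            record.offset(), record.timestamp(), record.key(), record.value());
                    // Headers are key/byte[] pairs, like the Buffer values in KafkaJS
                    for (Header header : record.headers()) {
                        System.out.println(header.key() + " = " + new String(header.value()));
                    }
                }
            }
        }
    }
}
```

Unlike a Spring listener with a message converter in front of it, this consumer hands you the record as the client library deserialized it, which is the closest analogue to the message object KafkaJS gives you.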

Gerard Garcia
  • So, if I connect my spring boot consumer to the same kafka topic as my KafkaJS consumer, it will also receive serialized messages with the same format. Right? – John Smith Jun 01 '22 at 14:16
  • Yep, with the same contents. – Gerard Garcia Jun 01 '22 at 14:19
  • _How is this data structured in this object is language dependent_ - Not exactly. It'll be _serialization framework_ dependent. E.g. JSON, Avro, Protobuf, whatever should have the same byte structure between languages. The Kafka payload itself should always have timestamp, offset, key, value, and headers. That being said, the message key+value bytes should be similar, if not the exact same. – OneCricketeer Jun 01 '22 at 17:51
  • Yeah, maybe I did not explain myself very well. What I meant is that the object, and its structure, that the programming language creates which represents a deserialized message from Kafka, is language dependent, but its contents are the same in any language (at least when it comes to headers, key and value). – Gerard Garcia Jun 01 '22 at 18:00
  • I have edited it hoping it will be more understandable now – Gerard Garcia Jun 01 '22 at 18:07