
A Producer serializes messages and sends them to the Broker as byte arrays, and Consumers deserialize those byte arrays. The Broker always stores and passes byte arrays. This is how I understand it.

But when you use the REST Proxy in Kafka, the Producer encodes the message with base64, and the Consumer decodes those base64 messages.

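A Python example of Producer and Consumer:

# Producer using the REST Proxy
import base64

payload = {"records": [{
    # b64encode takes and returns bytes; decode to str so the payload is JSON-serializable
    "key": base64.b64encode(b"firstkey").decode("ascii"),
    "value": base64.b64encode(b"firstvalue").decode("ascii"),
}]}

# Consumer using the REST Proxy
# `message` is one record from the proxy's JSON response
print("Message Key: " + base64.b64decode(message["key"]).decode("utf-8"))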

Why do you send messages to the Broker in base64 instead of as byte arrays? When using the REST Proxy, does the Broker store messages in base64 format?

OneCricketeer
Jin Lee
  • JSON doesn't have a `bytes` type. The best you can do is an encoded string. The broker still stores just bytes itself. – OneCricketeer Jul 16 '19 at 18:25
  • @cricket_007 Would you like to put this as an answer? I would be happy to accept yours. ( I'm gonna put mine as well, but will not accept it. Just for people to look at. I'm searching and studying this right now. ) – Jin Lee Jul 17 '19 at 00:19
  • @cricket_007 I have one question. The Producer sends base64-encoded messages to the Kafka REST Proxy server, and the REST Proxy server converts them to binary and sends them to the Kafka cluster. When a REST consumer client wants to consume, it goes through the Kafka REST Proxy server, so the Proxy server takes the binary data, converts it to base64, and sends it to the Consumer, which then decodes the base64. (Sorry for the long question.) Did I understand this correctly? :) – Jin Lee Jul 17 '19 at 05:52
  • Haven't really used REST Proxy, but that flow makes sense to me – OneCricketeer Jul 17 '19 at 15:49

2 Answers



When a Producer wants to send the message 'Man', it serializes it into bytes. A Broker will store it as 010011010110000101101110 (the bits of those bytes). When a Consumer gets this message, it will deserialize it back to Man.
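The serialization step above can be sketched in a few lines of Python. This is only an illustration, assuming UTF-8 string serialization; the variable names are made up for the example and no Kafka client is involved.

```python
# Illustration only: what "serialize to bytes" means for the string "Man".
message = "Man"

# Serialize: text -> bytes (UTF-8 is assumed here).
serialized = message.encode("utf-8")

# Render each byte as 8 bits to see what is effectively stored.
bits = "".join(f"{byte:08b}" for byte in serialized)
print(bits)  # 010011010110000101101110

# Deserialize: bytes -> text, as a consumer would.
deserialized = serialized.decode("utf-8")
print(deserialized)  # Man
```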

However, according to the Confluent documentation:

Data formats - The REST Proxy can read and write data using JSON, raw bytes encoded with base64 or using JSON-encoded Avro.


Therefore, a Producer using REST Proxy will base64-encode the message Man into TWFu and send this to the REST Proxy (which decodes it back to bytes before forwarding it to a Broker), and a Consumer using REST Proxy will base64-decode the value it receives back to Man.
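To make the round trip concrete, here is a rough simulation in plain Python. It is a sketch under assumptions, not the REST Proxy's actual code: the three functions and the simplified JSON shapes are hypothetical stand-ins for the REST producer, the proxy's write path, and the proxy's read path.

```python
import base64
import json


def rest_producer_body(value):
    """REST client side: put raw bytes into a JSON body as a base64 string."""
    encoded = base64.b64encode(value).decode("ascii")
    return json.dumps({"records": [{"value": encoded}]})


def proxy_write(request_body):
    """Hypothetical proxy write path: base64 text -> raw bytes for the broker."""
    records = json.loads(request_body)["records"]
    return [base64.b64decode(record["value"]) for record in records]


def proxy_read(stored):
    """Hypothetical proxy read path: raw bytes from the broker -> base64 text."""
    return json.dumps(
        [{"value": base64.b64encode(value).decode("ascii")} for value in stored]
    )


request_body = rest_producer_body(b"Man")   # JSON carries "TWFu"
broker_log = proxy_write(request_body)      # the broker only ever sees b"Man"
response_body = proxy_read(broker_log)      # JSON carries "TWFu" again
consumed = base64.b64decode(json.loads(response_body)[0]["value"])
print(consumed)  # b'Man'
```

The base64 text only exists on the HTTP/JSON leg between the REST client and the proxy; `broker_log` holds the original bytes.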


Jin Lee
  • There is one part missing from this answer: "When using REST Proxy, a Broker stores messages in base64 format?". The answer is no. Binary data between the REST client and the REST proxy requires base64 encoding. If the REST proxy receives binary/base64-encoded data from the REST producer, it decodes the data back to its binary form and sends this to the Kafka broker. When the REST proxy fetches the data from the broker, it encodes it in base64 before sending it to the REST consumer. A standard consumer will receive the binary data directly from the broker. – Ray Sep 09 '20 at 10:59

As you already answered, the broker always stores the data in a binary format.

As for why base64 is needed, I found this in a Confluent blog post (https://www.confluent.io/blog/a-comprehensive-rest-proxy-for-kafka/):

The need for base64 encoding becomes clearer when you have to send raw binary data through the REST Proxy:

If you opt to use raw binary data, it cannot be embedded directly in JSON, so the API uses a string containing the base64 encoded data.
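A quick way to see the limitation for yourself is with Python's standard `json` module. This is a small sketch; the sample bytes and variable names are arbitrary choices for the illustration.

```python
import base64
import json

# Arbitrary binary data, including bytes that are not valid UTF-8 text.
raw = bytes([0x00, 0xFF, 0x10, 0x80])

# JSON has no bytes type, so serializing raw bytes directly fails.
try:
    json.dumps({"value": raw})
    embedded_directly = True
except TypeError:
    embedded_directly = False

# Base64 turns the bytes into plain ASCII text, which fits in a JSON string.
encoded = base64.b64encode(raw).decode("ascii")
body = json.dumps({"value": encoded})

# The receiver reverses the encoding to recover the exact original bytes.
restored = base64.b64decode(json.loads(body)["value"])
print(embedded_directly, restored == raw)  # False True
```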

Cr4zyTun4