
Using CNCF's Strimzi Kafka Bridge, I have created a small API that can interact with a Kafka server over HTTP/1.1. This is all good for a request-response scenario. However, my requirement is to stream events received on a Kafka topic to the subscribed client(s) (through the Strimzi bridge) as soon as I receive them, preferably over a long-lived HTTP connection (as per my understanding). It's a waste of client resources to continuously poll the bridge for messages and come back empty-handed. I would like the Kafka server to stream these events to the client directly.

I am a little unsure about SSE, WebSockets, or long polling. I did quite a bit of reading on these methodologies for streaming data to the client. However, I am unable to figure out whether these changes sit at the communication layer, the application layer, or both.

Do you just build an API (irrespective of the technology) using the traditional HTTP protocol and somehow upgrade it to use WebSockets, or does WebSocket support have to be built into your application libraries from the ground up?

I can provide more information if needed. The Strimzi Kafka Bridge website does not mention anything about "server-side streaming", or maybe I am misunderstanding the real purpose of the tool.

OneCricketeer
Nick

2 Answers


The Strimzi Kafka HTTP bridge is meant as a "translator" between HTTP and the Kafka native protocol, and vice versa. That means the HTTP client has to have the same behavior as a native Kafka client, so in the case of a consumer it does a poll to get messages, which is how Kafka works natively. Imho, HTTP/1.1 is not for streaming at all. WebSocket is a completely different protocol to which you can of course upgrade starting from an HTTP connection, but it's not supported by the Strimzi bridge. Actually, the AMQP 1.0 protocol that is in the bridge (as a POC) can support this kind of scenario: establishing a connection and having the bridge push on that connection instead of the client polling.
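As a sketch of that poll-based consumer flow, the bridge's documented consumer endpoints can be driven like this (the bridge address, group, and consumer names are assumptions):

```python
import json

BRIDGE = "http://localhost:8080"  # assumed bridge address

def create_consumer_payload(name):
    # Body for POST /consumers/{groupid}: registers a named consumer
    # that receives JSON-encoded records.
    return json.dumps({"name": name, "format": "json",
                       "auto.offset.reset": "earliest"})

def subscription_payload(topics):
    # Body for POST /consumers/{groupid}/instances/{name}/subscription
    return json.dumps({"topics": topics})

def records_url(group, name):
    # A GET on this URL performs one poll, mirroring a native Kafka poll
    return f"{BRIDGE}/consumers/{group}/instances/{name}/records"
```

Each GET on the records URL behaves like a single Kafka poll: it returns whatever records are available (possibly none), so the client has to keep re-issuing it.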

ppatierno
  • I have a very generic question about your comment. In the case of a push architecture, where the Kafka server is pushing events to the client, does it make sense to give the client (through an API) control over the Kafka offsets for the topic it is subscribed to? Is the push architecture more of a "read it or lose it" from the client's perspective? How do servers manage disconnected clients in that case, or should servers even care in this architectural pattern? I am very divided between a push and a pull pattern. For pull, Strimzi is good. For push, the Pushpin proxy is good. What's the best practice? – Nick Oct 05 '19 at 16:38
  • "Kafka server is pushing events to the client ..." doesn't make sense because Kafka is not for pushing but just for polling. The AMQP protocol easily allows doing push, and the bridge takes care of committing (or not) the offset based on the AMQP client's ack of the messages. In AMQP there is also a credit-based flow-control concept, where a receiver gives credits to the sender if it can receive messages (in the case of the bridge, no credits means stopping the Kafka consumer on the bridge). Push vs pull depends on the use case. – ppatierno Oct 06 '19 at 08:23
  • Fair enough. "The Kafka server pushing events to the client" is the impression the receiver gets, without worrying about the intermediate technologies used to achieve that state. Even in a pull architecture, I am trying to figure out how the receiver receives all messages on the topic without prematurely closing the HTTP connection. I need to look a little deeper into it. – Nick Oct 06 '19 at 14:13
  • Actually it could be a feature to add to the bridge or, thinking more about it ... supporting HTTP long polling. Currently, when the client sends an HTTP request to poll, the bridge returns a response even if no records are available. Maybe the bridge could avoid sending a response until records are available or a timeout expires; it could be something configurable. – ppatierno Oct 06 '19 at 15:42
  • That would be a wonderful feature to add. Thanks for the good discussions. – Nick Oct 06 '19 at 23:47
  • Using this bridge, does the client have the ability to seek data from a specific offset? I checked the API and couldn't find anything; maybe I missed it. Also, is there an API to commit offsets? I tried "POST /consumers/{groupid}/instances/{name}/offsets" and couldn't get it to work. Is the "groupid", in this case, the same as the consumer group name one creates while subscribing to the bridge? – Nick Nov 04 '19 at 18:24
  • /consumers/{groupid}/instances/{name} is the URL you get when you create a consumer. It's in the JSON and the field is actually base_uri; it identifies the consumer on the bridge to interact with. About seek, you have the /positions, /positions/beginning and /positions/end endpoints for seeking to an offset and then consuming; search for them in the doc. Regarding offsets, what does it mean that it doesn't work? What do you get? – ppatierno Nov 04 '19 at 20:11
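Those seek endpoints can be sketched like this (the bridge address and the topic/partition/offset values are illustrative; the /positions body shape follows the bridge's documented API):

```python
import json

BRIDGE = "http://localhost:8080"  # assumed bridge address

def positions_url(group, name):
    # A POST here seeks the named consumer to specific offsets
    return f"{BRIDGE}/consumers/{group}/instances/{name}/positions"

def seek_payload(topic, partition, offset):
    # Body for POST .../positions: seek one partition to a given offset;
    # the next poll then returns records from that offset onwards.
    return json.dumps(
        {"offsets": [{"topic": topic, "partition": partition,
                      "offset": offset}]})
```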

@Nick thinking more, actually you can do "long polling". The GET on the /records endpoint for getting messages has a timeout parameter on the query string. Its value is used as the timeout for the internal native Kafka poll in the bridge. It somehow gives you the long-polling behaviour, because the poll doesn't return until records are available or the timeout expires. If you set a high timeout, you can get the behaviour you want, avoiding polling multiple times and opening/closing more HTTP connections to do so. More details on the timeout parameter here:

https://strimzi.io/docs/bridge/latest/#_poll
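From the client side, that timeout-based long poll could look like the sketch below (the bridge address and consumer identifiers are assumptions; the loop is illustrative and not run here):

```python
import urllib.request

BRIDGE = "http://localhost:8080"  # assumed bridge address

def poll_url(group, name, timeout_ms):
    # The timeout query parameter is passed to the bridge's internal
    # Kafka poll, so the GET blocks until records arrive or it expires.
    return (f"{BRIDGE}/consumers/{group}/instances/{name}"
            f"/records?timeout={timeout_ms}")

def long_poll_loop(group, name, timeout_ms=60000):
    # Re-issue the GET only after each response returns, so at most one
    # HTTP request is in flight instead of many short empty polls.
    headers = {"Accept": "application/vnd.kafka.json.v2+json"}
    while True:
        req = urllib.request.Request(poll_url(group, name, timeout_ms),
                                     headers=headers)
        with urllib.request.urlopen(req) as resp:
            body = resp.read()
            if body and body != b"[]":
                print(body.decode())
```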

ppatierno
  • Wonderful. Will give that a try – Nick Oct 07 '19 at 11:25
  • I tried setting "consumer.request.timeout.ms" to the max value allowed by the integer data type during consumer instance subscription. While subscribed, I tried to post messages to the Kafka topic. I wasn't able to consume the messages through the bridge without resending the GET request. The long polling does not work without resending the GET request. Is my understanding of "long polling" in this context correct? – Nick Oct 08 '19 at 02:57
  • @Nick that's quite strange, I just tried it and it worked. Start the bridge, use Postman to create a consumer, subscribe to the "test" topic (which is empty at the beginning) and call a poll with a GET on this URL: http://localhost:8080/consumers/my-group/instances/consumer1/records?timeout=60000 (so specifying 60 secs as the timeout). The request stayed stuck until, after 10 seconds, I sent a message to that topic. That "unlocks" the poll and the record is returned to the consumer. – ppatierno Oct 08 '19 at 07:23
  • Thanks for testing. I will retry. I also used Postman to subscribe; I will retest and verify. Thanks for all your help around this. I have a few more questions around the bridge. Does it have to be containerized for production, or can I use it on a VM? Are there any documented settings for connecting to a secure Kafka cluster? Have you tried putting a load balancer in front of this? Did it scale well? I probably would need some sort of API gateway fronting this as well, e.g. Apigee or Layer 7. Truly appreciate your response and all the help. – Nick Oct 08 '19 at 12:30
  • The bridge can run even on a VM, so not containerized and not on Kubernetes. In order to connect to a secure cluster you just have to use the application.properties file, adding the Kafka configuration parameters for producer and consumer (so truststore if TLS, authentication, etc.). You have to set parameters starting with the prefixes `kafka.producer` and `kafka.consumer`. Finally, because of how the consumer side works, it is not simple to scale the bridge, but I did an integration with the 3scale API that you can find here: https://github.com/strimzi-incubator/strimzi-kafka-bridge-api – ppatierno Oct 08 '19 at 13:13
  • Is the KafkaBridge schema reference in the Kafka Bridge documentation available only if the bridge is deployed as a Docker container, or is it available if I use it on a virtual machine? – Nick Nov 04 '19 at 19:59
  • By schema reference do you mean the endpoints exposed by the bridge? Anyway, the bridge can run even in a VM, not as a container. The README in the repo explains how it's possible. – ppatierno Nov 04 '19 at 20:13
  • The reason I ask is because I set auto offset commit to "false" in the application.properties file and restarted the bridge. While subscribing a consumer, I removed the "enable.auto.commit": false and "auto.offset.reset": "earliest" properties so that the commit could be handled manually by the client. However, when I use "POST /consumers/{groupid}/instances/{name}/positions" with the correct details, I do get a 204, which tells me that somewhere the offsets are auto committed instead of letting the client commit manually. I would like to reread some data from previous offsets. – Nick Nov 04 '19 at 20:42
  • The 204 on that endpoint tells you that the seek operation went fine. You should be able to get messages from the offset you seeked to. Why do you say it tells you that auto commit is enabled? Btw, instead of writing here, I would open an issue on the bridge GitHub repo describing the steps and your scenario better. It is difficult to follow here :) – ppatierno Nov 04 '19 at 20:51
  • Agreed. Can you please point me to the bridge GitHub repo so that I don't accidentally post this somewhere else? – Nick Nov 04 '19 at 20:59
  • @Nick did you open the issue? I cannot see it. – ppatierno Nov 05 '19 at 06:28
  • I did, issue number 368. I appreciate your reply on that. I will respond to it shortly. Thanks for all your help. – Nick Nov 05 '19 at 12:57
  • but it sounds completely different from what you were asking here :-) – ppatierno Nov 05 '19 at 13:00
  • Yes, because I answered my own question while testing. :). But I have a few more questions, which I will ask on the github page. I hope that's ok – Nick Nov 05 '19 at 14:42
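For the secure-cluster configuration discussed in the comments above, a minimal application.properties sketch might look like this (the bootstrap address, paths, and TLS-only choice are assumptions; only the `kafka.producer`/`kafka.consumer` prefix convention comes from the answer):

```properties
# Assumed bridge config for a TLS-secured cluster; values are placeholders
kafka.bootstrap.servers=my-cluster-kafka-bootstrap:9093
kafka.producer.security.protocol=SSL
kafka.producer.ssl.truststore.location=/path/to/truststore.p12
kafka.producer.ssl.truststore.password=changeit
kafka.consumer.security.protocol=SSL
kafka.consumer.ssl.truststore.location=/path/to/truststore.p12
kafka.consumer.ssl.truststore.password=changeit
```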