
I want to make a Flask application/API with Gunicorn that, on every request:

  1. reads a single value from a Kafka topic
  2. does some processing
  3. and returns the processed value to the user (or any application calling the API).

So far, I couldn't find any examples of this. Is the following function the correct way of doing it?

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "first_topic",
    bootstrap_servers='xxxxxx',
    auto_offset_reset='xxxx',
    group_id="my_group")

def get_value_from_topic():
    for msg in consumer:
        return msg

if __name__ == "__main__":
    print(get_value_from_topic())

Or is there a better way of doing this with a library like Faust? My reason for using Kafka is to avoid the hassle of synchronizing the Flask workers (as I would have to with a traditional database), because I want each value from Kafka to be consumed only once.

h s

1 Answer

This seems okay at first glance. Your consumer iterator is iterated once, and you return that value.

A more idiomatic way to do it, however, would be:

def get_value_from_topic():
    return next(consumer)

With your other settings, though, there's no guarantee this polls only one message, because Kafka consumers poll in batches and will auto-commit the offsets of those batches. You'll therefore want to disable auto-commit and handle offset commits yourself: committing after handling the HTTP request gives you at-least-once delivery, while committing before gives you at-most-once. Since you're interacting with an HTTP server, Kafka can't give you exactly-once processing.

OneCricketeer
  • Thank you for answering. I almost forgot about the batching part. I found this thread, https://stackoverflow.com/questions/36579815/kafka-python-how-do-i-commit-a-partition, and it seems to do what I'm looking for. – h s May 09 '21 at 12:10