I have a Kafka producer and consumer in Python. I want to consume the producer's messages in batches of, say, 2. The producer sends email data like the following:

[{
    "email" : "sukhi215c@gmail.com",
    "subject": "Test 1",
    "message" : "this is a test"
},
{
    "email" : "sukhi215c@gmail.com",
    "subject": "Test 2",
    "message" : "this is a test"   
},
{
    "email" : "sukhi215c@gmail.com",
    "subject": "Test 3",
    "message" : "this is a test"   
},
{
    "email" : "sukhi215c@gmail.com",
    "subject": "Test 4",
    "message" : "this is a test"   
}]

I am trying to consume this data in batches: consume 2 messages at a time, send emails based on those 2, and then consume the next set. The workaround that I tried is:

consumer = KafkaConsumer(topic, bootstrap_servers=[server], api_version=(0, 10))
for message in consumer[:2]:
    string = message.value.decode("utf-8")
    dict_value = ast.literal_eval(string)

The error that I am getting is:

    for message in consumer[:2]:
TypeError: 'KafkaConsumer' object is not subscriptable

Can someone help me get past this?

Suganth

2 Answers

The consumer is not a collection, so it cannot be sliced; its iterator is infinite.

If you want to perform an action every two events, use a counter or your own list:

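from kafka import KafkaConsumer

data = []
consumer = KafkaConsumer(topic, bootstrap_servers=[server], api_version=(0, 10))
for message in consumer:  # yields one record at a time, blocking while it waits
    data.append(message)
    if len(data) >= 2:
        action(data)  # process the pair, e.g. send both emails
        data.clear()  # start collecting the next batch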
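
The same idea can be wrapped in a small generator that also hands over a trailing partial batch when the iterator ends, which addresses the odd-number case. This is only a sketch: `batched`, `run`, and `send_emails` are hypothetical names, and it assumes kafka-python's `consumer_timeout_ms` setting, which makes the iterator stop after that many milliseconds without a message (by default it never stops).

```python
def batched(messages, size=2):
    """Yield lists of up to `size` items from any iterable."""
    batch = []
    for message in messages:
        batch.append(message)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # The iterator ended with a partial batch; hand it over instead of dropping it.
        yield batch


def run(topic, server, send_emails):
    """Hypothetical wiring; send_emails is a caller-supplied function."""
    from kafka import KafkaConsumer  # requires the kafka-python package

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=[server],
        consumer_timeout_ms=5000,  # assumption: stop iterating after 5 s of silence
    )
    for batch in batched(consumer, size=2):
        send_emails(batch)
```

Because `send_emails` runs inside the loop, the next pair is not pulled from the iterator until the current pair has been handled.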
OneCricketeer
  • Thank you for the response. I am using multithreading with two threads, one to consume messages in batches and one to send the emails. I want to consume the next batch of messages only after the email thread finishes sending emails. Any idea how I can achieve that? – Suganth Feb 14 '22 at 05:45
  • If you want to block the consumer, then there's no reason to use separate threads – OneCricketeer Feb 14 '22 at 05:47
  • Sorry, I was wrong. What I meant is that I have to consume messages and send emails simultaneously. Will this workaround work for that? – Suganth Feb 14 '22 at 05:49
  • I don't see why it wouldn't. The only downside is that with an odd number of messages, the last one sits in the list waiting for a partner and may effectively be dropped if that wait is too long – OneCricketeer Feb 14 '22 at 05:58
  • But here we actually consume all the messages and only process them in twos. What I actually want is to consume only 2 messages – Suganth Feb 14 '22 at 10:42
  • Okay, then call `break` within the if statement – OneCricketeer Feb 14 '22 at 15:49
  • But this method still consumes all the messages from Kafka in bulk and only processes them in batches, right? I want to consume only 2 messages at a time – Suganth Feb 15 '22 at 04:09
  • That's not possible. The for loop over the consumer always returns one event at a time. Therefore, you must batch the events yourself or break the consumer loop after counting two events – OneCricketeer Feb 15 '22 at 13:42
  • That is my question, actually: how can I batch the events? – Suganth Feb 16 '22 at 03:46
  • I've already answered that. Unclear what else you want me to say. If you want the consumer to poll two "events" at once, then **produce** two of them together so that the consumer reads them as "one" record – OneCricketeer Feb 16 '22 at 14:22
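
The last suggestion, producing two emails together so they arrive as one record, might look roughly like this on the producer side. It is a sketch under assumptions: `pair_payloads` and `publish` are hypothetical names, and the JSON encoding is a choice made here, not something the question specifies.

```python
import json


def pair_payloads(emails, size=2):
    """Group email dicts into lists of `size` and JSON-encode each group as bytes."""
    return [
        json.dumps(emails[i:i + size]).encode("utf-8")
        for i in range(0, len(emails), size)
    ]


def publish(topic, server, emails):
    """Hypothetical wiring: send each pair as a single Kafka record."""
    from kafka import KafkaProducer  # requires the kafka-python package

    producer = KafkaProducer(bootstrap_servers=[server])
    for payload in pair_payloads(emails):
        producer.send(topic, value=payload)
    producer.flush()  # block until the buffered records are sent
```

On the consumer side, each `message.value` would then decode with `json.loads` to a list of up to two email dicts.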

Use the poll() interface documented here:

https://kafka-python.readthedocs.io/en/master/_modules/kafka/consumer/group.html#KafkaConsumer.poll

poll() takes a timeout_ms argument, so it returns early if there are no messages to consume, and a max_records argument that caps how many records a single call returns.
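
A minimal sketch of consuming two records per call with `poll()`, assuming kafka-python's `poll(timeout_ms=..., max_records=...)`, which returns a dict mapping each topic partition to a list of records. `poll_batch`, `run`, and `handle_batch` are hypothetical names introduced here.

```python
def poll_batch(consumer, batch_size=2, timeout_ms=1000):
    """Return a flat list of at most `batch_size` records from one poll() call."""
    # poll() returns {TopicPartition: [ConsumerRecord, ...]}; an empty dict on timeout.
    records_by_partition = consumer.poll(timeout_ms=timeout_ms, max_records=batch_size)
    batch = []
    for records in records_by_partition.values():
        batch.extend(records)
    return batch


def run(topic, server, handle_batch):
    """Hypothetical wiring; handle_batch is a caller-supplied function."""
    from kafka import KafkaConsumer  # requires the kafka-python package

    consumer = KafkaConsumer(topic, bootstrap_servers=[server])
    while True:
        batch = poll_batch(consumer)
        if batch:
            # The next poll() happens only after this call returns.
            handle_batch(batch)
```

A call can return fewer than two records if only one arrives before the timeout. As far as I know, `max_records` limits what each call hands back, while the client may still prefetch more from the broker internally.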

rags