
I have been using the kafka-python module to consume from a Kafka broker. I want to consume from the same topic, which has x partitions, in parallel. The documentation has this:

```python
from kafka import KafkaConsumer

# Use multiple consumers in parallel w/ 0.9 kafka brokers
# typically you would run each on a different server / process / CPU
consumer1 = KafkaConsumer('my-topic',
                          group_id='my-group',
                          bootstrap_servers='my.server.com')
consumer2 = KafkaConsumer('my-topic',
                          group_id='my-group',
                          bootstrap_servers='my.server.com')
```

Does this mean I can create a separate consumer for each process that I spawn? Also, will the messages consumed by consumer1 and consumer2 overlap?

Thanks

Matthias J. Sax
red_devil

1 Answer


Yes, you can create multiple consumers in multiple threads/processes (and even run them in parallel on different machines). As long as all consumers use the same group.id (the `group_id` argument in kafka-python), there will be no overlap: Kafka assigns each topic partition to exactly one consumer within a consumer group. Be aware that using more consumers than there are topic partitions will leave the extra consumers idle.
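To make the "no overlap" and "idle consumers" points concrete, here is a small broker-free simulation of range-style assignment, modeled loosely on Kafka's range assignor (kafka-python's default strategy list starts with a range assignor). The function `range_assign` and the consumer names are hypothetical; real Kafka sorts by member ID and the group leader computes the assignment, so treat this as an illustration rather than the library's code.

```python
from typing import Dict, List


def range_assign(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Simulate range-style assignment of one topic's partitions to a group.

    Each partition ends up with exactly one consumer; sorting makes the
    result deterministic for a given set of members.
    """
    members = sorted(consumers)
    if not members:
        return {}
    parts = sorted(partitions)
    per_consumer, extra = divmod(len(parts), len(members))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, member in enumerate(members):
        # The first `extra` consumers each take one additional partition.
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = parts[start:start + count]
        start += count
    return assignment


if __name__ == "__main__":
    # 3 partitions, 2 consumers: every partition is owned by exactly one consumer.
    print(range_assign([0, 1, 2], ["consumer1", "consumer2"]))
    # → {'consumer1': [0, 1], 'consumer2': [2]}
    # 3 partitions, 4 consumers: one consumer receives nothing and sits idle.
    print(range_assign([0, 1, 2], ["c1", "c2", "c3", "c4"]))
    # → {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
```

Because the partitions are split into disjoint slices, no two consumers in the group can read the same partition, which is why there is no overlap.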

Matthias J. Sax
    If you want to process a single partition in parallel, you can now subdivide a Kafka partition and process it as concurrently as you like, using Confluent Parallel Consumer (https://github.com/confluentinc/parallel-consumer), which now also has a Python wrapper (https://github.com/confluentinc/parallel-consumer/pull/443). – Antony Stubbs Oct 25 '22 at 16:57
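The idea of subdividing one partition can be illustrated with a toy, broker-free sketch. This is not the Parallel Consumer API: records are assumed to be `(offset, key, value)` tuples, `process_partition_by_key` is a hypothetical helper, and ordering is kept per key (records sharing a key run sequentially, different keys run in parallel), which is one of the ordering options that library offers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical record shape for this sketch: (offset, key, value).
Record = Tuple[int, str, str]


def process_partition_by_key(
    records: List[Record],
    handle: Callable[[Record], None],
    max_workers: int = 4,
) -> Optional[int]:
    """Process one partition's records concurrently while keeping per-key order.

    Returns the next offset to commit, or None if there were no records.
    """
    if not records:
        return None

    # Group records by key, preserving their original (offset) order.
    by_key: Dict[str, List[Record]] = {}
    for record in records:
        by_key.setdefault(record[1], []).append(record)

    def run_key(batch: List[Record]) -> None:
        # Records sharing a key are handled one after another in a single task.
        for record in batch:
            handle(record)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Different keys run in parallel; list() re-raises any handler exception.
        list(pool.map(run_key, by_key.values()))

    # Every record has finished, so it is safe to commit past the highest offset.
    return max(record[0] for record in records) + 1


if __name__ == "__main__":
    processed: List[Record] = []
    batch = [(0, "a", "x"), (1, "b", "y"), (2, "a", "z"), (3, "c", "w")]
    print(process_partition_by_key(batch, processed.append))  # → 4
```

Returning the commit offset is only safe here because the function waits for every record; a real implementation has to track records that complete out of order and commit no further than the first unfinished offset.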