How to attach a Python consumer script to a particular Kafka partition
When I run two instances of the consumer script (given below), each of them randomly picks up one partition and then consumes/prints all messages of that partition, as expected.

But I need to write these messages to a local file on disk that is named after the partition, so attaching each instance of the script to a pre-declared partition ID would make things easier.
Example file names:

Date/Hour/PARTITION_ID-0.CSV
Date/Hour/PARTITION_ID-1.CSV

Any idea how to achieve that?
Feel free to suggest alternative approaches.

Kafka Setup:

Topic:my-topic3 PartitionCount:2    ReplicationFactor:2 Configs:
Topic: my-topic3    Partition: 0    Leader: 2   Replicas: 2,1   Isr: 2,1
Topic: my-topic3    Partition: 1    Leader: 1   Replicas: 1,2   Isr: 1,2

Kafka consumer script (in Python) [WITH FIX]

from kafka import KafkaConsumer
from kafka import TopicPartition

# To consume latest messages and auto-commit offsets
#consumer = KafkaConsumer('my-topic3',
#                         group_id='my-group',
#                         bootstrap_servers=['192.168.150.80:9092'])

# To consume messages from a specific PARTITION  [ FIX ]
consumer = KafkaConsumer(bootstrap_servers='192.168.150.80:9092')
consumer.assign([TopicPartition('my-topic3', 1)])

for message in consumer:
    # message value and key are raw bytes -- decode if necessary!
    # e.g., for unicode: `message.value.decode('utf-8')`
    print ("Topic= %s : Partition= %d : Offset= %d: key= %s value= %s" % (message.topic, message.partition,
                                          message.offset, message.key,
                                          message.value))
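
For the file naming described earlier in the question, the output path could be derived from `message.partition` inside the loop. A minimal sketch, assuming Python 3 and a `YYYY-MM-DD/HH` layout for Date/Hour (the question does not fix the exact format); `partition_csv_path` and `append_row` are hypothetical helper names, not part of kafka-python:

```python
import csv
import os
from datetime import datetime


def partition_csv_path(partition, when, base_dir="."):
    """Build base_dir/<date>/<hour>/PARTITION_ID-<partition>.CSV.

    The date and hour formats are assumptions for illustration.
    """
    return os.path.join(
        base_dir,
        when.strftime("%Y-%m-%d"),   # Date directory
        when.strftime("%H"),         # Hour directory
        "PARTITION_ID-%d.CSV" % partition,
    )


def append_row(path, row):
    """Append one CSV row to path, creating parent directories if needed."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, "a", newline="") as handle:
        csv.writer(handle).writerow(row)
```

Inside the `for message in consumer:` loop this might be used as `append_row(partition_csv_path(message.partition, datetime.now()), [message.offset, message.key, message.value])`. Because each instance is assigned a single partition, each instance ends up writing to its own file.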

Update: As suggested below, I used the assign() function, but kept getting an IllegalStateError.
Call to assign():

consumer.assign([TopicPartition('my-topic3',1)])

Error

Traceback (most recent call last):
  File "consumerExample.py", line 13, in <module>
    consumer.assign([TopicPartition('my-topic3',1)])
  File "/usr/lib/python2.7/site-packages/kafka/consumer/group.py", line 278, in assign
    self._subscription.assign_from_user(partitions)
  File "/usr/lib/python2.7/site-packages/kafka/consumer/subscription_state.py", line 189, in assign_from_user
    raise IllegalStateError(self._SUBSCRIPTION_EXCEPTION_MESSAGE)
kafka.errors.IllegalStateError: You must choose only one way to configure

1 Answer

You can use the assign() method to manually assign one or more partitions to a consumer.

There is some example code here:

>>> # manually assign the partition list for the consumer
>>> from kafka import TopicPartition
>>> consumer = KafkaConsumer(bootstrap_servers='localhost:1234')
>>> consumer.assign([TopicPartition('foobar', 2)])
>>> msg = next(consumer)
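
To run one instance per partition, the partition ID could be passed on the command line instead of being hard-coded. A rough sketch, assuming kafka-python is installed; the option names `--topic`, `--partition` and `--bootstrap-servers` and the helpers `parse_args` and `make_consumer` are made up for illustration, and the defaults are taken from the question:

```python
import argparse


def parse_args(argv=None):
    """Parse the topic, partition and broker address for this instance."""
    parser = argparse.ArgumentParser(
        description="Consume a single Kafka partition")
    parser.add_argument("--topic", default="my-topic3")
    parser.add_argument("--partition", type=int, required=True)
    parser.add_argument("--bootstrap-servers", default="192.168.150.80:9092")
    return parser.parse_args(argv)


def make_consumer(args):
    """Create a consumer that reads only the requested partition."""
    # Imported here so the argument parsing above works without kafka-python.
    from kafka import KafkaConsumer, TopicPartition

    # No topic is passed to the constructor; the partition is assigned manually.
    consumer = KafkaConsumer(bootstrap_servers=args.bootstrap_servers)
    consumer.assign([TopicPartition(args.topic, args.partition)])
    return consumer
```

With `consumer = make_consumer(parse_args())` at the top of the script, the two instances could then be started as `python consumerExample.py --partition 0` and `python consumerExample.py --partition 1`.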
ck1
  • Thanks for such a quick reply. Already tried the assign function without luck. Getting illegal state error kafka.errors.IllegalStateError: You must choose only one way to configure your consumer: (1) subscribe to specific topics by name, (2) subscribe to topics matching a regex pattern, (3) assign itself specific topic-partitions. –  Jun 25 '16 at 21:58
  • You need to remove `group_id='my-group'` when you create the consumer, as that represents the group to join for dynamic partition assignment. – ck1 Jun 26 '16 at 15:40
  • Thanks, that works. But now I am a little confused about my Kafka basics. Why can't we have a KafkaConsumer tied to a group and listening to a specific partition? Am I missing a point here? –  Jun 28 '16 at 16:29
  • @user5131511: You can; they are not mutually exclusive. See [Scott Carey's](https://stackoverflow.com/users/589907/scott-carey) comments [here](https://stackoverflow.com/questions/53938125/kafkaconsumer-java-api-subscribe-vs-assign#comment109417405_53938125) and [here](https://stackoverflow.com/questions/53938125/kafkaconsumer-java-api-subscribe-vs-assign#comment109417440_53938397). – Olúwátóósìn Anímáṣahun Oct 09 '20 at 17:26
  • @user5131511: Perhaps you also supplied the positional `topics` argument while instantiating `KafkaConsumer`. You shouldn't do that if you intend to call `assign()` on the _consumer_ post initialisation. Providing topics early, otherwise, causes an implicit _subscription_ that's done as part of the initialisation process of the consumer. See the `Raises` section [here](https://kafka-python.readthedocs.io/en/2.0.1/apidoc/KafkaConsumer.html#kafka.KafkaConsumer.assign). – Olúwátóósìn Anímáṣahun Oct 09 '20 at 17:38
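
Putting the last two comments together, a consumer that keeps its `group_id` but assigns its partition manually might look like the sketch below. It assumes kafka-python is installed; `assigned_consumer` is a hypothetical helper name. The point it illustrates is that no topic is passed positionally to `KafkaConsumer`, since that would subscribe implicitly and make the later `assign()` call raise `IllegalStateError`:

```python
def assigned_consumer(topic, partition, servers, group_id=None):
    """Create a consumer bound to exactly one partition.

    group_id is optional; as far as I understand, with manual assignment
    it is used for committing offsets rather than for dynamic assignment.
    """
    from kafka import KafkaConsumer, TopicPartition

    # Keyword arguments only: a positional topic here would trigger an
    # implicit subscribe() and conflict with assign() below.
    consumer = KafkaConsumer(bootstrap_servers=servers, group_id=group_id)
    consumer.assign([TopicPartition(topic, partition)])
    return consumer
```

For the setup in the question this could be called as `assigned_consumer('my-topic3', 1, '192.168.150.80:9092', group_id='my-group')`.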