I'm having trouble using Kafka from my Python code. I'm on Python 2.7.5 with the kafka-python package.
I want to send CSV files (300,000 rows, 20 fields per row) through a Kafka topic. Before sending, I serialize each row to JSON, and up to that point everything works. My producer sends each row of the file and then closes. But on the other side, my consumer doesn't consume anything...
As far as Kafka is concerned, I have a single topic with a single partition. My Kafka and ZooKeeper instances run in Docker containers, but my consumer and producer do not.
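To rule out the Docker networking itself, here is a quick probe I can run from the host (a minimal sketch; 'my_topic' is a placeholder for my real topic name):

from kafka import KafkaConsumer

TOPIC = 'my_topic'  # placeholder for my real topic name

# List the topics the broker advertises. If this raises NoBrokersAvailable
# or hangs, the container's listener isn't reachable from the host, and the
# producer/consumer code isn't the problem.
probe = KafkaConsumer(bootstrap_servers='localhost:9092')
print(probe.topics())                     # should include my topic
print(probe.partitions_for_topic(TOPIC))  # should print {0} (one partition)
probe.close()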
Here is my code for the producer (TOPIC, dataElements, and process_row are defined elsewhere in my script):
import csv
import json

from kafka import KafkaProducer

def producer(path):
    producer = KafkaProducer(bootstrap_servers='localhost:9092', retries=5)
    with open(path, newline='', encoding='utf-8-sig') as csvFile:
        reader = csv.DictReader(csvFile, fieldnames=dataElements)
        for row in reader:
            log = process_row(row)  # per-row cleanup, defined elsewhere
            producer.send(topic=TOPIC, value=json.dumps(log).encode())
    producer.flush()  # block until all buffered records are sent
    producer.close()
    print('processing done')
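One debugging variant I can try (sketch below, based on my reading of the kafka-python docs): producer.send() returns a future, and calling .get() on it blocks until the broker acknowledges the record, so a delivery failure raises immediately instead of being swallowed by the background sender thread.

# Synchronous send, for debugging only (much slower):
future = producer.send(topic=TOPIC, value=json.dumps(log).encode())
metadata = future.get(timeout=10)  # raises a KafkaError if delivery fails
print(metadata.topic, metadata.partition, metadata.offset)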
Here is my code for the consumer:
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(bootstrap_servers='localhost:9092')
consumer.subscribe(TOPIC)
for message in consumer:
    log = json.loads(message.value.decode())
    print(log)
consumer.close()  # only reached if the loop above ever exits
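To check whether the producer's messages ever reached the broker at all, I can also compare the partition's beginning and end offsets (a sketch, assuming a kafka-python version recent enough to have these offset helpers):

from kafka import KafkaConsumer, TopicPartition

checker = KafkaConsumer(bootstrap_servers='localhost:9092')
tp = TopicPartition(TOPIC, 0)  # my topic has a single partition, 0
checker.assign([tp])
start = checker.beginning_offsets([tp])[tp]
end = checker.end_offsets([tp])[tp]
print('messages on the broker:', end - start)  # ~300,000 expected
checker.close()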
I get 'processing done' after running my producer, but I get nothing when I run my consumer (I start the consumer first).
I've read the documentation, and the problem may come from the producer configuration. Still, I'm not sure which parameters I should modify (linger_ms, batch_size... ?). It seems to me the default values should work in my case.
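For reference, this is how I understand those parameters would be passed if I did need to change them (illustrative values only, not settings I've verified):

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    retries=5,
    linger_ms=10,      # wait up to 10 ms so records can batch together
    batch_size=16384,  # max bytes per batch (this is the default)
)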
Any ideas?