
I'm having trouble using Kafka from my Python code. I use Python 2.7.5 and the kafka-python package.

I want to send CSV files (300,000 rows, 20 fields per row) through a Kafka topic. Before sending, I serialize each row into JSON, and up to that point everything works. My producer sends each row of the file and then closes. But on the other side, my consumer doesn't consume anything...
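For illustration, here is a minimal sketch of that serialization step (the field names and the in-memory CSV are hypothetical, and process_row is assumed to simply pass the row dict through):

```python
import csv
import io
import json

# hypothetical field names and an in-memory stand-in for the CSV file
dataElements = ["id", "name"]
csvFile = io.StringIO(u"1,alice\n2,bob\n")

reader = csv.DictReader(csvFile, fieldnames=dataElements)
# each row becomes a JSON-encoded byte string, ready for producer.send()
payloads = [json.dumps(dict(row)).encode() for row in reader]
print(payloads[0])  # b'{"id": "1", "name": "alice"}'
```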

As far as Kafka is concerned, I have a single topic with a single partition. My Kafka and ZooKeeper instances run in Docker containers, but my consumer and producer do not.

Here is my code for the producer:

import csv
import io
import json

from kafka import KafkaProducer

def producer(path):
    producer = KafkaProducer(bootstrap_servers="localhost:9092", retries=5)

    # io.open supports the newline and encoding keywords on Python 2.7;
    # the built-in open does not
    with io.open(path, newline='', encoding='utf-8-sig') as csvFile:
        reader = csv.DictReader(csvFile, fieldnames=dataElements)
        for row in reader:
            log = process_row(row)
            # serialize each row to a JSON byte string before sending
            producer.send(topic=TOPIC, value=json.dumps(log).encode())
    producer.flush()
    producer.close()
    print('processing done')

Here is my code for the consumer:

import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(bootstrap_servers='localhost:9092')
# subscribe() expects a list of topic names
consumer.subscribe([TOPIC])
for message in consumer:
    log = json.loads(message.value.decode())
    print(log)
consumer.close()

I get 'processing done' after running my producer. I don't get anything when I run my consumer (I run my consumer first).

I read the documentation, and the problem may come from the producer configuration. Still, I'm not sure which parameters I should modify (linger_ms, batch_size, ...?). It seems to me the default values should work in my case.
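For reference, these are the defaults I mean (a sketch; the values shown are kafka-python's documented defaults, so passing them explicitly should behave the same as omitting them):

```python
# kafka-python's default values for the batching-related producer settings
producer_settings = dict(
    bootstrap_servers="localhost:9092",
    retries=5,
    linger_ms=0,        # default: do not wait to fill a batch before sending
    batch_size=16384,   # default: 16 KB batch buffer per partition
    acks=1,             # default: wait for the partition leader's acknowledgement
)
# producer = KafkaProducer(**producer_settings)
print(producer_settings["linger_ms"])  # 0
```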

Any ideas?

  • If I were you, I'd use Python 3.x as Python 2.7.x is not supported anymore. For your problem, you can use [akhq](https://github.com/tchiotludo/akhq) to see what's going on in your topics. – MetallimaX Jan 21 '21 at 09:04
  • I'd like to, but it's not up to me. Ok I'm going to try this, thanks ! – RxxxxSxxxx Jan 21 '21 at 09:51
  • There are migration scripts available and it should be a priority on any project to migrate since the former Python is not supported anymore. – MetallimaX Jan 21 '21 at 10:32
  • I'll tell that to my supervisor :) So I checked my Kafka instances in my Docker container. My topic exists and there is indeed 1 partition. I tried consuming through the console consumer and it did not consume any records... – RxxxxSxxxx Jan 21 '21 at 11:57
  • Sounds more like a Kafka issue to me but not 100% sure. – MetallimaX Jan 21 '21 at 15:47
  • I found a kind of solution: I created 2 containers (1 for my producer, 1 for my consumer) and it worked (I just changed the ip:ports). So the problem seems to be the connection between my Kafka container and my local machine. – RxxxxSxxxx Jan 21 '21 at 15:56

1 Answer


I figured it out using the following resources: https://www.kaaproject.org/blog/kafka-docker and https://github.com/wurstmeister/kafka-docker/wiki/Connectivity

It requires adding some environment variables, like KAFKA_ADVERTISED_HOST, to the docker-compose.yml, so that clients can connect to the Kafka broker from outside the Docker host.
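As a sketch, the relevant service section of a docker-compose.yml could look like this (assuming the wurstmeister/kafka image; the variable names come from its Connectivity wiki page linked above and may differ for other images):

```yaml
kafka:
  image: wurstmeister/kafka
  ports:
    - "9092:9092"
  environment:
    # advertise an address that is reachable from outside the Docker host's
    # network namespace, so external clients can connect to the broker
    KAFKA_ADVERTISED_HOST_NAME: localhost
    KAFKA_ADVERTISED_PORT: 9092
    KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
```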