I'm looking for a way to get data from my API (running on localhost) into my Docker setup using Kafka.

My producer (below) works like a charm. I know because when I print res.text, I get output.

import json
import requests
from kafka import KafkaProducer
import time

# get data from the local API
res = requests.get('http://127.0.0.1:5000/twitter')
#print(res.text)

# send the response to Kafka (send() is asynchronous)
producer = KafkaProducer(bootstrap_servers=['localhost:9092'])#, api_version='2.0.0')
producer.send('test', json.dumps(res.text).encode('utf-8'))
time.sleep(1)  # crude wait so the async send can complete
#producer.flush()

However, my consumer doesn't work. Here is what I have tried so far; it currently hangs at the for loop.

import kafka
import json
import requests
from kafka import KafkaConsumer

# consume from Kafka
consumer = KafkaConsumer('test', bootstrap_servers=['localhost:9092'], api_version='2.0.0', group_id="test_id", value_deserializer=json.loads)
print('before for ')
consumer.subscribe('test')  # redundant: the constructor already subscribed to 'test'
for msg in consumer:
    print('IN for')
    #print(type(consumer))
    print(json.loads(msg.value.decode()))
#print(consumer)

I'm missing something somewhere, but I can't figure what.

When I stop it manually, I get the following traceback from Docker:

<class 'kafka.consumer.group.KafkaConsumer'>
^CTraceback (most recent call last):
  File "consumer.py", line 11, in <module>
    for m in consumer:
  File "/usr/lib/python3.7/site-packages/kafka/consumer/group.py", line 1193, in __next__
    return self.next_v2()
  File "/usr/lib/python3.7/site-packages/kafka/consumer/group.py", line 1201, in next_v2
    return next(self._iterator)
  File "/usr/lib/python3.7/site-packages/kafka/consumer/group.py", line 1116, in _message_generator_v2
    record_map = self.poll(timeout_ms=timeout_ms, update_offsets=False)
  File "/usr/lib/python3.7/site-packages/kafka/consumer/group.py", line 655, in poll
    records = self._poll_once(remaining, max_records, update_offsets=update_offsets)
  File "/usr/lib/python3.7/site-packages/kafka/consumer/group.py", line 680, in _poll_once
    self._update_fetch_positions(self._subscription.missing_fetch_positions())
  File "/usr/lib/python3.7/site-packages/kafka/consumer/group.py", line 1112, in _update_fetch_positions
    self._fetcher.update_fetch_positions(partitions)
  File "/usr/lib/python3.7/site-packages/kafka/consumer/fetcher.py", line 186, in update_fetch_positions
    self._reset_offset(tp)
  File "/usr/lib/python3.7/site-packages/kafka/consumer/fetcher.py", line 237, in _reset_offset
    offsets = self._retrieve_offsets({partition: timestamp})
  File "/usr/lib/python3.7/site-packages/kafka/consumer/fetcher.py", line 302, in _retrieve_offsets
    time.sleep(self.config['retry_backoff_ms'] / 1000.0)
KeyboardInterrupt
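
(This traceback is not an error as such: the KeyboardInterrupt lands inside poll(), meaning the consumer was simply blocked waiting for records when I stopped it.)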
version: "3.7"
services:

  spark-master:
    image: bde2020/spark-master:3.0.1-hadoop3.2
    ports:
      - "8080:8080"
      - "7077:7077"
    volumes:
       - ./work:/home/jovyan/work
    environment:
       - "SPARK_LOCAL_IP=spark-master"

  spark-worker:
    image: bde2020/spark-worker:3.0.1-hadoop3.2

    depends_on:
      - spark-master
    environment:
      - SPARK_MASTER=spark://spark-master:7077
      - SPARK_WORKER_CORES=2
      - SPARK_WORKER_MEMORY=3G
      - SPARK_DRIVER_MEMORY=2G
      - SPARK_EXECUTOR_MEMORY=2G
    volumes:
       - ./work:/home/jovyan/work

  pyspark-notebook:
    image: jupyter/pyspark-notebook
    container_name: pyspark_notebook
    ports:
      - "8888:8888"
    volumes:
      - ./work:/home/jovyan/work
      - ./work/model:/tmp/model_prediction
    environment:
      - PYSPARK_PYTHON=/usr/bin/python3
      - PYSPARK_DRIVER_PYTHON=ipython3

  zookeeper:
    image: wurstmeister/zookeeper:3.4.6
    expose:
    - "2181"

  kafka:
    image: wurstmeister/kafka:2.11-2.0.0
    depends_on:
    - zookeeper
    ports:
    - "9092:9092"
    expose:
    - "9093"
    environment:
      KAFKA_ADVERTISED_LISTENERS: INSIDE://kafka:9093,OUTSIDE://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
      KAFKA_LISTENERS: INSIDE://0.0.0.0:9093,OUTSIDE://0.0.0.0:9092
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE

  mongo:
    image: mongo
    restart: always
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: example

  mongo-express:
    image: mongo-express
    restart: always
    ports:
      - 8081:8081
    environment:
      ME_CONFIG_MONGODB_ADMINUSERNAME: root
      ME_CONFIG_MONGODB_ADMINPASSWORD: example
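
Given the listener setup in the kafka service, my understanding is that clients on the host must use the OUTSIDE listener address, while clients inside the compose network must use the INSIDE one:

from kafka import KafkaProducer, KafkaConsumer

# on the host (outside the compose network): OUTSIDE listener
producer = KafkaProducer(bootstrap_servers=['localhost:9092'])

# in a container on the compose network: INSIDE listener
consumer = KafkaConsumer('test', bootstrap_servers=['kafka:9093'])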


Could you please help me?

  • Why did you comment out the producer flush? Just because you can print the HTTP response doesn't mean the producer succeeded – OneCricketeer Apr 03 '21 at 14:56
  • Hi, thanks for your answer and help. How can I check whether the producer succeeded? For me, the print was the proof that it worked. I commented out the flush because I didn't see the use of it. – textSolver34761 Apr 03 '21 at 16:53
  • I have uncommented the flush method (in the producer) and uncommented the for loop (in my consumer): `for msg in consumer: print(json.loads(msg.value.decode()))`... nothing changed! – textSolver34761 Apr 03 '21 at 17:12
  • Do other tools work to consume? For example, the built-in Kafka console consumer? kafkacat? There's also an offset shell tool that you can use to verify that there's an offset difference in the topic... In other words, I'm still not convinced your producer worked – OneCricketeer Apr 04 '21 at 15:11
  • Hi, I actually don't know how to see / test whether my producer works or not. My teacher (who helped me build the producer) told me it does. Do you have the names of tools I could use? I'd like to test my producer. – textSolver34761 Apr 05 '21 at 17:26
  • I also did a print() of the data I receive to check that it's being processed: I print('starting process') before starting and print('end of process') at the end, and I get both prints in the shell. – textSolver34761 Apr 05 '21 at 17:28
  • I only use Kafka to consume. If you have other tools I could test, I'd be willing to test! – textSolver34761 Apr 05 '21 at 17:33
  • I don't understand what you mean by "use Kafka to consume"... You have access to a broker, and can use the built-in `kafka-console-consumer` script that comes with it (see the official Kafka website). Secondly, I suggest you use the exact same bootstrap address between both clients; it's unclear what `kafka:9092` refers to, but it's probably not the same machine as `localhost:9092`. And again, just because you are printing a requests response doesn't mean anything for KafkaProducer – OneCricketeer Apr 06 '21 at 13:22
  • You asked me "Do other tools work to consume? For example, the built-in Kafka console consumer". No, I use Kafka; I don't use the console. localhost:9092 refers to the docker-compose file when it's from the outside; kafka:9092 refers to the docker-compose file when it's from the inside. https://stackoverflow.com/questions/65823835/kafka-python-producer-send-record-but-consumer-dont-receive-it – textSolver34761 Apr 07 '21 at 16:36
  • My teacher doesn't think it's a port issue (link below). – textSolver34761 Apr 07 '21 at 16:39
  • Why did you uncomment all my code? – textSolver34761 Apr 07 '21 at 16:40
  • Obviously it doesn't work when things are commented. The broker *comes with* consumer tools. I'm asking you to [edit] your question to prove they work before helping with your Python code. If you're using Docker, you need to `docker exec` into the container first. You've also not added your compose file, so I'm not sure how you've configured anything, but your producer and consumer code are clearly using different addresses – OneCricketeer Apr 08 '21 at 01:56
  • fwiw, here's a very similar question to yours https://stackoverflow.com/q/66992375/2308683 – OneCricketeer Apr 08 '21 at 14:10
  • Yes, I know that my producer and consumer code are clearly using different addresses : one is local with an endpoint in "/twitter", the other is internal at Kafka. – textSolver34761 Apr 08 '21 at 16:41
  • No, no, the bootstrap servers were different. Not your requests call; you can't put /twitter on the Kafka address... Thanks for the update, but now that I see you're using the pyspark container, it's not clear why you're not using the Spark-Kafka libraries rather than `kafka-python`; not that it really matters, but it's hard to tell where your Python code is actually run: inside a container or not? – OneCricketeer Apr 09 '21 at 02:40
  • My program is supposed to work as follows: the producer connects to the API (http://127.0.0.1:5000/twitter) to get the data; it works from OUTSIDE of the container. Then the producer passes the data to the consumer. The consumer is located INSIDE the container. – textSolver34761 Apr 10 '21 at 13:18
  • Whatever I do, my program stops every time at `for msg in consumer`. Is there a way to convert the consumer into a list? I looked it up and didn't find anything. – textSolver34761 Apr 10 '21 at 13:21
  • producer returns kafka.producer.future.FutureRecordMetadata – textSolver34761 Apr 10 '21 at 13:57
  • Try `KafkaConsumer('test', bootstrap_servers=['kafka:9093'],` – OneCricketeer Apr 10 '21 at 14:09
  • I now have `consumer = KafkaConsumer('test', bootstrap_servers=['kafka:9093'], api_version='2.0.0', group_id="test_id", value_deserializer=json.loads, max_poll_records=1000)`, but the program still stops at `for msg in consumer`. I don't understand why. – textSolver34761 Apr 11 '21 at 13:16
  • When I try to do a print(json.loads(consumer)) before the for loop, it raises an error: the JSON object must be str, bytes or bytearray, not KafkaConsumer. Is it possible to turn the KafkaConsumer into a list? – textSolver34761 Apr 11 '21 at 13:30
  • My consumer is "empty": no partitions or replicas, just a topic named 'test' with default configuration. Could that be the issue? – textSolver34761 Apr 11 '21 at 13:36
  • `consumer` is not a string, so cannot be `json.loads`'d. It's an infinite iterator, so cannot be made a list, which would be with `[msg.value for msg in consumer]` if it were. Consumers don't have replicas. If you mean the topic is empty and has no replicas then yes that's a problem, and means the producer is not working like I've asked you to show – OneCricketeer Apr 11 '21 at 14:28
  • I thought I had shown you that the producer was working as expected, because it prints when I do: print(producer.send('test', json.dumps(res.text).encode('utf-8'))) – textSolver34761 Apr 11 '21 at 14:52
  • I found this: https://stackoverflow.com/questions/55537766/python-producer-can-send-via-shell-but-not-py. I'm going to follow the documentation – textSolver34761 Apr 11 '21 at 15:08
  • So I followed what you told that guy about the future, and with my producer.flush() I get `after flush` printed – textSolver34761 Apr 11 '21 at 15:28
  • I'm not sure why you're printing the producer, but if you want to show the record succeeded, then you'd add a callback, as documented here - https://github.com/Berkodev/kafka-python/blob/master/docs/usage.rst#kafkaproducer , ref `on_send_success` – OneCricketeer Apr 11 '21 at 15:34

2 Answers


Same docker-compose file as in the question

From host

Create a topic

$ docker-compose up -d
$ docker-compose exec kafka /opt/kafka/bin/kafka-topics.sh --create --topic test --zookeeper zookeeper:2181 --replication-factor 1 --partitions 1      
Created topic "test".
$ docker-compose exec kafka /opt/kafka/bin/kafka-topics.sh --list --zookeeper zookeeper:2181                            
test

Verify API is running

$ curl -H 'Content-Type: application/json' localhost:5000/twitter
{"tweet":"foobar"}

Install kafka-python and run the producer (with producer.flush() uncommented)

$ pip install requests kafka-python
$ python producer.py
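
Optionally, to prove a record actually reached the broker from the producer side, you can block on the future that send() returns (a minimal sketch using kafka-python's future API):

import json
import requests
from kafka import KafkaProducer

res = requests.get('http://127.0.0.1:5000/twitter')

producer = KafkaProducer(bootstrap_servers=['localhost:9092'])
# send() is asynchronous; get() blocks until the broker acknowledges,
# and raises a KafkaError if the send failed
future = producer.send('test', json.dumps(res.text).encode('utf-8'))
metadata = future.get(timeout=10)
print(metadata.topic, metadata.partition, metadata.offset)
producer.flush()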

Verify data landed in topic

$ docker-compose exec kafka /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
"{\"tweet\":\"foobar\"}\n"

From inside the container

Using pyspark notebook @ http://localhost:8888

Open a terminal tab

$ conda install kafka-python
(base) jovyan@3eaf696e1135:~$ python work/consumer.py
before for
IN for
{"tweet":"foobar"}

New consumer code

import kafka
import json
import requests
from kafka import KafkaConsumer

# consume from Kafka
consumer = KafkaConsumer('test',
    bootstrap_servers=['kafka:9093'],  # needs to be the kafka INSIDE:// listener address
    api_version='2.0.0',
    group_id="test_id",
    auto_offset_reset='earliest',  # you're missing this
    value_deserializer=json.loads)
print('before for ')
for msg in consumer:
    print('IN for')
    #print(type(consumer))
    print(msg.value)
#print(consumer)
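
Three things changed versus the consumer in the question: the bootstrap address points at the INSIDE listener (kafka:9093); auto_offset_reset='earliest' makes a group with no committed offsets start from the beginning of the topic (the default is latest, so anything produced before the consumer connected is never seen); and msg.value is printed as-is, because value_deserializer=json.loads has already parsed the payload, so calling json.loads(msg.value.decode()) on it a second time would fail.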
OneCricketeer
  • tl;dr - by default consumer reads from the end of the topic, and you apparently were not running the producer _after_ the consumer loop was waiting for data – OneCricketeer Apr 11 '21 at 15:13
  • Hi, thanks for your answer. There isn't any data in the topic. A topic was created, and the API is running: `127.0.0.1 - - [11/Apr/2021 17:56:25] "GET /twitter HTTP/1.1" 200 -` – textSolver34761 Apr 11 '21 at 16:00
  • The only thing I changed for the producer from what you posted in the question is uncommenting the `producer.flush()`, so I'm not sure where else the issue could be – OneCricketeer Apr 11 '21 at 16:04
  • OK, thanks. I'll have a call with my teacher and let you know if I find it. Many thanks for the time you gave me! – textSolver34761 Apr 11 '21 at 16:11

I found what was wrong: the Docker image wasn't working. I changed it and now it works.

I made my own Dockerfile.