
I have a custom function forward_to_kafka(list: List) which sends my events to Kafka even if there are problems; its purpose is to deliver every message in the list. I have already tested it with the JetBrains Big Data Tools plugin and it works correctly.

But now I need to write automated integration tests and I already have this one:

    @pytest.mark.parametrize("topic, hits", [
        ("topic1", []),
        ("topic3", ["ev1", "ev2", "ev3"]),
        ("topic2", ["event"]),
        ("topic4", ['{"event": ["arr_elem"]}', '{"event_num": 13}', '{"event": {"subev": "value"}}']),
    ])
    @pytest.mark.integration
    def test_forward_to_kafka_integration(self, topic, hits, output):
        kafka_host = 'localhost:9094'
        producer = KafkaProducer(bootstrap_servers=[kafka_host], acks='all',)
        output.forward_to_kafka(producer, topic, [message.encode() for message in hits])
        consumer = KafkaConsumer(topic, bootstrap_servers=[kafka_host],
                                 group_id=f"{topic}_grp", auto_offset_reset='earliest',
                                 consumer_timeout_ms=1000)
        received_messages = [message.value.decode() for message in consumer]
        print(received_messages)
        assert all([message in received_messages for message in hits])

I have Kafka in a Docker container and the connection is fine, but the 2nd test always fails. To be more precise: if I run Kafka empty, with no saved data, from a clean image, the first test that actually pushes messages always fails.

The result of executing the 2nd test is below:

FAILED [ 50%][]

test_output.py:90 (TestForwardToKafka.test_forward_to_kafka_integration[topic3-hits1])
self = <tests.test_output.TestForwardToKafka object at 0x107b49520>
topic = 'topic3', hits = ['ev1', 'ev2', 'ev3']
output = <h3ra.output.Output object at 0x107c14b80>

    @pytest.mark.parametrize("topic, hits", [
        ("topic1", []),
        ("topic3", ["ev1", "ev2", "ev3"]),
        ("topic2", ["event"]),
        ("topic4", ['{"event": ["arr_elem"]}', '{"event_num": 13}', '{"event": {"subev": "value"}}']),
    ])
    @pytest.mark.integration
    def test_forward_to_kafka_integration(self, topic, hits, output):
        kafka_host = 'localhost:9094'
        producer = KafkaProducer(bootstrap_servers=[kafka_host], acks='all',)
        output.forward_to_kafka(producer, topic, [message.encode() for message in hits])
        consumer = KafkaConsumer(topic, bootstrap_servers=[kafka_host],
                                 group_id=f"{topic}_grp", auto_offset_reset='earliest',
                                 consumer_timeout_ms=1000)
        received_messages = [message.value.decode() for message in consumer]
        print(received_messages)
>       assert all([message in received_messages for message in hits])
E       assert False
E        +  where False = all([False, False, False])

test_output.py:107: AssertionError

As you can see, the received_messages array is empty.

But when I connect with the Kafka plugin from Big Data Tools, I can see that these messages were delivered.

What am I doing wrong?

Edit

To reproduce, you can replace output.forward_to_kafka(producer, topic, [message.encode() for message in hits]) with:

for message in hits:
    producer.send(topic, message.encode())  # encode: kafka-python expects bytes values
producer.flush()

In essence, that is all my function does.
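
For reference, here is a variant of that loop which waits for each broker acknowledgement explicitly (a sketch only; the per-message future handling is not part of my real function, and topic/hits come from the test above):

from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers=['localhost:9094'], acks='all')
for message in hits:
    # send() returns a FutureRecordMetadata; get() blocks until the broker
    # acknowledges the record, or raises a KafkaError on failure/timeout
    metadata = producer.send(topic, message.encode()).get(timeout=10)
    print(f"delivered to {metadata.topic}[{metadata.partition}] at offset {metadata.offset}")
producer.flush()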

Here is my docker-compose.yml file:

version: "3.9"

services:
  kafka: # DNS-1035
    hostname: kafka
    image: docker-proxy.artifactory.tcsbank.ru/bitnami/kafka:3.5
    expose:
      - "9092"
      - "9093"
      - "9094"
    volumes:
      - "kafka_data:/bitnami"
    environment:
      - ALLOW_PLAINTEXT_LISTENER=yes
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,CONTROLLER://:9093,EXTERNAL://:9094
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092,EXTERNAL://kafka:9094
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:PLAINTEXT,EXTERNAL:PLAINTEXT,PLAINTEXT:PLAINTEXT
      - KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE=true
      - KAFKA_CFG_NODE_ID=0
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=0@kafka:9093
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
      - KAFKA_CFG_PROCESS_ROLES=controller,broker


volumes:
  kafka_data:
    driver: local
shameoff
  • Kafka docker containers do not start immediately. Are you sure the connection is always successful? – OneCricketeer Jul 31 '23 at 14:12
  • @OneCricketeer, yes, I am sure; I tried several times under different conditions. As I understand it, there is something like a bug in kafka-python on the first write-read cycle. As a workaround I can suggest using a fixture which creates a temp topic, pushes to and reads from it, and then starts the other tests. But I prefer just to read the messages and, if the array is empty, repeat the read (see the sketch after these comments). – shameoff Aug 01 '23 at 08:39
  • I've never had such an issue with that library. Please share how you are starting the container before the tests. And what is the `output` variable? Please show your `forward_to_kafka` function. Refer to [mcve] – OneCricketeer Aug 01 '23 at 13:06
  • @OneCricketeer, sure. I added my docker-compose and a replacement for my function. The error is still reproducible with this code, so just forget about output and forward_to_kafka. I am sorry, but I can't show my exact functions. – shameoff Aug 02 '23 at 07:34
  • And you just run `docker compose up && python test.py`? Have you tried using testcontainers? https://testcontainers-python.readthedocs.io/en/latest/kafka/README.html – OneCricketeer Aug 02 '23 at 13:31
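
A sketch of the retry workaround mentioned in the comments above (the helper name, the per-attempt group id, and the retry parameters are assumptions for illustration, not part of the original test):

import time
from kafka import KafkaConsumer

def read_with_retry(topic, bootstrap, expected_count, attempts=5):
    # Hypothetical helper: reopen the consumer until the expected number of
    # messages arrives or the attempts run out; works around the empty first
    # read against a freshly started broker.
    received = []
    for attempt in range(attempts):
        consumer = KafkaConsumer(topic, bootstrap_servers=[bootstrap],
                                 # fresh group per attempt so auto_offset_reset
                                 # re-applies and earlier reads are not skipped
                                 group_id=f"{topic}_grp_{attempt}",
                                 auto_offset_reset='earliest',
                                 consumer_timeout_ms=1000)
        received = [message.value.decode() for message in consumer]
        consumer.close()
        if len(received) >= expected_count:
            break
        time.sleep(1)
    return received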

1 Answer


You need to set the advertised listeners to include EXTERNAL://localhost:9094; right now both of your advertised listeners point at the same kafka hostname. Also set the port mapping 9094:9094 so that a random port isn't used on your host.

You can also remove ports 9092 and 9093 from your expose list, since those would never be used by your host to access the Kafka service.
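
For example, the relevant parts of the compose file would become (a sketch of the changed lines only; everything else stays as it is):

services:
  kafka:
    ports:
      - "9094:9094"   # fixed host mapping instead of expose
    environment:
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092,EXTERNAL://localhost:9094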

OneCricketeer
  • yes, you're right; in local tests I do exactly what you described, but this docker-compose is used in my CI, where it works correctly. It doesn't solve the problem, though. As I said, the bug appears ONLY on the first read using kafka-python; there is no problem reading with Big Data Tools from JB or other consumers – shameoff Aug 03 '23 at 06:41
  • Don't know what to tell you, really. The JB extension uses Java, as does the console consumer. That's the only major difference, assuming you wait for the container to start before running the tests... On a CI server, "localhost" should not be used as the network address anyway. Obviously it works outside of CI because you run everything on a single machine. – OneCricketeer Aug 03 '23 at 17:59