
I'm trying to use kafka-python for accessing Kafka in a Docker container. The dockerized app from which I'm trying to connect to Kafka is in another container in the same network. The error appears when I try to initialize a KafkaAdminClient object:

from kafka.admin import KafkaAdminClient

self._kafka_admin = KafkaAdminClient(
    bootstrap_servers=server,
    api_version=(0, 10, 2),
    api_version_auto_timeout_ms=120000,
)

And I get the following error:

(the error message was attached as a screenshot and is not reproduced here)

This is the configuration file (docker-compose):

version: '3'
services:
  spark-master:
    image: docker.io/bitnami/spark:2
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    volumes:
      - type: bind
        source: ./conf/log4j.properties
        target: /opt/bitnami/spark/conf/log4j.properties
    ports:
      - '8080:8080'
      - '7077:7077'
    networks:
      - spark
  spark-worker-1:
    image: docker.io/bitnami/spark:2
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://localhost:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    volumes:
      - type: bind
        source: ./conf/log4j.properties
        target: /opt/bitnami/spark/conf/log4j.properties
    ports:
      - '8081:8081'
    networks:
      - spark
    depends_on:
      - spark-master
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    ports:
      - 22181:2181

  kafka:
    image: confluentinc/cp-kafka:latest
    hostname: kafka
    depends_on:
      - zookeeper
    ports:
      - 29092:29092
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:29092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  app:
    build:
      context: ./
    ports:
      - 5000:5000
networks:
  spark:
    driver: bridge
Tavis

1 Answer


Your app is trying to connect to itself when containerized. It needs to point at the kafka service, not localhost. One fix, if you want to run the code both inside and outside of the container, is to use an environment variable to define the bootstrap servers.
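A minimal sketch of the environment-variable approach (the variable name `KAFKA_BOOTSTRAP_SERVERS` and the default address are just one possible choice):

```python
import os

def get_bootstrap_servers() -> str:
    # Inside docker-compose, set KAFKA_BOOTSTRAP_SERVERS=kafka:9092 in the
    # app service's environment; outside, fall back to the host-mapped port.
    return os.environ.get("KAFKA_BOOTSTRAP_SERVERS", "localhost:29092")

# Then pass it to the client:
# admin = KafkaAdminClient(bootstrap_servers=get_bootstrap_servers(), ...)
```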

You also cannot have two Kafka advertised listeners on the same port, so change one of the 29092 values to something else. If you change kafka:29092 to kafka:9092 (which is probably what's used in the example you copied), then that's also the address your Python app needs to use.
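For example, the kafka service's environment could look like this (a sketch, assuming the internal listener moves to 9092 while the host-mapped one stays on 29092):

```yaml
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:29092
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
```

Containers on the same Compose network would then use `kafka:9092`, and tools on the host would use `localhost:29092`.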

Also, pyspark and kafka-python are different libraries, and you don't need both (assuming you want to use your Spark containers).

OneCricketeer
  • If I delete the `localhost:29092` one? I'm trying to use both, pyspark and kafka-python – Tavis Sep 29 '21 at 12:38
  • If you remove that from the Kafka container, you'll still need to change the Python code and you'll no longer be able to use Kafka tools outside any container for debugging – OneCricketeer Sep 29 '21 at 12:49
  • And you don't need both. Pyspark can produce and consume data. If you need something to create topics, you can create a separate "init container" that starts and dies before Spark containers start – OneCricketeer Sep 29 '21 at 12:52
  • Ok, I understand. The app works fine locally, and I'm trying to deploy it, so if I change the Python code (bootstrap_servers = ["kafka:29092"]) should it work? – Tavis Sep 29 '21 at 13:03
  • Yes, assuming that's the advertised address in the Kafka container, but like I said, using an environment variable would be preferred – OneCricketeer Sep 29 '21 at 13:07
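The "init container" approach mentioned in the comments could be sketched as an extra Compose service along these lines (the topic name `my-topic` is just a placeholder):

```yaml
kafka-init:
  image: confluentinc/cp-kafka:latest
  depends_on:
    - kafka
  entrypoint: ["/bin/sh", "-c"]
  command: >
    "kafka-topics --bootstrap-server kafka:9092 --create --if-not-exists
    --topic my-topic --partitions 1 --replication-factor 1"
```

The container runs the topic-creation command once and exits; other services that need the topic can declare `depends_on: [kafka-init]`.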