I found this docker image for Kafka

https://hub.docker.com/r/spotify/kafka/

and I can easily create a Docker container using the command documented in the link:

docker run -p 2181:2181 -p 9092:9092 --env ADVERTISED_HOST=`boot2docker ip` --env ADVERTISED_PORT=9092 spotify/kafka

This is good. But I want to configure a multi-node Kafka cluster running on a Docker swarm.

How can I do that?

Knows Not Much

4 Answers


Edit 28/11/2017:

Kafka added listener.security.protocol.map to its config. This lets you set different listener addresses and protocols depending on whether you are inside or outside the cluster, and stops Kafka getting confused by any load balancing or IP translation that occurs in Docker. Wurstmeister has a working Docker image and example compose file here. I tried this a while back with a few docker-machine nodes set up as a swarm, and it seems to work.
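As a rough sketch, a broker config using that listener map might look like the following; the hostnames "kafka" and "host.example.com" are placeholders, not values from any compose file in this thread:

```properties
# Separate listeners for traffic inside the overlay network and from
# outside; hostnames here are placeholders
listeners=INSIDE://:9092,OUTSIDE://:9094
listener.security.protocol.map=INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
advertised.listeners=INSIDE://kafka:9092,OUTSIDE://host.example.com:9094
inter.broker.listener.name=INSIDE
```

Brokers talk to each other on the INSIDE listener, while external clients get the OUTSIDE address advertised back to them.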

To be honest, though, these days I just attach a Kafka image to the overlay network and run the Kafka console commands whenever I want to interact with the cluster.

Hope that helps


Old Stuff Below

I have been trying this with Docker 1.12 using swarm mode.

create nodes

docker-machine create -d virtualbox  master
docker-machine create -d virtualbox  worker
master_config=$(docker-machine config master | tr -d '\"')
worker_config=$(docker-machine config worker | tr -d '\"')
master_ip=$(docker-machine ip master)
docker $master_config swarm init --advertise-addr $master_ip --listen-addr $master_ip:2377
worker_token=$(docker $master_config swarm join-token worker -q)
docker $worker_config swarm join --token $worker_token  $master_ip:2377
eval $(docker-machine env master)
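As an aside, the tr -d '\"' above just strips the quotes that docker-machine config puts around its flag values, so they can be passed straight back to the docker CLI. A quick illustration with made-up output:

```shell
# docker-machine config prints connection flags with quoted values;
# tr -d '"' removes the quotes (this sample output is made up)
cfg='--tlsverify --tlscacert="/home/user/.docker/machines/master/ca.pem" -H="tcp://192.168.99.100:2376"'
echo "$cfg" | tr -d '"'
# prints the same flags with the quotes stripped
```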

create the zookeeper service

docker service create --name zookeeper \
    --constraint 'node.role == manager' \
    -p 2181:2181 \
    wurstmeister/zookeeper

create the kafka service

docker service create --name kafka \
    --mode global \
    -e 'KAFKA_PORT=9092' \
    -e 'KAFKA_ADVERTISED_PORT=9092' \
    -e 'KAFKA_LISTENERS=PLAINTEXT://0.0.0.0:9092' \
    -e 'KAFKA_ZOOKEEPER_CONNECT=tasks.zookeeper:2181' \
    -e "HOSTNAME_COMMAND=ip r | awk '{ ip[\$3] = \$NF } END { print ( ip[\"eth0\"] ) }'" \
    --publish '9092:9092' \
    wurstmeister/kafka
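The HOSTNAME_COMMAND above reads the container's routing table and picks out the source address on eth0. Fed a canned "ip r" output (addresses made up), the awk program behaves like this:

```shell
# Reproduce the HOSTNAME_COMMAND logic on a canned routing table:
# awk stores the last field of each line keyed by field 3, so the
# "src" address on the eth0 line wins (addresses are made up)
ip_r_output='default via 10.0.0.1 dev eth0
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.5'
echo "$ip_r_output" | awk '{ ip[$3] = $NF } END { print ip["eth0"] }'
# prints 10.0.0.5
```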

For some reason, though, this only works from within the ingress or a user-defined overlay network, and the connection to Kafka breaks if you try to connect through one of the guest machines.

Changing the advertised IP doesn't make things any better:

docker service create --name kafka \
    --mode global \
    -e 'KAFKA_PORT=9092' \
    -e 'KAFKA_ADVERTISED_PORT=9092' \
    -e 'KAFKA_LISTENERS=PLAINTEXT://0.0.0.0:9092' \
    -e 'KAFKA_ZOOKEEPER_CONNECT=tasks.zookeeper:2181' \
    -e 'KAFKA_LOG_DIRS=/kafka/kafka-logs' \
    -e "HOSTNAME_COMMAND=curl 192.168.99.1:5000" \
    --publish '9092:9092' \
    wurstmeister/kafka

I think the new mesh networking and load balancing in Docker might be interfering with the Kafka connection somehow.

To get the host address, I have a Flask app running locally which I curl:

from flask import Flask
from flask import request

app = Flask(__name__)

@app.route('/')
def hello_world():
    # echo back the caller's address
    return request.remote_addr

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
Richard Mathie
  • When I used the above to run Kafka and ZooKeeper, the Kafka process fails saying it doesn't recognize the host zookeeper or tasks.zookeeper. Any idea what I'm doing wrong? – Arshad Ansari Nov 27 '17 at 08:36
  • Sorry, I fixed it by adding the overlay network, which was missing. It requires a new overlay network to be created and the --network option to add that new network to the service command shown above. – Arshad Ansari Nov 27 '17 at 09:21
  • Ah man this is old. Kafka has added some protocols to get this working, and docker is generally better now. Just follow the documentation and tutorials from this guy here: [wurstmeister/kafka](https://hub.docker.com/r/wurstmeister/kafka/) and [docker-compose-swarm.yml](https://github.com/wurstmeister/kafka-docker/blob/master/docker-compose-swarm.yml) – Richard Mathie Nov 28 '17 at 10:35

The previous approach raises some questions:

  1. How do you specify the IDs of the ZooKeeper nodes?
  2. How do you specify the IDs of the Kafka nodes and the ZooKeeper nodes?

# kafka configs
echo "broker.id=${ID}
advertised.host.name=${NAME}
zookeeper.connect=${ZOOKEEPERS}" >> /opt/kafka/config/server.properties

Everything should be resolvable in the overlay network.
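For that to hold, the services need to share a user-defined overlay network. A compose fragment along these lines (the network name "kafka" is an arbitrary choice) makes names such as tasks.zookeeper resolvable between containers:

```yaml
# Stack-file fragment: put the services on one overlay network so
# DNS names like tasks.zookeeper resolve ("kafka" is an arbitrary name)
networks:
  kafka:
    driver: overlay
    attachable: true
```

Each service then lists that network under its own networks: key, as in the compose example further down.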

Moreover, in the issue "Cannot create a Kafka service and publish ports due to routing mesh network" there is a comment advising not to use the ingress network.

I think the best option is to specify your service using a Docker Compose file with swarm. I'll edit the answer with an example.

Fabio Fumarola

There are 2 concerns to consider: networking and storage.

Since Kafka is a stateful service, and until cloud-native storage is figured out, it is advisable to use global deployment mode. That is, each swarm node satisfying the constraints will run one Kafka container.

Another recommendation is to use host mode for the published port.

It's also important to set the advertised listeners option properly so that each Kafka broker knows which host it's running on. Use swarm service templates to provide the real hostname automatically.

Also make sure that the published port is different from the target port.

  kafka:
    image: debezium/kafka:0.8
    volumes:
      - ./kafka:/kafka/data
    environment:
      - ZOOKEEPER_CONNECT=zookeeper:2181
      - KAFKA_AUTO_CREATE_TOPICS_ENABLE=true
      - KAFKA_MAX_MESSAGE_BYTES=20000000
      - KAFKA_MESSAGE_MAX_BYTES=20000000
      - KAFKA_CLEANUP_POLICY=compact
      - LISTENERS=PLAINTEXT://:9092
      - BROKER_ID=-1
      - ADVERTISED_LISTENERS=PLAINTEXT://{{.Node.Hostname}}:11092
    depends_on:
      - zookeeper
    deploy:
      mode: global
    ports:
      - target: 9092
        published: 11092
        protocol: tcp
        mode: host
    networks:
      - kafka

I can't explain all the options right now, but this is a configuration that works.

Vanuan

Set broker.id=-1 in server.properties to let Kafka auto-generate the broker ID. This is helpful in swarm mode.
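A minimal server.properties sketch:

```properties
# server.properties: -1 tells the broker to have an ID assigned
# automatically (generated IDs start above reserved.broker.max.id)
broker.id=-1
```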

nelson
  • Using an auto-generated broker ID can become problematic when you're replacing a broken broker node: the replacement won't take the same ID as the node that left, and thus won't get the partitions the broken broker "left behind". – Frank de Jonge Sep 29 '17 at 14:05