
I am running a dev Linux machine and setting up a local Kafka for development on Kubernetes (moving from docker-compose for learning and practicing purposes) with Kind, and everything works fine. I am now trying to map volumes from Kafka and Zookeeper to the host, but I am only able to for the Kafka volume. For Zookeeper I configure and map the data and log paths to a volume, but the internal directories are not being exposed on the host (which does happen with the Kafka mapping): only the data and log folders show up, with no actual content on the host, so restarting Zookeeper resets state.

I am wondering if there's a limitation, or a different approach needed, when using Kind and mapping multiple directories from different pods. What am I missing? Why are only the Kafka volumes successfully persisted on the host?

The full setup, with a README on how to run it, is on GitHub under the pv-pvc-setup folder.

The relevant Zookeeper configuration. Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    service: zookeeper
  name: zookeeper
spec:
  replicas: 1
  selector:
    matchLabels:
      service: zookeeper
  strategy: {}
  template:
    metadata:
      labels:
        network/kafka-network: "true"
        service: zookeeper
    spec:
      containers:
        - env:
            - name: TZ
            - name: ZOOKEEPER_CLIENT_PORT
              value: "2181"
            - name: ZOOKEEPER_DATA_DIR
              value: "/var/lib/zookeeper/data"
            - name: ZOOKEEPER_LOG_DIR
              value: "/var/lib/zookeeper/log"
            - name: ZOOKEEPER_SERVER_ID
              value: "1"
          image: confluentinc/cp-zookeeper:7.0.1
          name: zookeeper
          ports:
            - containerPort: 2181
          resources: {}
          volumeMounts:
            - mountPath: /var/lib/zookeeper
              name: zookeeper-data
      hostname: zookeeper
      restartPolicy: Always
      volumes:
        - name: zookeeper-data
          persistentVolumeClaim:
            claimName: zookeeper-pvc

Persistent volume claim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: zookeeper-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: zookeeper-local-storage
  resources:
    requests:
      storage: 5Gi

Persistent volume:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: zookeeper-pv
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: zookeeper-local-storage
  capacity:
    storage: 5Gi
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /var/lib/zookeeper

kind-config:

apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
nodes:
  - role: control-plane
  - role: worker
    extraPortMappings:
      - containerPort: 30092 # internal kafka nodeport
        hostPort: 9092 # port exposed on "host" machine for kafka
      - containerPort: 30081 # internal schema-registry nodeport
        hostPort: 8081 # port exposed on "host" machine for schema-registry
    extraMounts:
      - hostPath: ./tmp/kafka-data
        containerPath: /var/lib/kafka/data
        readOnly: false
        selinuxRelabel: false
        propagation: Bidirectional
      - hostPath: ./tmp/zookeeper-data
        containerPath: /var/lib/zookeeper
        readOnly: false
        selinuxRelabel: false
        propagation: Bidirectional

As I mentioned, the setup works; I am now just trying to make sure the relevant Kafka and Zookeeper volumes are mapped to persistent external storage (in this case, a local disk).

groo
  • Start here - https://strimzi.io Or at the very least, use Confluent existing Helm Charts – OneCricketeer Jan 09 '22 at 13:15
  • I think your problem is that `./tmp` is not the same as `/tmp` and you really shouldn't be using tmp folder for data anyway – OneCricketeer Jan 09 '22 at 13:17
  • Hi, thanks, I know the project. That's not what I am looking for. As I mentioned, this is for learning and studying purposes; the problem is not running local Kafka per se, as the setup I have here is working. I am just trying to sort out an issue with persistent storage with multiple containers and Kind specifically. – groo Jan 09 '22 at 13:18
  • My point is that Strimzi 1) Works on kind 2) Has persistent storage figured out. In any case, `./tmp/kafka-data` will create one relative directory and should be a folder directly next to the Zookeeper data you've set – OneCricketeer Jan 09 '22 at 13:22
  • Yes, I understand that. I do have folders for the data on the host for both Zookeeper and Kafka. The problem is that for Kafka persistence the files are properly persisted in the local disk but for zookeeper, although the folders are properly mapped the data is not persisted in the host disk. I am wondering if it's a limitation of Kind or is it something I missed in the kubernetes configuration. – groo Jan 09 '22 at 13:37
  • So, if your problem is restarts, why aren't you using statefulsets? – OneCricketeer Jan 09 '22 at 13:59
  • Maybe I should, after all; I will look into that, like I am also looking into Storage Classes. Thanks. But my current issue specifically is that the same setup works for Kafka but doesn't work for Zookeeper. i.e. Kafka files are persisted to local disk but Zookeeper ones aren't (even though they're in the expected folder inside the container; I can see them when I exec into the Zookeeper container). – groo Jan 09 '22 at 14:22
  • Some things you can try - unapply the Kafka PV information, or comment it out. Or swap the order of the kind extraMounts config. Then, see if the opposite happens, Zookeeper mounts work, and Kafka does not. Then, you'll know if the problem is actually kind – OneCricketeer Jan 09 '22 at 15:13
  • 1
    Ok, it works now. I created a specific mapping and pv / pvc for each directory of zookeeper (data and log) and created a specific entry for each on kind-config. Also, I noticed a permission issue with host folders being created by kind as root which is probably due to docker running privilege user so I made sure to have the folders pre-created with same user that I was using to run kind on the host machine and all works, the full setup is here -> https://github.com/mmaia/kafka-local-kubernetes under pv-pvc-setup folder. Thanks for the tips I will work on a Storage class solution now to learn. – groo Jan 09 '22 at 16:21
  • Awesome. Feel free to move your solution to below as an answer – OneCricketeer Jan 09 '22 at 16:22

1 Answer


I finally sorted it out. I had two main issues in my initial setup, which are now fixed.

Folders used to persist data on the local host need to be created beforehand, so that they have the same uid:gid as the user that created the initial Kind cluster. If this is not in place, the data will not be persisted properly in those folders.
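A minimal sketch of that preparation step, run as your normal (non-root) user before `kind create cluster`; the paths match the extraMounts in the kind-config:

```shell
# Pre-create the host directories before creating the Kind cluster, so they
# are owned by the current user rather than being created as root by Docker.
mkdir -p ./tmp/kafka-data ./tmp/zookeeper-data/data ./tmp/zookeeper-data/log

# Make ownership explicit (no-op if the directories were just created by you).
chown -R "$(id -u):$(id -g)" ./tmp

# Verify numeric ownership before creating the cluster.
ls -ldn ./tmp/zookeeper-data/data
```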

I created a specific persistent volume and persistent volume claim for each persistent folder of Zookeeper (data and log) and configured those in the kind-config. Here is the final kind-config:

apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
nodes:
  - role: control-plane
  - role: worker
    extraPortMappings:
      - containerPort: 30092 # internal kafka nodeport
        hostPort: 9092 # port exposed on "host" machine for kafka
      - containerPort: 30081 # internal schema-registry nodeport
        hostPort: 8081 # port exposed on "host" machine for schema-registry
    extraMounts:
      - hostPath: ./tmp/kafka-data
        containerPath: /var/lib/kafka/data
        readOnly: false
        selinuxRelabel: false
        propagation: Bidirectional
      - hostPath: ./tmp/zookeeper-data/data
        containerPath: /var/lib/zookeeper/data
        readOnly: false
        selinuxRelabel: false
        propagation: Bidirectional
      - hostPath: ./tmp/zookeeper-data/log
        containerPath: /var/lib/zookeeper/log
        readOnly: false
        selinuxRelabel: false
        propagation: Bidirectional
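To match the two extra mounts, the original single Zookeeper PV/PVC pair is split in two, one pair per directory. A sketch of what the data pair can look like (the log pair is analogous; the names and storage class here are illustrative, not taken from the repo):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: zookeeper-data-pv
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: zookeeper-data-local-storage
  capacity:
    storage: 5Gi
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /var/lib/zookeeper/data   # path on the Kind node, bridged to the host by extraMounts
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: zookeeper-data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: zookeeper-data-local-storage
  resources:
    requests:
      storage: 5Gi
```

The Deployment then mounts two volumes, one at /var/lib/zookeeper/data and one at /var/lib/zookeeper/log, instead of a single mount at /var/lib/zookeeper.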

The full setup using persistent volumes and persistent volume claims is available in this repo, with further instructions, if you want to run it for fun: https://github.com/mmaia/kafka-local-kubernetes
