I am trying to set up a high-availability RabbitMQ cluster in my Kubernetes cluster as a StatefulSet, so that my data (e.g. queues, messages) persists even after restarting all of the nodes simultaneously. Since I'm deploying the RabbitMQ nodes in Kubernetes, I understand that I need to include an external persistent volume for the nodes to store data in so that the data survives a restart. I have mounted an Azure Files share into my containers as a volume at the directory /var/lib/rabbitmq/mnesia.
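For reference, I check from inside a pod that the share is actually mounted where I expect it, with something like this (pod and namespace names as in the manifests further down):
kubectl exec rabbitmq-0 -n rabbit -- df -h /var/lib/rabbitmq/mnesia
kubectl exec rabbitmq-0 -n rabbit -- ls -la /var/lib/rabbitmq/mnesia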
When starting with a fresh (empty) volume, the nodes start up without any issues and successfully form a cluster. I can open the RabbitMQ management UI and see that any queue I create is mirrored on all of the nodes, as expected, and the queue (plus any messages in it) persists as long as there is at least one active node. Deleting a single pod with kubectl delete pod rabbitmq-0 -n rabbit causes that node to stop and then restart, and the logs show that it successfully syncs with the remaining active nodes, so everything is fine.
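For example, restarting a single node and confirming that it rejoined looks roughly like this:
kubectl delete pod rabbitmq-0 -n rabbit
kubectl exec rabbitmq-1 -n rabbit -- rabbitmqctl cluster_status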
The problem I have encountered is that when I delete all of the RabbitMQ pods simultaneously, the first node to start up has the persisted data from the volume and tries to re-cluster with the other two nodes, which are, of course, not active yet. What I expected to happen was that the node would start up, load the queue and message data, and then form a new cluster (since it should notice that no other nodes are active).
I suspect that there may be some data in the mounted volume that indicates the presence of the other nodes, which would explain why it tries to connect to them and join the supposed cluster, but I haven't found a way to prevent that and am not certain that this is the cause.
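For what it's worth, the share does contain the node's Mnesia directory (the database dir shown in the logs below, which is also where the nodes_running_at_shutdown file mentioned later lives), and I can list it with something like:
kubectl exec rabbitmq-0 -n rabbit -- ls /var/lib/rabbitmq/mnesia/rabbit@rabbitmq-0.rabbitmq-internal.rabbit.svc.cluster.local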
There are two different error messages: one in the pod description (kubectl describe pod rabbitmq-0 -n rabbit) when the RabbitMQ node is in a crash loop, and another in the pod logs. The pod description error output includes the following:
exited with 137:
20:38:12.331 [error] Cookie file /var/lib/rabbitmq/.erlang.cookie must be accessible by owner only
Error: unable to perform an operation on node 'rabbit@rabbitmq-0.rabbitmq-internal.rabbit.svc.cluster.local'. Please see diagnostics information and suggestions below.
Most common reasons for this are:
* Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)
* CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
* Target node is not running
In addition to the diagnostics info below:
* See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more
* Consult server logs on node rabbit@rabbitmq-0.rabbitmq-internal.rabbit.svc.cluster.local
* If target node is configured to use long node names, don't forget to use --longnames with CLI tools
DIAGNOSTICS
===========
attempted to contact: ['rabbit@rabbitmq-0.rabbitmq-internal.rabbit.svc.cluster.local']
rabbit@rabbitmq-0.rabbitmq-internal.rabbit.svc.cluster.local:
* connected to epmd (port 4369) on rabbitmq-0.rabbitmq-internal.rabbit.svc.cluster.local
* epmd reports: node 'rabbit' not running at all
no other nodes on rabbitmq-0.rabbitmq-internal.rabbit.svc.cluster.local
* suggestion: start the node
Current node details:
* node name: 'rabbitmqcli-345-rabbit@rabbitmq-0.rabbitmq-internal.rabbit.svc.cluster.local'
* effective user's home directory: /var/lib/rabbitmq
* Erlang cookie hash: xxxxxxxxxxxxxxxxx
and the logs output the following info:
Config file(s): /etc/rabbitmq/rabbitmq.conf
Starting broker...2020-06-12 20:39:08.678 [info] <0.294.0>
node : rabbit@rabbitmq-0.rabbitmq-internal.rabbit.svc.cluster.local
home dir : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.conf
cookie hash : xxxxxxxxxxxxxxxxx
log(s) : <stdout>
database dir : /var/lib/rabbitmq/mnesia/rabbit@rabbitmq-0.rabbitmq-internal.rabbit.svc.cluster.local
...
2020-06-12 20:48:39.015 [warning] <0.294.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,['rabbit@rabbitmq-2.rabbitmq-internal.rabbit.svc.cluster.local','rabbit@rabbitmq-1.rabbitmq-internal.rabbit.svc.cluster.local','rabbit@rabbitmq-0.rabbitmq-internal.rabbit.svc.cluster.local'],[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-06-12 20:48:39.015 [info] <0.294.0> Waiting for Mnesia tables for 30000 ms, 0 retries left
2020-06-12 20:49:09.341 [info] <0.44.0> Application mnesia exited with reason: stopped
2020-06-12 20:49:09.505 [error] <0.294.0>
2020-06-12 20:49:09.505 [error] <0.294.0> BOOT FAILED
2020-06-12 20:49:09.505 [error] <0.294.0> ===========
2020-06-12 20:49:09.505 [error] <0.294.0> Timeout contacting cluster nodes: ['rabbit@rabbitmq-2.rabbitmq-internal.rabbit.svc.cluster.local',
2020-06-12 20:49:09.505 [error] <0.294.0> 'rabbit@rabbitmq-1.rabbitmq-internal.rabbit.svc.cluster.local'].
...
BACKGROUND
==========
This cluster node was shut down while other nodes were still running.
2020-06-12 20:49:09.506 [error] <0.294.0>
2020-06-12 20:49:09.506 [error] <0.294.0> This cluster node was shut down while other nodes were still running.
2020-06-12 20:49:09.506 [error] <0.294.0> To avoid losing data, you should start the other nodes first, then
2020-06-12 20:49:09.506 [error] <0.294.0> start this one. To force this node to start, first invoke
To avoid losing data, you should start the other nodes first, then
start this one. To force this node to start, first invoke
"rabbitmqctl force_boot". If you do so, any changes made on other
cluster nodes after this one was shut down may be lost.
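If I understand that message correctly, the force_boot it suggests would be run against the affected pod with something like the following, but I'd rather understand why the node doesn't recover on its own:
kubectl exec rabbitmq-0 -n rabbit -- rabbitmqctl force_boot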
What I've tried so far is clearing the contents of the /var/lib/rabbitmq/mnesia/rabbit@rabbitmq-0.rabbitmq-internal.rabbit.svc.cluster.local/nodes_running_at_shutdown file, and fiddling with config settings such as the volume mount directory and the Erlang cookie permissions.
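(Clearing that file was done by exec-ing into the pod and truncating it, roughly along these lines; the exact command may have differed:
kubectl exec rabbitmq-0 -n rabbit -- sh -c '> /var/lib/rabbitmq/mnesia/rabbit@rabbitmq-0.rabbitmq-internal.rabbit.svc.cluster.local/nodes_running_at_shutdown'
)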
Below are the relevant deployment files and config files:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
  namespace: rabbit
spec:
  serviceName: rabbitmq-internal
  revisionHistoryLimit: 3
  updateStrategy:
    type: RollingUpdate
  replicas: 3
  selector:
    matchLabels:
      app: rabbitmq
  template:
    metadata:
      name: rabbitmq
      labels:
        app: rabbitmq
    spec:
      serviceAccountName: rabbitmq
      terminationGracePeriodSeconds: 10
      containers:
      - name: rabbitmq
        image: rabbitmq:0.13
        lifecycle:
          postStart:
            exec:
              command:
              - /bin/sh
              - -c
              - >
                until rabbitmqctl --erlang-cookie ${RABBITMQ_ERLANG_COOKIE} node_health_check; do sleep 1; done;
                rabbitmqctl --erlang-cookie ${RABBITMQ_ERLANG_COOKIE} set_policy ha-all "" '{"ha-mode":"all", "ha-sync-mode": "automatic"}'
        ports:
        - containerPort: 4369
        - containerPort: 5672
        - containerPort: 5671
        - containerPort: 25672
        - containerPort: 15672
        resources:
          requests:
            memory: "500Mi"
            cpu: "0.4"
          limits:
            memory: "600Mi"
            cpu: "0.6"
        livenessProbe:
          exec:
            # Stage 2 check:
            command: ["rabbitmq-diagnostics", "status", "--erlang-cookie", "$(RABBITMQ_ERLANG_COOKIE)"]
          initialDelaySeconds: 60
          periodSeconds: 60
          timeoutSeconds: 15
        readinessProbe:
          exec:
            # Stage 2 check:
            command: ["rabbitmq-diagnostics", "status", "--erlang-cookie", "$(RABBITMQ_ERLANG_COOKIE)"]
          initialDelaySeconds: 20
          periodSeconds: 60
          timeoutSeconds: 10
        envFrom:
        - configMapRef:
            name: rabbitmq-cfg
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: RABBITMQ_USE_LONGNAME
          value: "true"
        - name: RABBITMQ_NODENAME
          value: "rabbit@$(HOSTNAME).rabbitmq-internal.$(NAMESPACE).svc.cluster.local"
        - name: K8S_SERVICE_NAME
          value: "rabbitmq-internal"
        - name: RABBITMQ_DEFAULT_USER
          value: user
        - name: RABBITMQ_DEFAULT_PASS
          value: pass
        - name: RABBITMQ_ERLANG_COOKIE
          value: my-cookie
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        volumeMounts:
        - name: my-volume-mount
          mountPath: "/var/lib/rabbitmq/mnesia"
      imagePullSecrets:
      - name: my-secret
      volumes:
      - name: my-volume-mount
        azureFile:
          secretName: azure-rabbitmq-secret
          shareName: my-fileshare-name
          readOnly: false
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: rabbitmq-cfg
  namespace: rabbit
data:
  RABBITMQ_VM_MEMORY_HIGH_WATERMARK: "0.6"
---
kind: Service
apiVersion: v1
metadata:
  namespace: rabbit
  name: rabbitmq-internal
  labels:
    app: rabbitmq
spec:
  clusterIP: None
  ports:
  - name: http
    protocol: TCP
    port: 15672
  - name: amqp
    protocol: TCP
    port: 5672
  - name: amqps
    protocol: TCP
    port: 5671
  selector:
    app: rabbitmq
---
kind: Service
apiVersion: v1
metadata:
  namespace: rabbit
  name: rabbitmq
  labels:
    app: rabbitmq
    type: LoadBalancer
spec:
  selector:
    app: rabbitmq
  ports:
  - name: http
    protocol: TCP
    port: 15672
    targetPort: 15672
  - name: amqp
    protocol: TCP
    port: 5672
    targetPort: 5672
  - name: amqps
    protocol: TCP
    port: 5671
    targetPort: 5671
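Everything above is deployed with a plain kubectl apply, along the lines of (rabbitmq.yaml is a placeholder for the actual manifest file):
kubectl apply -f rabbitmq.yaml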
Dockerfile:
FROM rabbitmq:3.8.4
COPY conf/rabbitmq.conf /etc/rabbitmq
COPY conf/enabled_plugins /etc/rabbitmq
USER root
COPY conf/.erlang.cookie /var/lib/rabbitmq
RUN /bin/bash -c 'ls -ld /var/lib/rabbitmq/.erlang.cookie; chmod 600 /var/lib/rabbitmq/.erlang.cookie; ls -ld /var/lib/rabbitmq/.erlang.cookie'
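For completeness, the custom image referenced in the StatefulSet is built from this Dockerfile, roughly as follows; the conf/ directory sits next to the Dockerfile and holds the rabbitmq.conf, enabled_plugins and .erlang.cookie files copied above:
docker build -t rabbitmq:0.13 .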
rabbitmq.conf:
## cluster formation settings
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
cluster_formation.k8s.address_type = hostname
cluster_formation.k8s.service_name = rabbitmq-internal
cluster_formation.k8s.hostname_suffix = .rabbitmq-internal.rabbit.svc.cluster.local
cluster_formation.node_cleanup.interval = 60
cluster_formation.node_cleanup.only_log_warning = true
cluster_partition_handling = autoheal
queue_master_locator=min-masters
## general settings
log.file.level = debug
## Mgmt UI secure/non-secure connection settings (secure not implemented yet)
management.tcp.port = 15672
## RabbitMQ entrypoint settings (will be injected below when image is built)
Thanks in advance!