
We are exploring the queue-master-locator 'min-masters' policy. It works as expected when we create new queues, but when we need to stop one node of the cluster, every queue whose master lived on that node gets promoted onto the same single node. For example:

Node A has 30 queues
Node B has 0 queues
Node C has 2 queues
Node D has 3 queues

When we stop Node A, all 30 queues get promoted to Node B. Is that the expected result? We were hoping that the 30 queues would be distributed across Nodes B, C, and D...

This is starting to drive us crazy. Has anyone else run into this scenario? And is it possible to achieve what we expect, in some automatic way, when shutting down Node A?

Our policy is defined as follows:

Listing policies ...
prod    ha queues    ^    {"ha-mode":"exactly","ha-params":3,"ha-sync-mode":"automatic","queue-master-locator":"min-masters"}
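For reference, a policy equivalent to the listing above could be (re)applied with rabbitmqctl; the vhost, policy name, and pattern match the listing, but adjust them to your environment:

```shell
# Apply the HA policy on vhost "prod": mirror each matching queue to exactly
# 3 nodes, sync mirrors automatically, and place new masters on the node
# that currently hosts the fewest masters ("min-masters").
rabbitmqctl set_policy -p prod --apply-to queues "ha queues" "^" \
  '{"ha-mode":"exactly","ha-params":3,"ha-sync-mode":"automatic","queue-master-locator":"min-masters"}'
```

Note that, as described below, queue-master-locator only influences where *new* masters are placed; it does not rebalance existing ones.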

We have 4 nodes in the cluster, 2 RAM nodes and 2 Disk Nodes.

The policy works just fine when we create new queues but it does nothing when we stop one node.

Thanks

2 Answers


According to the official documentation:

If the master fails […] the longest running mirror is promoted to master

And the stated reason follows:

the assumption being that it is most likely to be fully synchronised with the master.

https://www.rabbitmq.com/ha.html

user8808265
  • Ok, I understand that. Regarding my other question: is there a way, automatic or manual, to rebalance the queues that were promoted to the oldest slave node? – Pablo Pinargote Oct 25 '17 at 14:17
  • No automatic way, unfortunately. There are some manual ways that have some (major) drawbacks. You can force a master via policy (see https://groups.google.com/forum/#!msg/rabbitmq-users/bJNcrDVhWiU/6oMO0DjNQ4oJ), but this causes all mirrors to lose sync, which is quite risky. If that's too risky, you can force a sync just after setting the policy, but that is a blocking operation, probably not production-proof. If one node can't handle the load, your only option is to implement some solution that relies on federated queues. – user8808265 Oct 25 '17 at 15:50
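The "force a master via policy" workaround from the comment above can be sketched with standard rabbitmqctl commands. The queue name (orders), target node (rabbit@nodeB), and temporary policy name here are hypothetical placeholders:

```shell
# Pin the queue's master to rabbit@nodeB with a higher-priority policy
# that names the node explicitly (this overrides the normal HA policy
# for the matching queue, and the mirrors lose their synchronised state):
rabbitmqctl set_policy -p prod --priority 10 --apply-to queues \
  "move-orders" "^orders$" '{"ha-mode":"nodes","ha-params":["rabbit@nodeB"]}'

# Optionally force a re-sync of the mirrors. Warning: this blocks the
# queue while it synchronises, so it may not be production-safe:
rabbitmqctl sync_queue -p prod orders

# Remove the temporary policy so the original HA policy applies again:
rabbitmqctl clear_policy -p prod "move-orders"
```

As the comment warns, this sequence trades availability (blocking sync) or safety (unsynchronised mirrors) for master placement, so use it with care.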

The RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions here.

I just tried out your specific scenario using the latest pre-release code for 3.6.13 and I can reproduce it. The issue is that RabbitMQ does not move existing queue masters to rebalance queues - it only places masters for queues whose master existed on the node that went down. I have opened this issue to address what you report here.

You can use this unsupported script to rebalance queue masters in your cluster. If you have issues running the script, please post to rabbitmq-users and I will see it.

Luke Bakken