I have a riak_core ring with 3 physical nodes. When I joined the first two nodes into the cluster (via riak_admin cluster plan; riak_admin cluster commit), riak_admin member-status showed the cluster in a valid state, but all of the partitions (100%) were still sitting on the first node and both nodes showed 50% pending.

I was expecting the cluster to rebalance relatively quickly, but nothing happened until I restarted one of the nodes. When that node came back up, member-status showed that 25% of the partitions had moved to the second node. Another restart resulted in a complete 50%/50% rebalance between the nodes.

I removed the data/ring dir on both nodes and tried joining all 3 available nodes into a new cluster. The same thing happened, but this time the pending split was roughly 33%/33%/34% (as expected). The cluster rebalanced only after I bounced the nodes a few times.

Is this expected behaviour? I expected that committing the cluster plan would trigger vnode relocation between the physical nodes.

To clarify: this is a brand new riak_core app without any custom handoff functionality implemented.
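
For reference, one way to watch what is actually happening (beyond member-status) is to inspect ring ownership from an attached Erlang shell. This is only a sketch; riak_core_ring_manager:get_my_ring/0 and riak_core_ring:all_owners/1 are standard riak_core calls, but the exact shape of the output can vary between versions:

%% Count how many partitions each physical node currently owns.
{ok, Ring} = riak_core_ring_manager:get_my_ring(),
Owners = riak_core_ring:all_owners(Ring),   %% [{PartitionIndex, Node}]
Counts = lists:foldl(
           fun({_Idx, Node}, Acc) ->
                   dict:update_counter(Node, 1, Acc)
           end, dict:new(), Owners),
dict:to_list(Counts).                        %% e.g. [{'node1@host', 64}]

If all partitions stay on the first node here, the ownership really has not moved, regardless of what member-status reports as pending.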

--
Note that this was also sent to the riak-user mailing list

1 Answer

I think I figured this out; it was entirely my fault. It turned out I had registered the wrong service name with riak_core_node_watcher:

ok = riak_core_node_watcher:service_up(<wrong_name>, self()),

This presumably caused cluster information not to be propagated through the ring correctly. As soon as I fixed the name, dynamic vnode allocation to newly joining nodes started working automatically.
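
For comparison, this is roughly what the registration looks like in a stock riak_core application start callback. The myapp, myapp_vnode and myapp_sup names below are placeholders; the key point is that the atom passed to service_up/2 must be the service name the rest of the application (and the other nodes) expect:

%% Sketch of a typical riak_core application start callback,
%% assuming a hypothetical app named myapp.
start(_StartType, _StartArgs) ->
    case myapp_sup:start_link() of
        {ok, Pid} ->
            %% Register the vnode module and announce the service
            %% under its expected name.
            ok = riak_core:register([{vnode_module, myapp_vnode}]),
            ok = riak_core_node_watcher:service_up(myapp, self()),
            {ok, Pid};
        {error, Reason} ->
            {error, Reason}
    end.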

It is still interesting that vnodes did get reallocated on restart, though.
