(This is a repost of my question from Stack Overflow; it was off-topic there, and this is the right place to ask it.)

I was trying to build a Riak cluster on Raspberry Pi. I created an image with Erlang and Riak, and a single node seems to work correctly. Then I cloned this image for my different Pis:

riak@192.168.8.59
riak@192.168.8.214
riak@192.168.8.215

They all have identical configuration; the only things that differ are the static IPs in vm.args and app.config.

Now the problem: I'm building the cluster starting from riak@192.168.8.59. I added the node riak@192.168.8.214 and it seems to be all right:

# ./riak-admin member-status
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
valid     100.0%     50.0%    'riak@192.168.8.214'
valid       0.0%     50.0%    'riak@192.168.8.59'
-------------------------------------------------------------------------------

However, once I try to add a third node (riak@192.168.8.215), I get the following error message:

# ./riak-admin cluster join riak@192.168.8.215
Failed: This node is already a member of a cluster

Why didn't I have this problem with riak@192.168.8.214? It seems to occur only with the riak@192.168.8.215 node.

I can't force-remove riak@192.168.8.215 from its cluster because it says:

# ./riak-admin cluster force-remove riak@192.168.8.215
Failed: 'riak@192.168.8.215' is the claimant (see: riak-admin ring_status).
The claimant is the node responsible for initiating cluster changes,
and cannot forcefully remove itself. You can use 'riak-admin down' to
mark the node as offline, which will trigger a new claimant to take
over.  However, this will clear any staged changes.

Or

# ./riak-admin cluster leave                          
Failed: 'riak@192.168.8.215' is the only member.

I just can't understand it; I think I need a fresh point of view. I would also like to add that I followed all the steps from the documentation:

http://docs.basho.com/riak/latest/ops/building/basic-cluster-setup/

I also took into account the tutorial for Raspberry Pi (though I'm not on Raspbian, I'm on Arch Linux):

http://basho.com/building-a-riak-cluster-on-raspberry-pi/

I will also add that networking works fine: I can ping and ssh from each node to every other node.

I'm counting on your advice. Cheers!

EDIT:

As shown above, the error message suggests using the riak-admin down <node> command to stop this node from being the claimant. This didn't work either.

# ./riak-admin down riak@192.168.8.215
Failed: riak@192.168.8.215 is up

I can't mark it as down because it's up... However, if I try the same when Riak is not running, I get:

# ./riak-admin down riak@192.168.8.215
Node is not running!

That is pretty confusing. Obviously I don't understand what is going on here, I hope someone can clarify it.

Marek
  • The problem is which node you are executing the commands on. `riak-admin cluster join <node>` should be run *from* a node that is not yet a member of a cluster, and `<node>` should be a member of the cluster you want the local node to join. `riak-admin down riak@192.168.8.215` cannot be run *from* the node `192.168.8.215`; it must be run from another member of the cluster while `riak@192.168.8.215` is down. – Joe Jun 27 '14 at 23:15

2 Answers

Try these steps:

  • Run riak stop on all nodes
  • Run rm -rf /var/lib/riak/ring/* on all nodes
  • Double-check /etc/riak/vm.args on each node to ensure the -name argument uses that node's correct IP address
  • Run riak start on all nodes
  • Re-run the riak-admin cluster join riak@192.168.8.59 command on the two other nodes. It's important to remember that all other nodes join the same "starter" node, riak@192.168.8.59 in this case
  • Run riak-admin cluster plan on the riak@192.168.8.59 node to verify the staged changes
  • Run riak-admin cluster commit on the riak@192.168.8.59 node
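
The steps above could be scripted roughly as follows. This is only a sketch: the node names come from the question, but the helper functions (node_host, on_node, rebuild_cluster), the root@<ip> ssh login, and the DRY_RUN switch are assumptions made for illustration. It also assumes riak and riak-admin are on the PATH of each node; the question runs ./riak-admin from its own directory instead. By default the script only prints what it would run.

```shell
#!/bin/sh
# Sketch of the reset-and-rejoin procedure. Node names are from the question;
# the ssh login (root@<ip>) and all helper names are assumptions.

STARTER="riak@192.168.8.59"
JOINERS="riak@192.168.8.214 riak@192.168.8.215"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them over ssh

# Host part of an Erlang node name, e.g. riak@10.0.0.1 -> 10.0.0.1
node_host() {
    printf '%s\n' "${1#*@}"
}

# Run (or just print) one command on the host behind a node name.
on_node() {
    node="$1"; shift
    if [ "$DRY_RUN" = "1" ]; then
        printf '[%s] %s\n' "$(node_host "$node")" "$*"
    else
        ssh "root@$(node_host "$node")" "$*"
    fi
}

rebuild_cluster() {
    for n in $STARTER $JOINERS; do on_node "$n" "riak stop"; done
    # WARNING: this wipes the ring state on every node.
    for n in $STARTER $JOINERS; do on_node "$n" "rm -rf /var/lib/riak/ring/*"; done
    # Show the -name line so it can be checked by eye against the node's IP.
    for n in $STARTER $JOINERS; do on_node "$n" "grep -e -name /etc/riak/vm.args"; done
    for n in $STARTER $JOINERS; do on_node "$n" "riak start"; done
    # Each joiner runs the join itself, always pointing at the same starter node.
    for n in $JOINERS; do on_node "$n" "riak-admin cluster join $STARTER"; done
    on_node "$STARTER" "riak-admin cluster plan"
    on_node "$STARTER" "riak-admin cluster commit"
}

rebuild_cluster
```

Note the direction of the join: the command is issued on the node that wants to join and names the starter node as its argument, never the other way round.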
Luke Bakken
  • It worked. I had misunderstood it because of the phrase `Whatever RPi node you happen to be connected to, choose the two other nodes to join. Since I’m connected to 192.168.10.12, I typed the following(...)` in this tutorial: http://basho.com/building-a-riak-cluster-on-raspberry-pi – Marek Jun 27 '14 at 07:00

There is no need to stop all nodes and remove the ring data. Cluster operations such as joining nodes can be performed once the stopped claimant node has been marked as down. Example steps for a 3-node cluster with a claimant failure are here: https://gist.github.com/shino/dd9a75e84b2b5792a079 .
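
As a rough illustration of that order of operations, pieced together from the error message quoted in the question and from Joe's comment (not from the gist, which may differ in detail): the claimant is stopped on its own host and then marked as down from a surviving member. The script below only prints the commands. The function name claimant_down_plan and the choice of which node plays the claimant are hypothetical.

```shell
#!/bin/sh
# Prints (does not execute) a possible sequence for handling a stopped claimant.
# claimant_down_plan is a hypothetical helper, not part of Riak.
claimant_down_plan() {
    claimant="$1"   # node name of the claimant that is (or will be) stopped
    survivor="$2"   # any other member of the same cluster
    echo "on ${claimant#*@}: riak stop"
    # 'down' must be issued from another member while the claimant is stopped.
    echo "on ${survivor#*@}: riak-admin down $claimant"
    # Check the new claimant and the membership before staging further changes.
    echo "on ${survivor#*@}: riak-admin ring_status"
    echo "on ${survivor#*@}: riak-admin member-status"
}

# Assumed example: .214 is the stopped claimant, .59 is a surviving member.
claimant_down_plan riak@192.168.8.214 riak@192.168.8.59
```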

shino