
I have a Riak ring whose ownership handoff is stuck. The handoff appears to fail with ehostunreach, yet riak-admin ring-status reports that all nodes are up and reachable.

What can I do to fix the failing transfer?

Thanks!

riak-admin ring-status
Attempting to restart script through sudo -H -u riak
================================== Claimant ===================================
Claimant:  'riak@10.253.66.128'
Status:     up
Ring Ready: true

============================== Ownership Handoff ==============================
Owner:      riak@10.253.66.181
Next Owner: riak@10.253.66.128

Index: 1415829711164312202009819681693899175291684651008
  Waiting on: [riak_kv_vnode]
  Complete:   [riak_pipe_vnode]

============================== Unreachable Nodes ==============================
All nodes are up and reachable

[error] ownership_handoff transfer of riak_kv_vnode from 'riak@10.253.66.181' 1415829711164312202009819681693899175291684651008 to 'riak@10.253.66.128' 1415829711164312202009819681693899175291684651008 failed because of error:{badmatch,{error,ehostunreach}} [{riak_core_handoff_sender,start_fold,5,[{file,"src/riak_core_handoff_sender.erl"},{line,97}]}]

Fei Wan

1 Answer


It turns out that the "All nodes are up and reachable" status in the ring-status output doesn't necessarily mean the nodes can actually reach each other in both directions.

I hadn't set up the firewall properly on node 10.253.66.128. After allowing TCP connections on the following ports, as described in http://comments.gmane.org/gmane.comp.db.riak.user/9152, the handoff now proceeds properly.

-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 4369 -j ACCEPT

-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 8087 -j ACCEPT

-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 8099 -j ACCEPT

-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 7010:7014 -j ACCEPT
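One way to confirm that rules like these took effect is to probe the same ports from the other node. The following is only a minimal sketch, assuming Python 3 is available on the nodes. The names `port_open` and `check_ports` are made up for illustration and are not part of Riak; the port list is simply copied from the rules above.

```python
import socket
import sys

# Ports taken from the firewall rules above; 7010-7014 is the range opened there.
PORTS = [4369, 8087, 8099] + list(range(7010, 7015))


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and "no route to host".
        return False


def check_ports(host, ports=PORTS, timeout=3.0):
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_open(host, port, timeout) for port in ports}


if __name__ == "__main__" and len(sys.argv) > 1:
    target = sys.argv[1]  # e.g. 10.253.66.128
    for port, ok in check_ports(target).items():
        print(f"{target}:{port} {'open' if ok else 'unreachable'}")
```

Running it on each node against the other (for example from 10.253.66.181 against 10.253.66.128, and then the reverse) checks both directions. A port reported as unreachable may be blocked by the firewall, or may simply have nothing listening on it, so a failure on part of the 7010-7014 range is not necessarily a firewall problem.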

Fei Wan