
How is it possible that fetching a key from some nodes returns 404 while another node has the key (returns 200 with data)?

AAE is enabled, the cluster is alive, and there are no errors or handoffs, so where should I dig? The cluster consists of 6 nodes; all but one were recently migrated to 2.1.4, and the remaining node is still at 1.4.12 (that node has the key).

Where should I look, and how can I repair the inconsistency? Update with the requested values:

r and w are quorum and notfound_ok=false, but I've also tried requesting with notfound_ok=true and r=3, with the same result.

I've found that on the node that has the keys, some vnodes have had no AAE exchange at all:

riak-admin aae-status
================================== Exchanges ==================================
Index                                              Last (ago)    All (ago)
-------------------------------------------------------------------------------
0                                                  --            --
34253944624943037145398863266787883273185918976    3.6 d         --
91343852333181432387730302044767688728495783936    4.2 d         --
171269723124715185726994316333939416365929594880   3.9 d         --
216941649291305901920859467356323260730177486848   --            --
262613575457896618114724618378707105094425378816   --            --
342539446249430371453988632667878832731859189760   4.4 d         --
388211372416021087647853783690262677096107081728   3.5 d         --
433883298582611803841718934712646521460354973696   3.7 d         --
513809169374145557180982949001818249097788784640   --            --
570899077082383952423314387779798054553098649600   --            --
627988984790622347665645826557777860008408514560   --            --
730750818665451459101842416358141509827966271488   --            --
810676689456985212441106430647313237465400082432   --            --
867766597165223607683437869425293042920709947392   --            --
913438523331814323877303020447676887284957839360   --            --
970528431040052719119634459225656692740267704320   3.7 d         --
1027618338748291114361965898003636498195577569280  3.8 d         --
1141798154164767904846628775559596109106197299200  --            --
1198888061873006300088960214337575914561507164160  --            --
1233142006497949337234359077604363797834693083136  --            --
1267395951122892374379757940871151681107879002112  3.6 d         --
1301649895747835411525156804137939564381064921088  3.6 d         --
1370157784997721485815954530671515330927436759040  8.6 hr        --
1404411729622664522961353393938303214200622678016  --            --

Is it possible to force-run AAE on a given node?

All inter-node communication is fine:

    Report: net_kernel summary ('riak@192.168.135.45')

Node                 State   Type         In      Out Address
riak@192.168.172.232 up      normal 13530445 13587408 192.168.172.232:6000
riak@192.168.202.11  up      normal 15055379 15009545 192.168.202.11:6000
riak@192.168.135.180 up      normal 15850450 15598452 192.168.135.180:6000
riak@192.168.205.253 up      normal 14317197 14327591 192.168.205.253:6000
riak@192.168.157.36  up      normal  6291569  5811633 192.168.157.36:6000
riak_maint_15246@192 up      hidden       11       16 192.168.135.45:53159
Total                               65045051 64334645 
  • AAE takes time to repair inconsistencies. Is the communication between the nodes OK? What are your R and W values? Are you using vector clocks or timestamps? What is the value of notfound_ok? – vempo Sep 02 '16 at 11:13
  • @vempo updated in initial post. – Andrei Mikhaltsov Sep 02 '16 at 13:22
  • I would first check the ring status (`riak-admin ring-status` and `riak-admin member-status`) to make sure the node is seen as a part of the cluster. Next, run `riak-admin diag` (http://docs.basho.com/riak/kv/2.1.4/using/cluster-operations/inspecting-node/#riak-admin-diag) to see if anything is wrong with the node. Search for 1.4 + 2.1 compatibility issues in the docs, here and on the mailing list http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com – vempo Sep 03 '16 at 06:28

1 Answer

Strangely, different versions of Riak handle URL encoding of keys differently:

If you PUT a key named test%40key on a Riak 1.x node, that key can be read fine from the 1.x nodes in the cluster but returns a 404 error on 2.x nodes. It can, however, be found on 2.x nodes under the name test%2540key.

If you PUT a key named test%40key on a Riak 2.x node, the key can be read fine from 2.x nodes but returns 404 on the 1.x node. It can be found on 1.x nodes under the name test@key.
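
To make the mismatch concrete, here is a minimal Python sketch of the behaviour described above. It is only a model, not Riak code: it assumes that a 1.x node uses the URL path segment as the key verbatim while a 2.x node percent-decodes it first, which is what the observed key names suggest. The helpers `stored_key` and `path_for` are hypothetical names introduced for illustration.

```python
from urllib.parse import quote, unquote


def stored_key(path_segment: str, node_decodes: bool) -> str:
    """Key a node would store for a given URL path segment.

    node_decodes=False models the 1.x behaviour assumed here (path used as-is);
    node_decodes=True models the 2.x behaviour assumed here (percent-decoding).
    """
    return unquote(path_segment) if node_decodes else path_segment


def path_for(key: str, node_decodes: bool) -> str:
    """URL path segment to request so that the node resolves it to `key`."""
    return quote(key, safe="") if node_decodes else key


# Written through a 1.x node: the literal string "test%40key" is stored.
key_from_1x = stored_key("test%40key", node_decodes=False)
# To reach that key through a 2.x node, the "%" itself must be encoded.
print(path_for(key_from_1x, node_decodes=True))

# Written through a 2.x node: the decoded string "test@key" is stored.
key_from_2x = stored_key("test%40key", node_decodes=True)
# A 1.x node does not decode, so the raw key is requested directly.
print(path_for(key_from_2x, node_decodes=False))
```

Under these assumptions the same request path maps to two different stored keys depending on which node version handles the write, so in a mixed-version cluster a key written one way has to be requested under the other node's encoding to be found.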