
I've created a 3-node Riak-CS cluster in a sandbox, created buckets, and uploaded some files, and they were replicated between the nodes (I hope the placement algorithm puts replicas in partitions on physically different nodes). v_node=2, and the rest of the replication config is left at defaults.
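
In case it helps anyone reproducing this: a minimal sketch of how an n_val of 2 can be set as the default bucket property via Riak's advanced.config (an illustrative excerpt, not my literal config; the exact file layout in your install may differ):

    %% advanced.config -- minimal illustrative excerpt
    [
     {riak_core, [
       %% every bucket without explicit props gets 2 replicas
       {default_bucket_props, [{n_val, 2}]}
     ]}
    ].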

Now I'm testing the situation where one of the three nodes fails. I just stopped the riak and riak-cs services on one node, and I get this from the remaining nodes:

s3cmd la s3://
ERROR: S3 error: 403 (InvalidAccessKeyId): The AWS Access Key Id you provided does not exist in our records.

The cluster is supposed to remain operational if one of the nodes fails, isn't it? I also tried marking the failed node as Down to make sure the cluster state converged, but that doesn't help.

  • The 403 could mask errors that occur during authentication. Do the console.log or crash.log files for Riak, Riak CS, or Stanchion contain anything relevant from the time you got the 403? – Joe Oct 21 '15 at 19:07
  • What does `v_node=2` mean? Is it `n_val` in default bucket properties? – shino Oct 22 '15 at 01:03
  • Yes, shino, I meant n_val, sorry. – Egor Sushkov Oct 22 '15 at 11:13
  • In logs I see `Fetching user record with strong option failed: <<"{pr_val_unsatisfied,2,1} Retrieval of user record for s3 failed. Reason: <<"{r_val_unsatisfied,2,1}" Error occurred trying to query from time 0 to <<"1445435811">>in gc key index. Reason: <<"{error,insufficient_vnodes_available}"` – Egor Sushkov Oct 22 '15 at 14:00
  • The question is why the actual r and pr are 1 while 2 nodes are alive... – Egor Sushkov Oct 22 '15 at 14:20

1 Answer


If you have set your n_val to 2, then there are only 2 replicas of each key. When you shut down one node, one of the replicas for a significant fraction (around 50%) of your keys becomes unavailable.

Looking at the source for the get_user_with_pbc function, it first tries with the strong_get_user_with_pbc function. The strong options for fetching a user record are {pr,all}, {r,all}, {notfound_ok,false}. PR=all means the get request will fail early unless both primary vnodes are available. If one of your replicas is unavailable, that fails as expected with the pr_val_unsatisfied error.
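
As a rough illustration (a sketch using the riak-erlang-client, not the actual Riak CS code; the bucket and key below are assumed names for illustration):

    %% A "strong" user fetch with {pr,all}, {r,all}, {notfound_ok,false}.
    {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
    StrongOptions = [{pr, all}, {r, all}, {notfound_ok, false}],
    case riakc_pb_socket:get(Pid, <<"moss.users">>, <<"s3_access_key_id">>, StrongOptions) of
        {ok, UserObj} ->
            {ok, UserObj};
        {error, Reason} ->
            %% With one of the two primaries down, pr=all cannot be met and the
            %% request fails early, e.g. with a pr_val_unsatisfied reason.
            {error, Reason}
    end.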

If the strong options fail, it retries with the weak_get_user_with_pbc function using the weak options {r, quorum}, {pr, one}, {notfound_ok,false}. Quorum means (n_val/2 + 1), in this case 2.
So this still requires one of the primary vnodes to be available, but we must also get a response from a quorum of vnodes, in this case, both the primary and the fallback. If the node has just failed, the first request will find that the fallback is empty, so the get request receives a notfound from the fallback vnode, and the user record from the primary. Since the options include notfound_ok=false, that is 1 valid response while quorum is 2, so the request fails.
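
Here is a sketch of that strong-then-weak retry, under the same assumptions as above (riakc client, illustrative bucket and key); it mirrors the logic described here, not the actual Riak CS source:

    %% Try the strong options first, fall back to the weak options on error.
    fetch_user(Pid, KeyId) ->
        Strong = [{pr, all}, {r, all}, {notfound_ok, false}],
        Weak   = [{r, quorum}, {pr, one}, {notfound_ok, false}],
        case riakc_pb_socket:get(Pid, <<"moss.users">>, KeyId, Strong) of
            {ok, UserObj} ->
                {ok, UserObj};
            {error, _StrongReason} ->
                %% With n_val=2 and one node down, the quorum of 2 needs both the
                %% surviving primary and the fallback vnode. Right after the
                %% failure the fallback is empty and its notfound does not count
                %% (notfound_ok=false), so this can still fail with
                %% r_val_unsatisfied until read repair populates the fallback.
                riakc_pb_socket:get(Pid, <<"moss.users">>, KeyId, Weak)
        end.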

Subsequent queries may complete successfully since the fallback would be populated by read-repair after the first request.

I think you will find a great many things in Riak and Riak CS that don't seem to work quite right if you reduce n_val below 3. For instance, if you had kept n_val at 3, since a quorum of 3 vnodes is 2, you could still have gotten a valid response to the weak options if one of the primaries was offline and the fallback had not yet been populated.
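
To make the quorum arithmetic concrete (quorum = n_val div 2 + 1):

    %% n_val = 2 -> quorum = 2 (every replica must return a valid response)
    %% n_val = 3 -> quorum = 2 (one replica may be missing or still empty)
    Quorum = fun(NVal) -> NVal div 2 + 1 end,
    2 = Quorum(2),
    2 = Quorum(3).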

Joe
  • I've checked the sources, and it is interesting that the defaults are actually not as described in the [Replication Properties](http://docs.basho.com/riak/latest/dev/advanced/replication-properties/#Available-Parameters) documentation. `{pr,all}, {r,all}, {notfound_ok,false}` with no explicitly defined pr and r values in the config gives us `pr=n_val, r=n_val and notfound_ok=false`, while the documentation lists defaults of `pr=0, r=n_val/2+1 and notfound_ok=true`! – Egor Sushkov Oct 23 '15 at 09:55
  • Thanks! I've completely got the idea. Now I get Riak-CS errors about malformed XML when trying to create a bucket in the degraded cluster, but that is another question. – Egor Sushkov Oct 23 '15 at 10:14
  • Also, I've answered my question from the previous comment - Stanchion strictly requires the StrongOptions when it checks whether a bucket exists (and doesn't fall back to the WeakOptions, so as not to miss a collision), and this may fail if a node responsible for that bucket is down, am I right? – Egor Sushkov Oct 23 '15 at 10:30
  • Correct, user creation, bucket creation, and bucket deletion (possibly a few others) are all serialized through stanchion to ensure that there cannot be siblings. If one member of the preflist is unavailable those operations will fail. So in a 3 node cluster with n_val=2, you will see around half of those types of operations fail, depending on where they hash in the ring. – Joe Oct 23 '15 at 16:14