
I use a simple secondary index query in Riak, traversing the keys in a bucket:

http://riak01:8098/buckets/my_bucket/index/$bucket/_?max_results=10

There are 10 keys in the result, as expected. However, when I use some of these keys in a KV query, Riak doesn't find the item. This is not caused by the particular key just having been removed by another process: if I repeat both the index query and the KV query an hour later, the results are the same.

What might be the reason for such behavior? Is there a way to make sure the secondary index is always consistent with the actual bucket contents, i.e. that a 2i query returns a key if and only if an item with that key exists in the bucket?
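For reference, the check I run can be sketched roughly as follows. This is a minimal sketch, not the exact script: it assumes the index query returns JSON with a `keys` list and that a KV GET on `/buckets/<bucket>/keys/<key>` answers 404 for a missing object. The helper names (`index_url`, `key_url`, `find_missing_keys`) are made up for illustration. Keys are percent-encoded when building the KV URL, so an unencoded character in the key should not be the cause.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import quote

BASE_URL = "http://riak01:8098"  # host from the question; adjust for your cluster


def index_url(bucket, max_results=10, base_url=BASE_URL):
    # $bucket/_ is the special index that matches every key in the bucket.
    return (f"{base_url}/buckets/{quote(bucket, safe='')}"
            f"/index/$bucket/_?max_results={max_results}")


def key_url(bucket, key, base_url=BASE_URL):
    # Percent-encode the key so characters such as '/' or ' ' survive the URL path.
    return (f"{base_url}/buckets/{quote(bucket, safe='')}"
            f"/keys/{quote(key, safe='')}")


def http_json(url):
    """GET url and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def http_status(url):
    """GET url and return the HTTP status code (error statuses included)."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def find_missing_keys(bucket, list_keys=http_json, get_status=http_status,
                      max_results=10):
    """Return the keys that the index query lists but a KV GET answers with 404.

    list_keys and get_status are injectable so the logic can be exercised
    without a running cluster (an assumption of this sketch, not a Riak feature).
    """
    keys = list_keys(index_url(bucket, max_results)).get("keys", [])
    return [k for k in keys if get_status(key_url(bucket, k)) == 404]
```

Calling `find_missing_keys("my_bucket")` against the cluster returns the keys that show up in the index but not in KV. Only 404 is treated as "missing"; any other status (for example 300 for siblings) counts as present.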

Pavel S.
  • Did you experience node failures or network partitions recently? 2i are not 100% reliable, it may take a 2i index a long time to recover if it'd been corrupted due to an error or node failure. What do you see in the logs? Also, is AAE enabled on your setup? Have a look at http://stackoverflow.com/questions/24882953/riak-are-my-2is-broken, you may find some clues there. – vempo May 04 '16 at 08:17
  • The `$bucket` index doesn't appear to really be an index - the backend iterates the actual keys, not a separate index list. Is it possible the key contains a character that is not being properly encoded and passed via HTTP? – Joe May 04 '16 at 17:09
  • We do not have AAE enabled. There have been some node failures in the past, which might affect these keys. What's the best way to bring it back to the fold? – Pavel S. May 05 '16 at 07:17
  • Did you change the n_val in the bucket's properties? Reducing the n_val after writing data will leave keys where normal requests can't see them, but key listing and other fold-based operations can. – Joe May 05 '16 at 15:13
  • No, the n_val didn't change since the initial set-up. – Pavel S. May 05 '16 at 21:06
  • Hmm, `$bucket` is indeed a special index, but it still requires a LevelDB backend, and as such may be affected by divergence in nodes' data I believe. I don't know exactly how it's implemented, but would try enabling AAE to see if it helps. https://docs.basho.com/riak/kv/2.1.4/learn/concepts/active-anti-entropy/ – vempo May 06 '16 at 04:27
  • We specifically avoid AAE since it caused some very hard-to-predict issues with the cluster, which would suddenly get busy even though there was no increase in traffic (i.e. in the number of requests). Is there any other way to "sync" the 2i with the cluster content? – Pavel S. May 06 '16 at 08:42
  • I understand that AAE is disabled for a reason, but I would enable it for a while just to see if it solves the problem. You may also try [2i repair](http://docs.basho.com/riak/kv/2.1.4/using/repair-recovery/repairs/#repairing-secondary-indexes), but no guarantees as it's not a real 2i. There's also [read-repair](http://docs.basho.com/riak/kv/2.1.4/learn/concepts/replication/#read-repair), but I'm not sure it's applicable in your situation. BTW, any chance you are running with `allow_mult = false` and the clocks of your cluster are out-of-sync? – vempo May 06 '16 at 14:31
  • Have you solved the problem? – vempo May 13 '16 at 06:08
  • Not yet. `2i repair` looks really good, but it needs AAE to be enabled anyway. So, I'm thinking about enabling AAE, leaving the cluster for a while, then running `2i repair`. – Pavel S. May 16 '16 at 13:37
