Is there a way to get a random object from a specific bucket by using Riak's HTTP API? Let's say that you have no knowledge about the contents of a bucket, the only thing you know is that all objects in a bucket share a common data structure. What would be a good way to get any object from a bucket, in order to show its data structure? Preferably using MapReduce over Search, since Search will flatten the response.
2 Answers
The best option is to use predictable keys so you don't have to find them. Since that is not always possible, secondary indexing is the next best.
If you are using eLevelDB, you can query the $BUCKET
implicit index with max_results set to 1, which will return a single key. You would then issue a get request for that key.
If you are using Bitcask, you have 2 options:
- list all of the keys in the bucket
Key listing in Bitcask will need to fold over every value in all buckets in order to return the list of keys in a single bucket. Effectively this means reading your entire dataset from disk, so this is very heavy on the system and could bring a production cluster to its knees. - MapReduce
MapReduce over a full bucket uses a similar query to key listing so it is also very heave on the system. Since the map phase function is executed separately for each object, if your map phase returns an object, every object in the bucket would be passed over the network to the node running the reduce phase. Thus it would be more efficient (read: less disastrous) to have the map phase function return just the key with no data, then have your reduce phase return the first item in the list, which leaves you needing to issue a get request for the object once you have the key name.
While it is technically possible to find a key in a given bucket when you have no information about the keys or the contents, if you designed your system to create a key named <<"schema">> or <<"sample">> that contains a sample object in each bucket, you could simply issue a get request for that key instead of searching, folding, or mapping.

- 25,000
- 3
- 22
- 44
If you are using Riak 2.X then search (http://docs.basho.com/riak/latest/dev/using/search/) is recommended over Map Reduce or 2i queries in most use cases and it is available via the HTTP API.

- 1,001
- 6
- 11
-
How would Riak Search let me get a random object? This doesn't really answer my question. – Daniel Larsson May 08 '15 at 21:12
-
You are right, somehow I forgot to address the question of randomness. There really isn't a way to randomly select an object from Riak natively. The best bet would be to use predictable keys as described in the first answer and handle the random aspect at the application level. Sorry for my first non answer. – Craig May 10 '15 at 01:44