I'm trying to use Riak's MapReduce over HTTP. This is what I'm sending:

{
"inputs":{
    "bucket":"test",
    "key_filters":[["matches", ".*"]]
},
"query":[
    {
        "map":{
            "language":"erlang",
            "source":"value(RiakObject, _KeyData, _Arg) -> Key = riak_object:key(RiakObject), Count = riak_kv_crdt:value(RiakObject, <<\"riak_kv_pncounter\">>), [ {Key, Count} ]."
        }
    }
]}

Riak fails with "[worker_startup_failed]", which isn't very informative. Could anyone please help me get this to actually execute the function?

2 Answers

WARNING

Allowing arbitrary Erlang functions via map-reduce is a security risk. Any valid Erlang can be executed, including sending your entire data set offsite or formatting the hard drive.

You have been warned.


However, if you implicitly trust any client that may connect to your cluster, you can allow Erlang source to be passed in a map-reduce request by setting {allow_strfun, true} in the riak_kv section of app.config (or of advanced.config if you are using riak.conf).

Once you have allowed passing an Erlang function in a map-reduce phase, you need to pass in a function of the form fun(RiakObject,KeyData,Arg) -> [result] end. Note that this must be an anonymous fun, so fun is a keyword, not a name, and it must end with end.
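Your function should handle the case where {error,notfound} is passed as the first argument instead of an object. A separate clause can do that, but it has to come before the clause that binds RiakObject: a bare variable matches anything, including the error tuple, so a catch-all placed after it would never be reached. Perhaps something like the following (the source string is wrapped here for readability; strict JSON does not allow raw line breaks inside a string, so join it onto one line if your client rejects it):

{
"inputs":{
    "bucket":"test",
    "key_filters":[["matches", ".*"]]
},
"query":[
    {
        "map":{
            "language":"erlang",
            "source":"fun({error, notfound}, _KeyData, _Arg) ->
                           [{error, 0}];
                       (RiakObject, _KeyData, _Arg) ->
                           Key = riak_object:key(RiakObject),
                           Count = riak_kv_crdt:value(
                                            RiakObject,
                                            <<\"riak_kv_pncounter\">>),
                           [ {Key, Count} ]
                       end."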
        }
    }
]}

Allowing the source to be passed in the request is very useful while developing and debugging. For production, you really should put the functions in a dedicated pre-compiled module that you copy to the code path of each node so that the phase spec can specify the module and function by name instead of providing arbitrary code.

{"map":{
   "language":"erlang",
   "module":"yourprecompiledmodule",
   "function":"functionname"}}
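
As a rough sketch of what such a module might look like, assuming the placeholder names from the phase spec above (yourprecompiledmodule and functionname) and reusing the riak_kv_crdt:value call from the question unchanged (I have not checked its return shape against any particular Riak version):

```erlang
%% Hypothetical module: the module and function names are the
%% placeholders from the phase spec, not real Riak names.
-module(yourprecompiledmodule).
-export([functionname/3]).

%% Map phase function. Riak calls it once per input with the object
%% (or {error, notfound}), the key data, and the phase argument.

%% Keys that could not be read arrive as {error, notfound} in place
%% of the object, so this clause must come first.
functionname({error, notfound}, _KeyData, _Arg) ->
    [{error, 0}];
functionname(RiakObject, _KeyData, _Arg) ->
    Key = riak_object:key(RiakObject),
    %% Copied as-is from the question; treat the arguments and the
    %% return value as an assumption to verify on your Riak version.
    Count = riak_kv_crdt:value(RiakObject, <<"riak_kv_pncounter">>),
    [{Key, Count}].
```

A named function does not need the fun ... end wrapper or the trailing period after end that the inline source requires. Compile it with erlc and copy the resulting .beam file to the code path on every node before referencing it in a phase spec.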
Joe

You need to enable allow_strfun on all nodes in your cluster. To do so in Riak 2, you will need to use the advanced.config file to add this to the riak_kv configuration:

[
    {riak_kv, [
        {allow_strfun, true}
    ]}
].

The other option is to create your own Erlang module by using the compiler shipped with Riak and placing the *.beam file in a well-known location for Riak to find. The basho-patches directory is one such place.

Please see the documentation as well:

Luke Bakken