I am trying to upgrade a riak_core erlang application while it is running.

Simple upgrades work: I use rebar3 and relflow to upgrade the application successfully. However, if I change the internals of a vnode and use relflow and rebar3 relup to generate a new release, the vnode stops working. The vnode is called 'cavv'.

After the hot upgrade, it crashes at this point:

DocIdx = riak_core_util:chash_key({<<"run">>, term_to_binary(os:timestamp())}), 

which results in this error:

** exception error: bad argument
     in function  lists:keyfind/3
        called as lists:keyfind(chash_keyfun,1,[{name,<<"run">>}|undefined])
     in call from riak_core_util:chash_key/2 (_build/default/lib/riak_core/src/riak_core_util.erl, line 266)
     in call from cavv_vnode:run/0 (_build/prod/lib/cavv/src/cavv_vnode.erl, line 38)

My relup looks like this:

{"0.1.2",
 [{"0.1.1",[],
   [{load_object_code,{cavv,"20161203-211601-relflow",[cavv_vnode]}},
    point_of_no_return,
    {load,{cavv_vnode,brutal_purge,brutal_purge}}]}],
 [{"0.1.1",[],[point_of_no_return]}]}. 

Am I missing something? Do I have to restart some master vnode? I tried restarting some supervisors, without success.

Looking at the source code of riak_core:

%% @spec chash_key(BKey :: riak_object:bkey()) -> chash:index()
%% @doc Create a binary used for determining replica placement.
chash_key({Bucket,_Key}=BKey) ->
    BucketProps = riak_core_bucket:get_bucket(Bucket),
    chash_key(BKey, BucketProps).

%% @spec chash_key(BKey :: riak_object:bkey(), [{atom(), any()}]) ->
%%          chash:index()
%% @doc Create a binary used for determining replica placement.
chash_key({Bucket,Key}, BucketProps) ->
    {_, {M, F}} = lists:keyfind(chash_keyfun, 1, BucketProps), %% <-- Line 266
    M:F({Bucket,Key}).

I tried to understand what is going on, but had a hard time following it. It looks as if something in BucketProps that should be defined becomes undefined after the upgrade.

When I restart the whole application, it works like a charm.

Am I missing something during my hot upgrade with riak_core? Or is it better to just shut down the whole node, then upgrade and start it up again and forget about hot code upgrading?

UPDATE: In the meantime I have found out that something goes wrong in riak_core_bucket.

Running the following: riak_core_bucket:get_bucket(<<"run">>).

Before the upgrade:

[{name,<<"run">>},
 {allow_mult,false},
 {basic_quorum,false},
 {big_vclock,50},
 {chash_keyfun,{riak_core_util,chash_std_keyfun}},
 {dvv_enabled,false},
 {dw,quorum},
 {last_write_wins,false},
 {linkfun,{modfun,riak_kv_wm_link_walker,mapreduce_linkfun}},
 {n_val,3},
 {notfound_ok,true},
 {old_vclock,86400},
 {postcommit,[]},
 {pr,0},
 {precommit,[]},
 {pw,0},
 {r,quorum},
 {rw,quorum},
 {small_vclock,50},
 {w,quorum},
 {young_vclock,20}]

After the upgrade:

[{name,<<"run">>}|undefined]

After the upgrade, app_helper:get_env(riak_core, default_bucket_props). returns undefined.
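
The crash itself seems to follow directly from this. Assuming get_bucket builds its result by prepending {name, Bucket} to whatever the default_bucket_props lookup returns (the output above suggests this, but I have not confirmed it in the riak_core source), an undefined default produces an improper list, and lists:keyfind/3 fails with badarg once it walks past the first element. A minimal sketch that reproduces this without riak_core; the module name improper_props_demo is made up for illustration:

```erlang
-module(improper_props_demo).
-export([lookup/1]).

%% Mimics the ASSUMED shape of the bucket props: {name, ...} consed onto
%% the defaults. Defaults is the atom 'undefined' when the env value is gone.
lookup(Defaults) ->
    Props = [{name, <<"run">>} | Defaults],
    try lists:keyfind(chash_keyfun, 1, Props) of
        Found -> {ok, Found}
    catch
        %% An improper list tail makes the keyfind BIF raise badarg.
        error:badarg -> {error, badarg}
    end.
```

Calling improper_props_demo:lookup(undefined) hits the badarg branch, while passing a real property list that contains chash_keyfun returns the matching tuple.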

I have found out it tries to process sys.config during the upgrade:

Warning: "_build/prod/rel/cavv/releases/0.1.2/sys.config" missing (optional)

Using the generated app.conf is not enough, as it does not contain all of the config values shown above. With it alone, the result is: [{n_val,3}].

Maybe Cuttlefish is not properly reloading the conf files?

UPDATE2

I did some more digging. After the upgrade, application:get_all_env(riak_core). returns different values. Any ideas?
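
To see exactly which parameters disappear, the two results can be compared key by key. This is only a small helper sketch; the module and function names (env_diff, missing_keys) are hypothetical and not part of riak_core or OTP:

```erlang
-module(env_diff).
-export([missing_keys/2]).

%% Returns the keys that are present in the Before proplist but absent
%% from the After proplist. Both are expected to be lists of {Key, Value}
%% tuples, as returned by application:get_all_env/1.
missing_keys(Before, After) ->
    [K || {K, _} <- Before, not lists:keymember(K, 1, After)].
```

The idea is to bind Before = application:get_all_env(riak_core). in the remote shell before installing the release, bind After the same way afterwards, and then call env_diff:missing_keys(Before, After).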

jvdveuten

1 Answer

I found out that after the upgrade, all environment values are purged:

http://erlang.org/doc/design_principles/release_handling.html#id84983

Specifically, the application configuration parameters are automatically updated according to (in increasing priority order):

1. The data in the boot script, fetched from the new application resource file App.app
2. The new sys.config
3. Command-line arguments -App Par Val

This means that parameter values set in the other system configuration files and values set using application:set_env/3 are disregarded.

To reset the values set by riak_core, I use a simple function:

set_defaults() ->
     riak_core_bucket:append_bucket_defaults(riak_core_bucket_type:defaults(default_type)).

I call this in my relup file with:

{apply,{cavv_app,set_defaults,[]}},
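
For context, this is roughly where the instruction could sit in the relup from the question. Treat it as a sketch: the placement after the load instruction is my assumption, and it presumes that the cavv_app code running at that point already exports set_defaults/0. If the function is first added in 0.1.2, cavv_app would need its own load_object_code and load entries as well.

```erlang
{"0.1.2",
 [{"0.1.1",[],
   [{load_object_code,{cavv,"20161203-211601-relflow",[cavv_vnode]}},
    point_of_no_return,
    {load,{cavv_vnode,brutal_purge,brutal_purge}},
    %% Assumed placement: restore the riak_core bucket defaults after
    %% the new vnode code has been loaded.
    {apply,{cavv_app,set_defaults,[]}}]}],
 [{"0.1.1",[],[point_of_no_return]}]}.
```

The downgrade section is left as it was in the question.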

Normally, the defaults are set when the riak_core app starts, here:

https://github.com/basho/riak_core/blob/develop/src/riak_core_app.erl#L42

But this function is not exported and not exposed.

I don't know if this is the most elegant solution. Any ideas are welcome.

jvdveuten