There was a thread on the riak users mailing list a while back that discussed the overhead of the riak object, estimating it at ~400 bytes per object. However, that was before the new object format was introduced, so it is outdated. Here is a fresh look.
First, we need a local client:
(node1@127.0.0.1)1> {ok,C}=riak:local_client().
{ok,{riak_client,['node1@127.0.0.1',undefined]}}
Create a new Riak object with a 0-byte value:
(node1@127.0.0.1)2> Obj = riak_object:new(<<"size">>,<<"key">>,<<>>).
#r_object{bucket = <<"size">>,key = <<"key">>,
contents = [#r_content{metadata = {dict,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],...},
{{[],[],[],[],[],[],[],[],[],[],[],[],...}}},
value = <<>>}],
vclock = [],
updatemetadata = {dict,1,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],...},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],...}}},
updatevalue = undefined}
The object is actually stored in a reduced binary format:
(node1@127.0.0.1)3> byte_size(riak_object:to_binary(v1,Obj)).
36
That is 36 bytes of overhead for just the object, but it doesn't include metadata such as the last-modified time or the version vector, so store it in Riak and check again:
(node1@127.0.0.1)4> C:put(Obj).
ok
(node1@127.0.0.1)5> {ok,Obj1} = C:get(<<"size">>,<<"key">>).
{ok, #r_object{bucket = <<"size">>,key = <<"key">>,
contents = [#r_content{metadata = {dict,3,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],...},
{{[],[],[],[],[],[],[],[],[],[],[[...]],[...],...}}},
value = <<>>}],
vclock = [{<<204,153,66,25,119,94,124,200,0,0,156,65>>,
{3,63654324108}}],
updatemetadata = {dict,1,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],...},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],...}}},
updatevalue = undefined}}
(node1@127.0.0.1)6> byte_size(riak_object:to_binary(v1,Obj1)).
110
Now it is 110 bytes of overhead for an empty object with a single entry in the version vector. If a subsequent put of the object is coordinated by a different vnode, it will add another entry. I've chosen the bucket and key names so that the local node is not a member of the preflist, which gives the second put a fair chance of being coordinated by a different node.
(node1@127.0.0.1)7> C:put(Obj1).
ok
(node1@127.0.0.1)8> {ok,Obj2} = C:get(<<"size">>,<<"key">>).
{ok, #r_object{bucket = <<"size">>,key = <<"key">>,
contents = [#r_content{metadata = {dict,3,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],...},
{{[],[],[],[],[],[],[],[],[],[],[[...]],[...],...}}},
value = <<>>}],
vclock = [{<<204,153,66,25,119,94,124,200,0,0,156,65>>,
{3,63654324108}},
{<<85,123,36,24,254,22,162,159,0,0,78,33>>,{1,63654324651}}],
updatemetadata = {dict,1,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],...},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],...}}},
updatevalue = undefined}}
(node1@127.0.0.1)9> byte_size(riak_object:to_binary(v1,Obj2)).
141
That is another 31 bytes added for the additional entry in the version vector.
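Extrapolating from these two measurements, we can sketch a rough per-object overhead model: about 110 bytes for an empty object with one version-vector entry, plus about 31 bytes for each additional entry. The constants below come from this walkthrough, not from any official Riak formula:

```python
# Rough riak_object v1 binary overhead model, extrapolated from the
# measurements above (110 bytes with one vclock entry, +31 per extra entry).
BASE_OVERHEAD = 110    # empty object: metadata plus one vclock entry
PER_VCLOCK_ENTRY = 31  # each additional vclock entry

def object_overhead(vclock_entries):
    """Estimated serialized overhead, in bytes, for an empty object."""
    return BASE_OVERHEAD + PER_VCLOCK_ENTRY * (vclock_entries - 1)

print(object_overhead(1))  # 110, matches the first put
print(object_overhead(2))  # 141, matches the second put
```

Real objects will vary with user metadata and vclock pruning settings, so treat this as a lower bound rather than an exact figure.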
These numbers don't include storing the actual bucket and key names with the value, or Bitcask storing them again in a hint file, so the actual space on disk would be roughly:

2 × (bucket name size + key name size) + value size + object overhead + file structure overhead + checksum/hash size
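As a back-of-envelope sketch, the formula above might be computed like this. The `file_overhead` and `checksum_len` defaults are illustrative placeholders, not real Bitcask constants; use the capacity calculator linked below for accurate numbers:

```python
# Back-of-envelope disk estimate for one object stored in Bitcask,
# following the formula in the text. file_overhead and checksum_len
# are illustrative assumptions, not actual Bitcask record sizes.
def disk_estimate(bucket_len, key_len, value_len, vclock_entries,
                  file_overhead=14, checksum_len=4):
    # Object overhead measured above: 110 bytes plus 31 per extra vclock entry
    object_overhead = 110 + 31 * (vclock_entries - 1)
    # Bucket and key are stored with the value, and again in the hint file
    return (2 * (bucket_len + key_len)
            + value_len
            + object_overhead
            + file_overhead
            + checksum_len)

# The empty object from this walkthrough: bucket "size", key "key"
print(disk_estimate(len(b"size"), len(b"key"), 0, 1))  # 142
```

Even with placeholder constants, this makes the point: for small values, the fixed per-object costs dominate the on-disk footprint.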
If you're using Bitcask, there is a calculator in the documentation that will help you estimate disk and memory requirements: http://docs.basho.com/riak/kv/2.2.0/setup/planning/bitcask-capacity-calc/
If you use eLevelDB, you have the option of Snappy compression, which can reduce the size on disk.