
How much overhead does Bitcask add per key? It's a simple question with, apparently, a multitude of answers.

Findings have ranged anywhere from:

a. 22 bytes as per Basho's documentation: http://docs.basho.com/riak/latest/references/appendices/Bitcask-Capacity-Planning/

b. About 450 bytes, according to these mailing list posts: http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-August/005178.html and http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-May/004292.html

c. And anecdotal records that state overheads anywhere in the range of 45 to 200 bytes.

Why isn't there a straight answer to this? I understand it's an intricate problem - one of the mailing list entries above makes it clear! - but is even coming up with a consistent ballpark so difficult? Why isn't Basho's documentation clear about this?


I have another set of problems related to how I am to structure my logic based on the key overhead (storing lots of small values versus "collecting" them in larger structures), but I guess that's another question.

Dev Kanchen

2 Answers


The static overhead is stated on our capacity planner as 22 bytes because that's the size of the C struct. As noted on that page, the capacity planner is simply providing a rough estimate for sizing.

The old mailing list post by Nico that you link to is probably the most complete accounting of Bitcask internals you will find, and it is accurate. Adding 8 bytes for a pointer to the entry and 13 bytes of Erlang overhead on the bucket/key pair to the 22-byte struct, you arrive at 43 bytes on a 64-bit system.
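To make the arithmetic concrete, here is a rough sketch of how those 43 bytes feed into a RAM estimate. The function name, the parameters, and the choice to multiply by a replication factor (`n_val`, defaulting to 3) are illustrative assumptions in the spirit of the capacity planner, not an official formula; treat the result as a ballpark only.

```python
# Per-key static overhead on a 64-bit system, using the figures quoted above.
STRUCT_BYTES = 22        # size of the C struct for a keydir entry
POINTER_BYTES = 8        # pointer to the entry
ERLANG_TERM_BYTES = 13   # Erlang overhead on the bucket/key pair

PER_KEY_OVERHEAD = STRUCT_BYTES + POINTER_BYTES + ERLANG_TERM_BYTES  # 43


def keydir_ram_bytes(num_keys, avg_bucket_len, avg_key_len, n_val=3):
    """Estimate cluster-wide RAM (in bytes) needed to hold all keys.

    Hypothetical helper: each key costs the static overhead plus the
    bucket and key names themselves, and every key is assumed to be
    stored n_val times across the cluster.
    """
    per_entry = PER_KEY_OVERHEAD + avg_bucket_len + avg_key_len
    return per_entry * num_keys * n_val


if __name__ == "__main__":
    # One million keys, 10-byte bucket names, 36-byte keys (e.g. UUID strings).
    total = keydir_ram_bytes(1_000_000, avg_bucket_len=10, avg_key_len=36)
    print(f"{total / 1024**2:.1f} MiB across the cluster")
```

With one million keys, 10-byte bucket names and 36-byte keys, this works out to 89 bytes per entry, or 267,000,000 bytes across the cluster at three replicas.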

As for there not being a straight answer ... actually asking us (via email, the mailing list, IRC, carrier pigeon, etc) will always produce an actual answer.

Brian Roach
  • Thanks, but what about the other calculation that adds up to about 450 bytes? Or would that only happen in specific use cases? – Dev Kanchen Mar 05 '13 at 15:12
  • I have no idea what that would be. Again, Nico's posting is an accurate accounting. – Brian Roach Mar 05 '13 at 15:18
  • Alright thanks. Overhead of 43 in RAM makes enough sense for the model I'm trying to pursue. P.S: My rant was a result of frustration from having to basically change how I was thinking about the model after reading what seemed like conflicting posts, separated by a few hours of offline work and Google searches. Do appreciate the data I've seen on the mailing list so far and the help generally available from Basho over the web. Thanks again. – Dev Kanchen Mar 05 '13 at 18:33

Bitcask requires all keys to be held in memory. As far as I can see, the overhead referenced in (a) is the one to use when estimating the total amount of RAM Bitcask will need across the cluster because of this requirement.

When writing data to disk, Riak stores the actual value together with various metadata, e.g. the vector clock. The post mentioning 450 bytes, listed in (b), appears to be an estimate of the storage overhead on disk and would therefore probably apply to other backends as well.

Nico's post seems to contain a good and accurate explanation.

Christian Dahlqvist