I want to create a hash but I'm only interested in the keys. As a consequence, I want the values to have the smallest memory footprint possible. What would be the most suitable object to assign?

  • nil ?
  • a very short symbol like :a ?
  • Something even smaller?
joscas

2 Answers


You can use any value you want, as long as every key maps to the same object.

x = "A string value"
h =  Hash[ 10000.times.map{|i| [i, x]} ]
h2 = Hash[ 10000.times.map{|i| [i, nil]} ]
# h takes the same memory as h2

In the above example, x can be anything you like. Each slot of the hash holds only a pointer to x, or the value itself if x is an immediate value (nil, true, false or a Fixnum, which is simply a small Integer since Ruby 2.4).

In either case, the memory used per value is the same: the size of a pointer on your platform (i.e. 0.size bytes). In the C code, this corresponds to a VALUE.
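If you want to check this yourself on MRI, here is a rough sketch using the standard objspace extension. It assumes ObjectSpace.memsize_of, which is only an implementation-specific hint and counts the hash's own table, not the objects it points to. The helper name hash_footprint is made up for this example.

```ruby
require 'objspace'

# Hypothetical helper: builds a 10,000-key hash whose values are all
# `value` and returns MRI's size hint for the hash itself, in bytes.
def hash_footprint(value)
  h = Hash[10_000.times.map { |i| [i, value] }]
  ObjectSpace.memsize_of(h)
end

puts hash_footprint("A string value") # one shared String
puts hash_footprint(nil)              # an immediate value
puts 0.size                           # size of one slot (a VALUE) in bytes
```

The first two numbers should match, since the table only stores one VALUE per entry either way.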

Just be careful to reuse the same object (i.e. same object_id) and not create a new object every time. For example:

h3 =  Hash[ 10000.times.map{|i| [i, "A string value"]} ]
# => h3 will take a lot more space!
h.values.map(&:object_id).uniq.size  # => 1
h3.values.map(&:object_id).uniq.size # => 10000
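To get a feel for where the extra space goes, you can add up the size hints of the distinct objects each hash points to. This is only a sketch for MRI: values_footprint is a hypothetical helper, and ObjectSpace.memsize_of is an approximation, so treat the numbers as indicative.

```ruby
require 'objspace'

# Hypothetical helper: total size hint of the distinct objects a hash
# points to. uniq needs the block, because equal strings would otherwise
# be collapsed even when they are separate objects.
def values_footprint(hash)
  hash.values.uniq(&:object_id).sum { |v| ObjectSpace.memsize_of(v) }
end

x = "A string value"
shared   = Hash[10_000.times.map { |i| [i, x] }]
separate = Hash[10_000.times.map { |i| [i, String.new("A string value")] }]

puts values_footprint(shared)   # one String
puts values_footprint(separate) # 10,000 Strings
```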

In short, a surefire way is to use false, true, nil, a Fixnum or a Symbol. Symbols are stored in a global table, so :hello.object_id is the same everywhere, and the string 'hello' is stored only once and shared by every :hello in your code.

h4 =  Hash[ 10000.times.map{|i| [i, :some_symbol]} ]
# => h4 will only take as much space as h and h2
h4.values.map(&:object_id).uniq.size # => 1
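A quick way to see the difference between symbols and strings is to compare identity with equal? rather than equality with ==. The variable names below are just for illustration.

```ruby
# However a symbol is spelled or created, you get the same object back.
a = :some_symbol
b = "some_symbol".to_sym
c = :"some_symbol"

puts a.equal?(b) # => true
puts a.equal?(c) # => true

# Two strings with the same content are still two separate objects.
s1 = String.new("some string")
s2 = String.new("some string")

puts s1 == s2      # => true  (same content)
puts s1.equal?(s2) # => false (different objects)
```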

FYI, the built-in library Set has the same requirement, i.e. it uses a Hash only for the keys. It uses true as the value, for simplicity's sake.
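So if you only care about the keys, Set may be all you need. A small sketch comparing a hand-rolled hash with Set (the sample data is made up):

```ruby
require 'set'

words = %w[apple banana apple cherry]

# Hand-rolled: a Hash used only for its keys, every value being `true`.
seen = {}
words.each { |w| seen[w] = true }

# The same membership test with Set.
set = Set.new(words)

puts seen.key?("banana")    # => true
puts set.include?("banana") # => true
puts set.size               # => 3
```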

Marc-André Lafortune
  • +1 That is a very good explanation too, thanks. But according to @sepp2k the footprint of storing a pointer to `x` is somewhat bigger than storing `nil` or `true` and this would mean that h2 > h right? – joscas Feb 06 '13 at 18:56
  • @sepp2k's answer is misleading. A pointer takes the same memory as `nil` or `true`, at least in MRI. – Marc-André Lafortune Feb 06 '13 at 18:58
  • I was thinking of accepting this as the correct answer, but the question is what has the smallest footprint to assign to a variable. And in reality, if you use something different from `nil`, `true` or `false`, you still have to store the pointer value or the variable value somewhere. The difference is minimal though. – joscas Feb 06 '13 at 19:07
  • @joscas: Unless that object already exists, like a Class, some constant (say `Float::INFINITY`), etc... The understanding is what's important. In any case, there is no reason to worry about the couple of bytes that `:some_symbol` takes! – Marc-André Lafortune Feb 06 '13 at 19:12

The following applies to the official Ruby implementation. Other implementations may differ in this regard.

nil, true, false and Fixnums are encoded inside the pointer at the C level, whereas all other objects would involve a pointer that actually points somewhere (so you'd have the space consumption of the pointer plus the space it points to). So these objects are the ones with the smallest memory footprint.

Of these, nil makes the most sense semantically.
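As a rough check on MRI, ObjectSpace.memsize_of from the standard objspace extension reports no heap memory for these immediate values, while an ordinary object such as a String does occupy space. This is implementation-specific, so other Ruby implementations may report something different.

```ruby
require 'objspace'

# Immediate values live inside the VALUE itself, so MRI reports no heap size.
[nil, true, false, 42].each do |v|
  puts "#{v.inspect}: #{ObjectSpace.memsize_of(v)}"
end

# A regular object occupies a heap slot of its own.
puts ObjectSpace.memsize_of(String.new("A string value"))
```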

sepp2k