90

I have an array, and a function that returns a value given a value. Ultimately I want to create a hashmap that has the values of the array as key value, and the result of f(key_value) as the value. Is there a clean, simple way, like similar to each/map of Array, of doing this using block?

So something that is equivalent to

hsh = {}
[1,2,3,4].each do |x|
  hsh[x] = f(x)
end

but looks more similar to this, in that it's simple and one line?

results = array.map { | x | f(x) }
Rouven Weßling
  • 611
  • 4
  • 16
Ji Mun
  • 1,800
  • 4
  • 18
  • 27
  • Can anybody say something about a comparison between the methods? We right now have [`.to_h` (ruby >= 2.6)](https://stackoverflow.com/a/54023057/4575793), [`.map(...).to_h` (ruby >=2.1)](https://stackoverflow.com/a/30887585/4575793) (both supporting lazy enumerators), [`Hash.new` (with lookup `f(...)`)](https://stackoverflow.com/a/13017612/4575793), [`each_with_object({})` (ruby >= 1.9.1.378) and `Hash[...]`](https://stackoverflow.com/a/13017531/4575793), [`index_with(&:...)`](https://stackoverflow.com/a/13017567/4575793)[rails >= 6](https://apidock.com/rails/v6.0.0/Enumerable/index_with). – Cadoiz Aug 08 '23 at 12:22
  • So my follow-up question is how do these 6 different methods perform? I would expect 1 to be the best, followed by 3 or 6? 4 is also only iterating one-time through, but i remember `each_with_object` to be inferior. 2 and 5 should be slower as they iterate twice. Are my assumptions true? I know that this is also about memory, not only CPU bandwidth. Or is this a whole separate question? (Just for the the sake of completeness: [Hash[...]](https://stackoverflow.com/a/13017531/4575793) is [ruby >= v1.8.6.287](https://apidock.com/ruby/v1_8_6_287/Hash) – Cadoiz Aug 08 '23 at 12:30

9 Answers9

147

Note that since Ruby 2.1.0 you can also use Array#to_h, like this:

[1,2,3,4].map{ |x| [x, f(x)] }.to_h
Soron
  • 448
  • 3
  • 10
Knotty66
  • 1,711
  • 2
  • 12
  • 4
94

Ruby 2.6.0 enables passing a block to the to_h-method. This enables an even shorter syntax for creating a hash from an array:

[1, 2, 3, 4].to_h { |x| [x, f(x)] }
Timitry
  • 2,646
  • 20
  • 26
42

You could also define the function as the hash's default value:

hash = Hash.new {|hash, key| hash[key] = f(key) }

Then when you lookup a value, the hash will calculate and store it on the fly.

hash[10]
hash.inspect #=> { 10 => whatever_the_result_is }
Zach Kemp
  • 11,736
  • 1
  • 32
  • 46
33

You need each_with_object.

def f x
  x * 2
end

t = [1, 2, 3, 4].each_with_object({}) do |x, memo|
  memo[x] = f(x)
end

t # => {1=>2, 2=>4, 3=>6, 4=>8}

Another one:

t2 = [1, 2, 3, 4].map{|x| [x, f(x)]}
Hash[t2] # => {1=>2, 2=>4, 3=>6, 4=>8}
Sergio Tulentsev
  • 226,338
  • 43
  • 373
  • 367
  • Just for the completeness (suggested edits are full): [this is Ruby >= 1.9.1.378](https://apidock.com/ruby/v1_9_1_378/Enumerable/each_with_object). – Cadoiz Aug 08 '23 at 12:16
  • 1
    @Cadoiz: I am quite sure no one is using a 14yo ruby version. It's not whisky :) But thanks, good note. – Sergio Tulentsev Aug 08 '23 at 14:22
24

Check out the Hash::[] method.

Hash[ [1,2,3,4].collect { |x| [x, f(x)] } ]
Matt Huggins
  • 81,398
  • 36
  • 149
  • 218
11

From Ruby 2.1:

[1, 2, 3, 4].map { |x| [x, f(x)] }.to_h

[EDIT] Using Facets' mash (method to convert enumerable to hashes) (unmaintained?):

[1, 2, 3, 4].mash { |x| [x, f(x)] }
tokland
  • 66,169
  • 13
  • 144
  • 170
2

Also, Rails method index_with would be helpful:

a = ['a', 'bsdf', 'wqqwc']
a.index_with(&:size)
=> {"a"=>1, "bsdf"=>4, "wqqwc"=>5}
Sergio Belevskij
  • 2,478
  • 25
  • 24
  • 1
    Turns out I misread the question, and I was actually looking for `index_by`, which does the opposite (function yields the keys; elements become the values). – JakeRobb Jan 23 '23 at 22:10
  • You also do `index_with {...}` and give it a block and it is [rails >= 6.](https://apidock.com/rails/v6.0.0/Enumerable/index_with). This link also provides an example for that (unfortunately the sugessted edit queue is full). – Cadoiz Aug 08 '23 at 12:37
0

You're looking for each_with_object() method:

elem = [1,2,3,4]
h = elem.each_with_object({}) do |x, res|
  res[x] = x**2
end

puts h

The argument passed to each_with_object({}) is the initial value of an intermediate object that is passed to the block as res variable. In each iteration we're adding new pair key: value to the res Hash and returning the Hash to be used in next iteration.

The method above pre-computes a very practical hash of squared values:

{1=>1, 2=>4, 3=>9, 4=>16}
Tombart
  • 30,520
  • 16
  • 123
  • 136
  • My RuboCop tells me that `each_with_object` as in [this answer](https://stackoverflow.com/a/13017531/4575793) should be preferred. – Cadoiz Aug 08 '23 at 12:19
0

TL;DR: Comparing the different solutions in this Q/A

All tested options are fine. Hash.new with a lookup function performs best. No declaration cost and still approximately as fast if you access every element once. As long as f(...) is cheap, as it's a lazy/on-demand data structure then. If you need your results right away, chose any other one that suits your needs and suffer at most ~16% penalty.


This answer tries to be a summery of the existing other ones to answer a follow-up question the reader might have: which is the best solution? I will provide all the ressources to reproduce/verify my results and compare the existing answers, ordered by decreasing trend (at time of writing). Right now, even ruby2.6 is out-of support for more then a year, so we can probably assume that everything will be supported now. So let's start with this table for an overview:

There are more options such as Enumerable#reduce + Hash#merge! which are not scope of this answer. You can consider this related answer discussing it's advantages, I took some inspiration/structure from there.

So these are the different solutions I have compiled and benchmarked:

Presenting the full solutions in code

a = [1,2,3,4]
def f key
  key
end

g = a.to_h { |x| [x, f(x)] }
k = a.map { |x| [x, f(x)] }.to_h
o = Hash.new {|hash, key| hash[key] = f(key) }
q = a.each_with_object({}) { |x, memo| memo[x] = f(x) }
s = a.index_with(&:c) # not tested

Benchmark script

According to noraj, you should use bmbm and not bm to avoid differences due to the cost of memory allocation and garbage collection. This is an important point I'm discussing later in "Interpretation".

require 'benchmark'

a = (1..1_000_000)
b = a.to_a
def f key
  key
end

# Just for accessing later, not important
g = a.to_h { |x| [x, f(x)] }; h = b.to_h { |x| [x, f(x)] }; i = a.to_h { |x| [x, x] }; j = b.to_h { |x| [x, x] }; k = a.map { |x| [x, f(x)] }.to_h; l = b.map { |x| [x, f(x)] }.to_h; m = a.map { |x| [x, x] }.to_h; n = b.map { |x| [x, x] }.to_h; o = Hash.new {|hash, key| hash[key] = f(key) }; q = a.each_with_object({}) { |x, memo| memo[x] = f(x) }; r = b.each_with_object({}) { |x, memo| memo[x] = f(x) };


Benchmark.bmbm do |x|
  x.report('Declaration of a.to_h { function }:') { a.to_h { |x| [x, f(x)] } }
  x.report('Declaration of b.to_h { function }:') { b.to_h { |x| [x, f(x)] } }
  x.report('Declaration of a.to_h { array }:') { a.to_h { |x| [x, x] } }
  x.report('Declaration of b.to_h { array }:') { b.to_h { |x| [x, x] } }
  x.report('Declaration of a.map { function }.to_h:') { a.map { |x| [x, f(x)] }.to_h }
  x.report('Declaration of b.map { function }.to_h:') { b.map { |x| [x, f(x)] }.to_h }
  x.report('Declaration of a.map { array }.to_h:') { a.map { |x| [x, x] }.to_h }
  x.report('Declaration of b.map { array }.to_h:') { b.map { |x| [x, x] }.to_h }
  x.report('Declaration of Hash.new (with lookup)') { Hash.new {|hash, key| hash[key] = c(key) } }
  x.report('Declaration of a.each_with_object({})') { a.each_with_object({}) { |x, memo| memo[x] = f(x) } }
  x.report('Declaration of a.each_with_object({})') { b.each_with_object({}) { |x, memo| memo[x] = f(x) } }

  x.report('Accessing from a.to_h { function }:') { a.reduce { |sum, x| g[x] } }
  x.report('Accessing from b.to_h { function }:') { a.reduce { |sum, x| h[x] } }
  x.report('Accessing from a.to_h { array }:') { a.reduce { |sum, x| i[x] } }
  x.report('Accessing from b.to_h { array }:') { a.reduce { |sum, x| j[x] } }
  x.report('Accessing from a.map { function }.to_h:') { a.reduce { |sum, x| k[x] } }
  x.report('Accessing from b.map { function }.to_h:') { a.reduce { |sum, x| l[x] } }
  x.report('Accessing from a.map { array }.to_h:') { a.reduce { |sum, x| m[x] } }
  x.report('Accessing from b.map { array }.to_h:') { a.reduce { |sum, x| n[x] } }
  x.report('Accessing from Hash.new (with lookup)') { a.reduce { |sum, x| o[x] } }
  x.report('Accessing from a.each_with_object({})') { a.reduce { |sum, x| q[x] } }
  x.report('Accessing from b.each_with_object({})') { a.reduce { |sum, x| r[x] } }
end

Benchmark results for 5M items iterator/array

                                             user     system      total        real
Declaration of a.to_h { function }:       1.650822   0.091929   1.742751 (  1.742868)
Declaration of b.to_h { function }:       1.170426   0.071894   1.242320 (  1.242395)
Declaration of a.to_h { array }:          1.604802   0.051815   1.656617 (  1.660191)
Declaration of b.to_h { array }:          1.159055   0.007980   1.167035 (  1.167077)
Declaration of a.map { function }.to_h:   1.511910   0.099993   1.611903 (  1.611978)
Declaration of b.map { function }.to_h:   1.381991   0.112121   1.494112 (  1.494172)
Declaration of a.map { array }.to_h:      1.399749   0.091984   1.491733 (  1.491789)
Declaration of b.map { array }.to_h:      1.440968   0.039829   1.480797 (  1.480845)
Declaration of Hash.new (with lookup)     0.000017   0.000001   0.000018 (  0.000009)
Declaration of a.each_with_object({})     1.496914   0.131904   1.628818 (  1.628930)
Declaration of a.each_with_object({})     1.438418   0.180053   1.618471 (  1.618551)

Rehearsal ---------------------------------------------------------------------------
Accessing from a.to_h { function }:       1.005993   0.000323   1.006316 (  1.006360)
Accessing from b.to_h { function }:       0.999888   0.000164   1.000052 (  1.000107)
Accessing from a.to_h { array }:          0.998487   0.000068   0.998555 (  0.998610)
Accessing from b.to_h { array }:          1.003892   0.000139   1.004031 (  1.004061)
Accessing from a.map { function }.to_h:   1.023635   0.000115   1.023750 (  1.023789)
Accessing from b.map { function }.to_h:   1.006790   0.000061   1.006851 (  1.006912)
Accessing from a.map { array }.to_h:      1.005898   0.000204   1.006102 (  1.006155)
Accessing from b.map { array }.to_h:      1.013491   0.000000   1.013491 (  1.013545)
Accessing from Hash.new (with lookup)     2.289310   0.139982   2.429292 (  2.429415)
Accessing from a.each_with_object({})     1.011719   0.000078   1.011797 (  1.011855)
Accessing from b.each_with_object({})     1.002672   0.000026   1.002698 (  1.002755)
----------------------------------------------------------------- total: 12.502935sec

                                              user     system      total        real
Accessing from a.to_h { function }:       1.006216   0.000003   1.006219 (  1.006290)
Accessing from b.to_h { function }:       1.006313   0.000003   1.006316 (  1.006342)
Accessing from a.to_h { array }:          0.991536   0.000049   0.991585 (  0.991623)
Accessing from b.to_h { array }:          0.999372   0.000037   0.999409 (  0.999476)
Accessing from a.map { function }.to_h:   0.988464   0.000073   0.988537 (  0.988588)
Accessing from b.map { function }.to_h:   1.000364   0.000066   1.000430 (  1.000472)
Accessing from a.map { array }.to_h:      0.988162   0.000042   0.988204 (  0.988280)
Accessing from b.map { array }.to_h:      0.997830   0.000049   0.997879 (  0.997946)
Accessing from Hash.new (with lookup)     1.003044   0.000068   1.003112 (  1.003149)
Accessing from a.each_with_object({})     1.006635   0.000000   1.006635 (  1.006668)
Accessing from b.each_with_object({})     0.993825   0.000000   0.993825 (  0.993867)

Interpretation

So based on these results, you can say that there is a bill for memory allocation and garbage collection, which you can see in the rehearsal phase of Hash.new (with lookup f(...)): 2.429415 s / 1.003149 s = 242% (or 1.42 s for building the structure). This means that a preconstructed data structure isn't even guaranteed to be faster than on-demand lookup, I expected the cost to be much higher. When heavily used it could still make sense to just use Hash.new (with lookup f(...)). The consideration to make is when exactly the function is ran: on demand or previously constructed. Imagine having a compute intensive task taking 10s

def f key
  sleep 10
  key
end

If you rarely need a result, just use Hash.new with a lookup function and bam - the declaration has virtually no runtime. Or you might have a strong focus on response time - then it's obv. better to have them up-front (like a dynamic-programming style lookup table). At around 16% deviation, the performance difference isn't huge, so you could also prefer code readability. Starting with a plain array is mostly faster or equally fast so the format of your input data a/b made more of an impact here.

Cadoiz
  • 1,446
  • 21
  • 31