TL;DR: Comparing the different solutions in this Q/A
All tested options are fine. Hash.new with a lookup function performs best: it has no declaration cost and is still approximately as fast as the others if you access every element once, as long as f(...) is cheap, because it is a lazy/on-demand data structure. If you need all your results right away, choose any other option that suits your needs and suffer at most a ~16% penalty.
This answer tries to be a summary of the existing answers and to address a follow-up question the reader might have: which is the best solution?
I will provide all the resources to reproduce/verify my results and compare the existing answers, ordered by decreasing popularity (at the time of writing). Even Ruby 2.6 has been out of support for more than a year now, so we can safely assume that every option shown here is supported.
There are more options, such as Enumerable#reduce + Hash#merge!, which are out of the scope of this answer. You can consider this related answer discussing their advantages; I took some inspiration/structure from there.
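For reference, such a reduce + merge! variant (using the same a and f defined in the next section; the name t is just for illustration) could look roughly like this. It is not part of the benchmark:

# Builds the hash by merging one pair per element; merge! mutates the
# accumulator, so no intermediate hashes are allocated.
t = a.reduce({}) { |memo, x| memo.merge!(x => f(x)) }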
So these are the different solutions I have compiled and benchmarked:
Presenting the full solutions in code
a = [1, 2, 3, 4]

def f(key)
  key
end

g = a.to_h { |x| [x, f(x)] }                            # Enumerable#to_h with a block (Ruby >= 2.6)
k = a.map { |x| [x, f(x)] }.to_h                        # map to key/value pairs, then Array#to_h
o = Hash.new { |hash, key| hash[key] = f(key) }         # default block: computes and caches values on first access
q = a.each_with_object({}) { |x, memo| memo[x] = f(x) } # accumulate into a hash
s = a.index_with { |x| f(x) }                           # ActiveSupport only, not tested
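All of these end up with the same mapping (o fills itself as it is accessed). A quick sanity check, assuming the definitions above:

g == k && k == q  # => true, the eagerly built hashes are identical
o.empty?          # => true, nothing has been computed yet
o[3]              # => 3, f(3) runs now and the result is cached
o                 # => {3=>3}, only the accessed key is present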
Benchmark script
According to noraj, you should use bmbm rather than bm to avoid differences caused by the cost of memory allocation and garbage collection. This is an important point that I discuss later in "Interpretation".
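For context, a minimal sketch of the difference between the two (not part of the benchmark below; the workload and size are arbitrary):

require 'benchmark'

# bm measures each block a single time; early blocks can be penalized by
# memory allocation and GC warm-up.
Benchmark.bm do |x|
  x.report('single run:') { Array.new(1_000_000) { |i| i * i } }
end

# bmbm runs everything twice: a rehearsal that warms up allocator/GC,
# followed by the run whose numbers you should actually compare.
Benchmark.bmbm do |x|
  x.report('rehearsed run:') { Array.new(1_000_000) { |i| i * i } }
end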
require 'benchmark'
a = (1..5_000_000)
b = a.to_a
def f(key)
  key
end

# Build the data structures up front so the "Accessing" reports below measure
# lookups only (o is lazy: it fills itself on first access).
g = a.to_h { |x| [x, f(x)] }
h = b.to_h { |x| [x, f(x)] }
i = a.to_h { |x| [x, x] }
j = b.to_h { |x| [x, x] }
k = a.map { |x| [x, f(x)] }.to_h
l = b.map { |x| [x, f(x)] }.to_h
m = a.map { |x| [x, x] }.to_h
n = b.map { |x| [x, x] }.to_h
o = Hash.new { |hash, key| hash[key] = f(key) }
q = a.each_with_object({}) { |x, memo| memo[x] = f(x) }
r = b.each_with_object({}) { |x, memo| memo[x] = f(x) }
Benchmark.bmbm do |x|
x.report('Declaration of a.to_h { function }:') { a.to_h { |x| [x, f(x)] } }
x.report('Declaration of b.to_h { function }:') { b.to_h { |x| [x, f(x)] } }
x.report('Declaration of a.to_h { array }:') { a.to_h { |x| [x, x] } }
x.report('Declaration of b.to_h { array }:') { b.to_h { |x| [x, x] } }
x.report('Declaration of a.map { function }.to_h:') { a.map { |x| [x, f(x)] }.to_h }
x.report('Declaration of b.map { function }.to_h:') { b.map { |x| [x, f(x)] }.to_h }
x.report('Declaration of a.map { array }.to_h:') { a.map { |x| [x, x] }.to_h }
x.report('Declaration of b.map { array }.to_h:') { b.map { |x| [x, x] }.to_h }
  x.report('Declaration of Hash.new (with lookup)') { Hash.new { |hash, key| hash[key] = f(key) } }
x.report('Declaration of a.each_with_object({})') { a.each_with_object({}) { |x, memo| memo[x] = f(x) } }
  x.report('Declaration of b.each_with_object({})') { b.each_with_object({}) { |x, memo| memo[x] = f(x) } }
x.report('Accessing from a.to_h { function }:') { a.reduce { |sum, x| g[x] } }
x.report('Accessing from b.to_h { function }:') { a.reduce { |sum, x| h[x] } }
x.report('Accessing from a.to_h { array }:') { a.reduce { |sum, x| i[x] } }
x.report('Accessing from b.to_h { array }:') { a.reduce { |sum, x| j[x] } }
x.report('Accessing from a.map { function }.to_h:') { a.reduce { |sum, x| k[x] } }
x.report('Accessing from b.map { function }.to_h:') { a.reduce { |sum, x| l[x] } }
x.report('Accessing from a.map { array }.to_h:') { a.reduce { |sum, x| m[x] } }
x.report('Accessing from b.map { array }.to_h:') { a.reduce { |sum, x| n[x] } }
x.report('Accessing from Hash.new (with lookup)') { a.reduce { |sum, x| o[x] } }
x.report('Accessing from a.each_with_object({})') { a.reduce { |sum, x| q[x] } }
x.report('Accessing from b.each_with_object({})') { a.reduce { |sum, x| r[x] } }
end
Benchmark results for 5M items (range a / array b)
user system total real
Declaration of a.to_h { function }: 1.650822 0.091929 1.742751 ( 1.742868)
Declaration of b.to_h { function }: 1.170426 0.071894 1.242320 ( 1.242395)
Declaration of a.to_h { array }: 1.604802 0.051815 1.656617 ( 1.660191)
Declaration of b.to_h { array }: 1.159055 0.007980 1.167035 ( 1.167077)
Declaration of a.map { function }.to_h: 1.511910 0.099993 1.611903 ( 1.611978)
Declaration of b.map { function }.to_h: 1.381991 0.112121 1.494112 ( 1.494172)
Declaration of a.map { array }.to_h: 1.399749 0.091984 1.491733 ( 1.491789)
Declaration of b.map { array }.to_h: 1.440968 0.039829 1.480797 ( 1.480845)
Declaration of Hash.new (with lookup) 0.000017 0.000001 0.000018 ( 0.000009)
Declaration of a.each_with_object({}) 1.496914 0.131904 1.628818 ( 1.628930)
Declaration of b.each_with_object({}) 1.438418 0.180053 1.618471 ( 1.618551)
Rehearsal ---------------------------------------------------------------------------
Accessing from a.to_h { function }: 1.005993 0.000323 1.006316 ( 1.006360)
Accessing from b.to_h { function }: 0.999888 0.000164 1.000052 ( 1.000107)
Accessing from a.to_h { array }: 0.998487 0.000068 0.998555 ( 0.998610)
Accessing from b.to_h { array }: 1.003892 0.000139 1.004031 ( 1.004061)
Accessing from a.map { function }.to_h: 1.023635 0.000115 1.023750 ( 1.023789)
Accessing from b.map { function }.to_h: 1.006790 0.000061 1.006851 ( 1.006912)
Accessing from a.map { array }.to_h: 1.005898 0.000204 1.006102 ( 1.006155)
Accessing from b.map { array }.to_h: 1.013491 0.000000 1.013491 ( 1.013545)
Accessing from Hash.new (with lookup) 2.289310 0.139982 2.429292 ( 2.429415)
Accessing from a.each_with_object({}) 1.011719 0.000078 1.011797 ( 1.011855)
Accessing from b.each_with_object({}) 1.002672 0.000026 1.002698 ( 1.002755)
----------------------------------------------------------------- total: 12.502935sec
user system total real
Accessing from a.to_h { function }: 1.006216 0.000003 1.006219 ( 1.006290)
Accessing from b.to_h { function }: 1.006313 0.000003 1.006316 ( 1.006342)
Accessing from a.to_h { array }: 0.991536 0.000049 0.991585 ( 0.991623)
Accessing from b.to_h { array }: 0.999372 0.000037 0.999409 ( 0.999476)
Accessing from a.map { function }.to_h: 0.988464 0.000073 0.988537 ( 0.988588)
Accessing from b.map { function }.to_h: 1.000364 0.000066 1.000430 ( 1.000472)
Accessing from a.map { array }.to_h: 0.988162 0.000042 0.988204 ( 0.988280)
Accessing from b.map { array }.to_h: 0.997830 0.000049 0.997879 ( 0.997946)
Accessing from Hash.new (with lookup) 1.003044 0.000068 1.003112 ( 1.003149)
Accessing from a.each_with_object({}) 1.006635 0.000000 1.006635 ( 1.006668)
Accessing from b.each_with_object({}) 0.993825 0.000000 0.993825 ( 0.993867)
Interpretation
So based on these results, you can say that there is a price to pay for memory allocation and garbage collection, which you can see in the rehearsal phase of Hash.new (with lookup f(...)): 2.429415 s / 1.003149 s = 242%, i.e. roughly 1.43 s spent building the structure on demand. This means a pre-constructed data structure isn't even guaranteed to be faster than an on-demand lookup; I expected that cost to be much higher. So even when the hash is used heavily, it can still make sense to just use Hash.new with a lookup f(...). The real consideration is when exactly the function is run: on demand, or up front when the structure is constructed.
Imagine having a compute-intensive task taking 10 s:
def f(key)
  sleep 10
  key
end
If you rarely need a result, just use Hash.new with a lookup function: the declaration has virtually no runtime cost. If, on the other hand, you have a strong focus on response time, it's obviously better to compute the results up front (like a dynamic-programming-style lookup table). At around 16% deviation, the performance difference between the eager variants isn't huge, so you could also simply prefer whichever reads best. Starting with a plain array was mostly as fast or faster, so the format of your input data (a vs. b) had more of an impact here than the construction method.
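As a rough sketch of that trade-off with this expensive f (timings are approximate, variable names are just for illustration):

# Eager: pays 10 s per key immediately, before you can use any result.
eager = [1, 2, 3].to_h { |x| [x, f(x)] }            # ~30 s up front

# Lazy: the declaration is instant; only keys you actually access pay the price.
lazy = Hash.new { |hash, key| hash[key] = f(key) }  # ~0 s
lazy[2]                                             # ~10 s on first access, then cached
lazy[2]                                             # instant from now on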