TL;DR: Comparing the different solutions in this Q/A
All tested options are fine. Hash.new with a lookup function performs best: it has no declaration cost and is still approximately as fast as the others if you access every element once, as long as f(...) is cheap, because it is a lazy/on-demand data structure. If you need all your results right away, choose any other option that suits your needs and suffer at most a ~16% penalty.
This answer tries to be a summary of the existing answers and to address a follow-up question the reader might have: which is the best solution?
I will provide all the resources to reproduce/verify my results and compare the existing answers, ordered by decreasing popularity (at the time of writing). Even Ruby 2.6 has been out of support for more than a year now, so we can safely assume that every option shown here is supported.
There are more options, such as Enumerable#reduce + Hash#merge!, which are out of the scope of this answer. You can consider this related answer discussing their advantages; I took some inspiration/structure from there.
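For reference, such a reduce + merge! variant (using the same a and f defined in the next section; the name t is just for illustration) could look roughly like this. It is not part of the benchmark:

# Builds the hash by merging one pair per element; merge! mutates the
# accumulator, so no intermediate hashes are allocated.
t = a.reduce({}) { |memo, x| memo.merge!(x => f(x)) }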
So these are the different solutions I have compiled and benchmarked:
Presenting the full solutions in code
a = [1, 2, 3, 4]

def f(key)
  key
end

g = a.to_h { |x| [x, f(x)] }                            # Enumerable#to_h with a block (Ruby >= 2.6)
k = a.map { |x| [x, f(x)] }.to_h                        # map to key/value pairs, then Array#to_h
o = Hash.new { |hash, key| hash[key] = f(key) }         # default block: computes and caches values on first access
q = a.each_with_object({}) { |x, memo| memo[x] = f(x) } # accumulate into a hash
s = a.index_with { |x| f(x) }                           # ActiveSupport only, not tested
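All of these end up with the same mapping (o fills itself as it is accessed). A quick sanity check, assuming the definitions above:

g == k && k == q  # => true, the eagerly built hashes are identical
o.empty?          # => true, nothing has been computed yet
o[3]              # => 3, f(3) runs now and the result is cached
o                 # => {3=>3}, only the accessed key is present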
Benchmark script
According to noraj, you should use bmbm rather than bm to avoid differences caused by the cost of memory allocation and garbage collection. This is an important point that I discuss later in "Interpretation".
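For context, a minimal sketch of the difference between the two (not part of the benchmark below; the workload and size are arbitrary):

require 'benchmark'

# bm measures each block a single time; early blocks can be penalized by
# memory allocation and GC warm-up.
Benchmark.bm do |x|
  x.report('single run:') { Array.new(1_000_000) { |i| i * i } }
end

# bmbm runs everything twice: a rehearsal that warms up allocator/GC,
# followed by the run whose numbers you should actually compare.
Benchmark.bmbm do |x|
  x.report('rehearsed run:') { Array.new(1_000_000) { |i| i * i } }
end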
require 'benchmark'
a = (1..5_000_000)
b = a.to_a
def f(key)
  key
end

# Build the data structures up front so the "Accessing" reports below measure
# lookups only (o is lazy: it fills itself on first access).
g = a.to_h { |x| [x, f(x)] }
h = b.to_h { |x| [x, f(x)] }
i = a.to_h { |x| [x, x] }
j = b.to_h { |x| [x, x] }
k = a.map { |x| [x, f(x)] }.to_h
l = b.map { |x| [x, f(x)] }.to_h
m = a.map { |x| [x, x] }.to_h
n = b.map { |x| [x, x] }.to_h
o = Hash.new { |hash, key| hash[key] = f(key) }
q = a.each_with_object({}) { |x, memo| memo[x] = f(x) }
r = b.each_with_object({}) { |x, memo| memo[x] = f(x) }
Benchmark.bmbm do |x|
x.report('Declaration of a.to_h { function }:') { a.to_h { |x| [x, f(x)] } }
x.report('Declaration of b.to_h { function }:') { b.to_h { |x| [x, f(x)] } }
x.report('Declaration of a.to_h { array }:') { a.to_h { |x| [x, x] } }
x.report('Declaration of b.to_h { array }:') { b.to_h { |x| [x, x] } }
x.report('Declaration of a.map { function }.to_h:') { a.map { |x| [x, f(x)] }.to_h }
x.report('Declaration of b.map { function }.to_h:') { b.map { |x| [x, f(x)] }.to_h }
x.report('Declaration of a.map { array }.to_h:') { a.map { |x| [x, x] }.to_h }
x.report('Declaration of b.map { array }.to_h:') { b.map { |x| [x, x] }.to_h }
  x.report('Declaration of Hash.new (with lookup)') { Hash.new { |hash, key| hash[key] = f(key) } }
x.report('Declaration of a.each_with_object({})') { a.each_with_object({}) { |x, memo| memo[x] = f(x) } }
  x.report('Declaration of b.each_with_object({})') { b.each_with_object({}) { |x, memo| memo[x] = f(x) } }
x.report('Accessing from a.to_h { function }:') { a.reduce { |sum, x| g[x] } }
x.report('Accessing from b.to_h { function }:') { a.reduce { |sum, x| h[x] } }
x.report('Accessing from a.to_h { array }:') { a.reduce { |sum, x| i[x] } }
x.report('Accessing from b.to_h { array }:') { a.reduce { |sum, x| j[x] } }
x.report('Accessing from a.map { function }.to_h:') { a.reduce { |sum, x| k[x] } }
x.report('Accessing from b.map { function }.to_h:') { a.reduce { |sum, x| l[x] } }
x.report('Accessing from a.map { array }.to_h:') { a.reduce { |sum, x| m[x] } }
x.report('Accessing from b.map { array }.to_h:') { a.reduce { |sum, x| n[x] } }
x.report('Accessing from Hash.new (with lookup)') { a.reduce { |sum, x| o[x] } }
x.report('Accessing from a.each_with_object({})') { a.reduce { |sum, x| q[x] } }
x.report('Accessing from b.each_with_object({})') { a.reduce { |sum, x| r[x] } }
end
Benchmark results for 5M items (range a / array b)
user system total real
Declaration of a.to_h { function }: 1.650822 0.091929 1.742751 ( 1.742868)
Declaration of b.to_h { function }: 1.170426 0.071894 1.242320 ( 1.242395)
Declaration of a.to_h { array }: 1.604802 0.051815 1.656617 ( 1.660191)
Declaration of b.to_h { array }: 1.159055 0.007980 1.167035 ( 1.167077)
Declaration of a.map { function }.to_h: 1.511910 0.099993 1.611903 ( 1.611978)
Declaration of b.map { function }.to_h: 1.381991 0.112121 1.494112 ( 1.494172)
Declaration of a.map { array }.to_h: 1.399749 0.091984 1.491733 ( 1.491789)
Declaration of b.map { array }.to_h: 1.440968 0.039829 1.480797 ( 1.480845)
Declaration of Hash.new (with lookup) 0.000017 0.000001 0.000018 ( 0.000009)
Declaration of a.each_with_object({}) 1.496914 0.131904 1.628818 ( 1.628930)
Declaration of b.each_with_object({}) 1.438418 0.180053 1.618471 ( 1.618551)
Rehearsal ---------------------------------------------------------------------------
Accessing from a.to_h { function }: 1.005993 0.000323 1.006316 ( 1.006360)
Accessing from b.to_h { function }: 0.999888 0.000164 1.000052 ( 1.000107)
Accessing from a.to_h { array }: 0.998487 0.000068 0.998555 ( 0.998610)
Accessing from b.to_h { array }: 1.003892 0.000139 1.004031 ( 1.004061)
Accessing from a.map { function }.to_h: 1.023635 0.000115 1.023750 ( 1.023789)
Accessing from b.map { function }.to_h: 1.006790 0.000061 1.006851 ( 1.006912)
Accessing from a.map { array }.to_h: 1.005898 0.000204 1.006102 ( 1.006155)
Accessing from b.map { array }.to_h: 1.013491 0.000000 1.013491 ( 1.013545)
Accessing from Hash.new (with lookup) 2.289310 0.139982 2.429292 ( 2.429415)
Accessing from a.each_with_object({}) 1.011719 0.000078 1.011797 ( 1.011855)
Accessing from b.each_with_object({}) 1.002672 0.000026 1.002698 ( 1.002755)
----------------------------------------------------------------- total: 12.502935sec
user system total real
Accessing from a.to_h { function }: 1.006216 0.000003 1.006219 ( 1.006290)
Accessing from b.to_h { function }: 1.006313 0.000003 1.006316 ( 1.006342)
Accessing from a.to_h { array }: 0.991536 0.000049 0.991585 ( 0.991623)
Accessing from b.to_h { array }: 0.999372 0.000037 0.999409 ( 0.999476)
Accessing from a.map { function }.to_h: 0.988464 0.000073 0.988537 ( 0.988588)
Accessing from b.map { function }.to_h: 1.000364 0.000066 1.000430 ( 1.000472)
Accessing from a.map { array }.to_h: 0.988162 0.000042 0.988204 ( 0.988280)
Accessing from b.map { array }.to_h: 0.997830 0.000049 0.997879 ( 0.997946)
Accessing from Hash.new (with lookup) 1.003044 0.000068 1.003112 ( 1.003149)
Accessing from a.each_with_object({}) 1.006635 0.000000 1.006635 ( 1.006668)
Accessing from b.each_with_object({}) 0.993825 0.000000 0.993825 ( 0.993867)
Interpretation
So based on these results, you can say that there is a price to pay for memory allocation and garbage collection, which you can see in the rehearsal phase of Hash.new (with lookup f(...)): 2.429415 s / 1.003149 s = 242%, i.e. roughly 1.43 s spent building the structure on demand. This means a pre-constructed data structure isn't even guaranteed to be faster than an on-demand lookup; I expected that cost to be much higher. So even when the hash is used heavily, it can still make sense to just use Hash.new with a lookup f(...). The real consideration is when exactly the function is run: on demand, or up front when the structure is constructed.
Imagine having a compute-intensive task taking 10 s:
def f(key)
  sleep 10
  key
end
If you rarely need a result, just use Hash.new with a lookup function: the declaration has virtually no runtime cost. If, on the other hand, you have a strong focus on response time, it's obviously better to compute the results up front (like a dynamic-programming-style lookup table). At around 16% deviation, the performance difference between the eager variants isn't huge, so you could also simply prefer whichever reads best. Starting with a plain array was mostly as fast or faster, so the format of your input data (a vs. b) had more of an impact here than the construction method.
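As a rough sketch of that trade-off with this expensive f (timings are approximate, variable names are just for illustration):

# Eager: pays 10 s per key immediately, before you can use any result.
eager = [1, 2, 3].to_h { |x| [x, f(x)] }            # ~30 s up front

# Lazy: the declaration is instant; only keys you actually access pay the price.
lazy = Hash.new { |hash, key| hash[key] = f(key) }  # ~0 s
lazy[2]                                             # ~10 s on first access, then cached
lazy[2]                                             # instant from now on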