I have evaluated access times for a two-dimensional array, implemented as
- an array of arrays
- a hash of arrays
- a hash with arrays as keys
My expectation was to see similar acess times for all 3. I also have expected the measurements to yield similar results for MRI and JRuby.
However, for a reason that I do not understand, on MRI accessing elements within an array of arrays or within a hash of arrays is an order of magnitude faster than accessing elements of a hash.
On JRuby, instead of being 10 times as expensive, hash access was about 50 times as expensive as with an array of arrays.
The results:
MRI (2.1.1):
user system total real
hash of arrays: 1.300000 0.000000 1.300000 ( 1.302235)
array of arrays: 0.890000 0.000000 0.890000 ( 0.896942)
flat hash: 16.830000 0.000000 16.830000 ( 16.861716)
JRuby (1.7.5):
user system total real
hash of arrays: 0.280000 0.000000 0.280000 ( 0.265000)
array of arrays: 0.250000 0.000000 0.250000 ( 0.182000)
flat hash: 77.450000 0.240000 77.690000 ( 75.156000)
Here are two of my benchmarks:
ary = (0...n).map { Array.new(n, 1) }
bm.report('array of arrays:') do
iterations.times do
(0...n).each { |x|
(0...n).each { |y|
v = ary[x][y]
}
}
end
end
.
hash = {}
(0...n).each { |x|
(0...n).each { |y|
hash[[x, y]] = 1
}
}
prepared_indices = (0...n).each_with_object([]) { |x, ind|
(0...n).each { |y|
ind << [x, y]
}
}
bm.report('flat hash:') do
iterations.times do
prepared_indices.each { |i|
v = hash[i]
}
end
end
All container elements are initialized with a numeric value and have the same total number of elements. The arrays for accessing the hash are preinitialized in order to benchmark the element access only.
Here is the complete code
I have consulted this thread and this article but still have no clue about the unexpected performance differences.
Why are the results so different from my expectations? What am I missing?