What is the most efficient way to check whether two hashes `h1` and `h2` have the same set of keys, disregarding order? Can it be made faster, or more concise with comparable efficiency, than the answer I have posted?

- Did you compare that with `h1.keys.sort == h2.keys.sort`? – Sergio Tulentsev Dec 09 '12 at 10:21
- I did, with a limited example. `h1.keys.sort == h2.keys.sort` was a bit slower, but I am not sure whether that holds in general. – sawa Dec 09 '12 at 10:22
- I think you should mention that in the question. And also I would post the solution as part of the question, not as an answer. – Sergio Tulentsev Dec 09 '12 at 10:25
- I didn't think my answer was any more special than the others. I thought it would be better to have all possible answers, including mine, listed together rather than having one random one (mine) within the question. – sawa Dec 09 '12 at 10:26
- I think that it's pure convenience. You write "could it be easier than my answer"? Now I have to scroll down, parse answers and find yours. It's extra work for me for no reason. – Sergio Tulentsev Dec 09 '12 at 10:29
- I am asking for the fastest solution. In any case, you would be comparing with other answers posted at that point. – sawa Dec 09 '12 at 10:30
- A little off-topic question: is it for pure fun or do you have VERY large hashes (and you have profiled your code) and improving this part of code will give you HUGE performance boost? – Tomek Wałkuski Dec 09 '12 at 10:52
- I am using this iteratively many times for hashes that are not so big. – sawa Dec 09 '12 at 11:08
- @TomaszWałkuski this is not offtopic, but the most ontopic question I can think of! Any solution will depend on the use case. – akuhn Dec 09 '12 at 12:15
7 Answers
Alright, let's break all rules of savoir vivre and portability. MRI's C API comes into play.
/* Name this file superhash.c. An appropriate Makefile is attached below. */
#include <ruby/ruby.h>

/* rb_hash_foreach callback: flag failure and stop at the first key
   that is missing from the other hash's table. */
static int key_is_in_other(VALUE key, VALUE val, VALUE data) {
    struct st_table *other = ((struct st_table**) data)[0];
    if (st_lookup(other, key, 0)) {
        return ST_CONTINUE;
    } else {
        int *failed = ((int**) data)[1];
        *failed = 1;
        return ST_STOP;
    }
}

/* Number of entries as a Fixnum; an unallocated table counts as empty. */
static VALUE hash_size(VALUE hash) {
    if (!RHASH(hash)->ntbl)
        return INT2FIX(0);
    return INT2FIX(RHASH(hash)->ntbl->num_entries);
}

static VALUE same_keys(VALUE self, VALUE other) {
    if (CLASS_OF(other) != rb_cHash)
        rb_raise(rb_eArgError, "argument needs to be a hash");
    if (hash_size(self) != hash_size(other))
        return Qfalse;
    /* Both tables unallocated: two empty hashes trivially match. */
    if (!RHASH(self)->ntbl && !RHASH(other)->ntbl)
        return Qtrue;
    int failed = 0;
    void *data[2] = { RHASH(other)->ntbl, &failed };
    rb_hash_foreach(self, key_is_in_other, (VALUE) data);
    return failed ? Qfalse : Qtrue;
}

void Init_superhash(void) {
    rb_define_method(rb_cHash, "same_keys?", same_keys, 1);
}
Here's a Makefile.
CFLAGS=-std=c99 -O2 -Wall -fPIC $(shell pkg-config ruby-1.9 --cflags)
LDFLAGS=-Wl,-O1,--as-needed $(shell pkg-config ruby-1.9 --libs)

superhash.so: superhash.o
	$(LINK.c) -shared $^ -o $@
An artificial, synthetic and simplistic benchmark shows the following.
require 'superhash'
require 'benchmark'

n = 100_000
h1 = h2 = {a:5, b:8, c:1, d:9}

Benchmark.bm do |b|
  # freemasonjson's state of the art.
  b.report { n.times { h1.size == h2.size and h1.keys.all? { |key| !!h2[key] } } }
  # This solution
  b.report { n.times { h1.same_keys? h2 } }
end
#        user     system      total        real
#    0.310000   0.000000   0.310000 (  0.312249)
#    0.050000   0.000000   0.050000 (  0.051807)
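For readers who cannot load a C extension, roughly the same logic (compare sizes, then stop at the first key of one hash that the other lacks) can be sketched in plain Ruby. The method name `same_key_set?` is made up for this illustration and is not part of the extension above; no timing is claimed for it.

```ruby
# Hypothetical pure-Ruby counterpart of the C extension's logic.
def same_key_set?(h1, h2)
  unless h1.is_a?(Hash) && h2.is_a?(Hash)
    raise ArgumentError, 'arguments need to be hashes'
  end
  return false unless h1.size == h2.size

  # each_key avoids building an intermediate array of keys;
  # all? stops at the first key that h2 does not contain.
  h1.each_key.all? { |key| h2.key?(key) }
end

puts same_key_set?({ a: 5, b: 8 }, { b: 1, a: 9 })  # => true
puts same_key_set?({ a: 5, b: 8 }, { a: 5, c: 8 })  # => false
puts same_key_set?({}, {})                          # => true
```

Whether this comes close to the extension's speed would have to be measured on the hashes actually in use.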

Try:
# Check that both hashes have the same number of entries before anything else
if h1.size == h2.size
  # all? stops iterating and returns false at the first mismatched key;
  # otherwise it returns true.
  # Caveat: !!h2[key] also reports a key as missing when its value is nil or false;
  # h2.key?(key) tests presence only.
  h1.keys.all?{ |key| !!h2[key] }
end
Worst case, you iterate through the keys only once.
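One thing to watch with the `!!h2[key]` test: it looks at the value, so a key that is present but maps to `nil` or `false` is treated as absent. A small sketch of the difference, using two method names invented for this illustration:

```ruby
# Hypothetical helper: the truthiness test from the answer.
def same_keys_by_value?(h1, h2)
  h1.size == h2.size && h1.keys.all? { |key| !!h2[key] }
end

# Hypothetical helper: the same check, but asking only whether the key exists.
def same_keys_by_presence?(h1, h2)
  h1.size == h2.size && h1.keys.all? { |key| h2.key?(key) }
end

h1 = { a: 1, b: nil }
h2 = { b: 2, a: nil }

puts same_keys_by_value?(h1, h2)     # => false (h2[:a] is nil)
puts same_keys_by_presence?(h1, h2)  # => true
```

If the hashes never hold `nil` or `false` values, both variants give the same answer.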

- I did some benchmarks and it seems that this answer is a clear winner so far. Using `Hash#include?` doesn't bring any improvements to performance but it's surely a good step forward in terms of readability. – Jan Dec 09 '12 at 11:51
- @Jan caution with benchmarks. In particular synthetic ones! This solution (whether using include or not) will be faster if and only if the key sets differ more often than not. If the dominating case is equal key sets, it will be slower. – akuhn Dec 09 '12 at 12:13
- @akuhn, thanks for the comment. I agree with you regarding synthetic benchmarks. However, in the benchmark I did `h1 == h2` and as a result key sets _were equal_. – Jan Dec 09 '12 at 12:24
- @Jan interesting, so even for equal keysets this one is faster than all other solutions presented here? – akuhn Dec 09 '12 at 12:32
- @akuhn, that's what my benchmark showed. It came as a surprise but when I gave it a thought it makes sense. Unlike other answers this solution doesn't create many additional objects in the memory. As a result it's GC-friendly, which in the light of MRI's GC's performance is a huge benefit. – Jan Dec 09 '12 at 12:34
- @Jan interesting. Also, it might be that Ruby's implementation of `Array#-` isn't the smartest either. Did you try `Set.new(h1.keys) == Set.new(h2.keys)` !? – akuhn Dec 09 '12 at 12:37
- @akuhn, no, I haven't. OP mentioned small hashes so I don't think it's a good direction. – Jan Dec 09 '12 at 12:40
Just for the sake of having at least a benchmark on this question...
require 'securerandom'
require 'benchmark'

a = {}
b = {}
# Use uuid to get a unique random key
(0..1_000).each do |i|
  key = SecureRandom.uuid
  a[key] = i
  b[key] = i
end

Benchmark.bmbm do |x|
  x.report("#-") do
    1_000.times do
      # Check the difference in both directions
      (a.keys - b.keys).empty? and (b.keys - a.keys).empty?
    end
  end
  x.report("#&") do
    1_000.times do
      computed = a.keys & b.keys
      computed.size == a.size
    end
  end
  x.report("#all?") do
    1_000.times do
      a.keys.all?{ |key| !!b[key] }
    end
  end
  x.report("#sort") do
    1_000.times do
      a_sorted = a.keys.sort
      b_sorted = b.keys.sort
      # Compare the sorted key arrays, not the hashes themselves
      a_sorted == b_sorted
    end
  end
end
Results are:
Rehearsal -----------------------------------------
#- 1.000000 0.000000 1.000000 ( 1.001348)
#& 0.560000 0.000000 0.560000 ( 0.563523)
#all? 0.240000 0.000000 0.240000 ( 0.239058)
#sort 0.850000 0.010000 0.860000 ( 0.854839)
-------------------------------- total: 2.660000sec
user system total real
#- 0.980000 0.000000 0.980000 ( 0.976698)
#& 0.560000 0.000000 0.560000 ( 0.559592)
#all? 0.250000 0.000000 0.250000 ( 0.251128)
#sort 0.860000 0.000000 0.860000 ( 0.862857)
I agree with @akuhn that this would be a better benchmark if we had more information on the dataset you are using. That said, I believe this question really needed some hard facts.
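The benchmark above only times hashes whose keys are identical, so it says nothing about whether each candidate gives the right answer when the key sets differ. A quick sanity check along these lines can be run before timing anything. The `CANDIDATES` table and the sample hashes are made up for this sketch, with `key?` standing in for `!!b[key]`:

```ruby
# Candidate checks, labelled as in the benchmark; each takes two hashes.
CANDIDATES = {
  '#-'    => ->(a, b) { (a.keys - b.keys).empty? && (b.keys - a.keys).empty? },
  '#&'    => ->(a, b) { (a.keys & b.keys).size == a.size },
  '#all?' => ->(a, b) { a.keys.all? { |key| b.key?(key) } },
  '#sort' => ->(a, b) { a.keys.sort == b.keys.sort }
}.freeze

SAME_KEYS = [{ x: 1, y: 2 }, { y: 3, x: 4 }].freeze  # same keys, different order
SUPERSET  = [{ x: 1 }, { x: 1, y: 2 }].freeze        # second hash has an extra key

CANDIDATES.each do |label, check|
  puts format('%-6s same=%-5s superset=%s',
              label, check.call(*SAME_KEYS), check.call(*SUPERSET))
end
# #-     same=true  superset=false
# #&     same=true  superset=true
# #all?  same=true  superset=true
# #sort  same=true  superset=false
```

As written, `#&` and `#all?` accept a hash whose keys are a strict superset, so they would need an up-front `a.size == b.size` comparison to count as full checks.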

- I'd recommend adding the name of the benchmark to the `report` method as a parameter. That will enable adding the name to the result report, making it a lot easier to read. – the Tin Man Dec 09 '12 at 16:02
It depends on your data.
There is no general case really. For example, retrieving the entire keyset at once is generally faster than checking inclusion of each key separately. However, if the keysets in your dataset differ more often than not, then a slower solution that fails faster might end up faster overall. For example:
h1.size == h2.size and h1.keys.all?{|k|h2.include?(k)}
Another factor to consider is the size of your hashes. If they are big, a solution with a higher setup cost, like calling `Set.new`, might pay off; if they are small, it won't:
h1.size == h2.size and Set.new(h1.keys) == Set.new(h2.keys)
And if you happen to compare the same immutable hashes over and over again, it would definitely pay off to cache the results.
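One way such caching could look is sketched below. `KeySetCache` is a hypothetical helper, not something from this answer, and it caches each hash's key set rather than the comparison results. It assumes the hashes are never modified after they are first seen.

```ruby
require 'set'

# Hypothetical helper: remembers the key set of each hash object so that
# repeated comparisons skip rebuilding it.
# Only safe if the hashes are never modified after their first use here.
class KeySetCache
  def initialize
    # compare_by_identity keys the cache on the hash object itself,
    # so a lookup does not have to hash the full contents of the hash.
    @sets = {}.compare_by_identity
  end

  def same_keys?(h1, h2)
    h1.size == h2.size && key_set(h1) == key_set(h2)
  end

  private

  # Build the key set on first use, then reuse it.
  def key_set(hash)
    @sets[hash] ||= Set.new(hash.keys).freeze
  end
end

cache = KeySetCache.new
h1 = { a: 1, b: 2 }.freeze
h2 = { b: 3, a: 4 }.freeze
h3 = { a: 1, c: 2 }.freeze

puts cache.same_keys?(h1, h2)  # => true
puts cache.same_keys?(h1, h3)  # => false
```

The cache holds a reference to every hash it has seen, so a long-running process would need some way to evict entries.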
Ultimately, only a benchmark will tell, but to write one we'd need to know more about your use case. Testing a solution with synthetic data (for example, randomly generated keys) will certainly not be representative.

Here is my solution:
class Hash
  # Shallow comparison: nested hashes are not checked recursively.
  # Returns nil when the argument is not a Hash.
  def same_keys?(compare)
    return nil unless compare.instance_of?(Hash)

    size == compare.size && keys.all? { |s| compare.key?(s) }
  end
end

a = c = { a: nil, b: "whatever1", c: 1.14, d: true }
b = { a: "foo", b: "whatever2", c: 2.14, "d": false }
d = { a: "bar", b: "whatever3", c: 3.14 }

puts a.same_keys?(b)                # => true
puts a.same_keys?(c)                # => true
puts a.same_keys?(d)                # => false
puts a.same_keys?(false).inspect    # => nil
puts a.same_keys?("jack").inspect   # => nil
puts a.same_keys?({}).inspect       # => false
