
I want to compute a unique SHA-1 hash from a Ruby hash. My idea was to:

  • (Deep) convert the hash into an array
  • Sort the array
  • Join the array with an empty string
  • Calculate the SHA-1

Consider the following hash:

hash = {
  foo: "test",
  bar: [1,2,3],
  hello: {
    world: "world",
    arrays: [
      {foo: "bar"}
    ]
  }
}

How can I turn this kind of nested hash into an array like this:

[:foo, "test", :bar, 1, 2, 3, :hello, :world, "earth", :arrays, :my, "example"]

I would then sort the array, join it with array.join("") and compute the SHA-1 hash like this:

require 'digest/sha1'
# hash_string is the sorted array joined into a single string
Digest::SHA1.hexdigest hash_string

  1. How could I flatten the hash as described above?
  2. Is there already a gem for this?
  3. Is there a quicker / easier way to solve this? I have a large number of objects to convert (~700k), so performance matters.

EDIT

Another problem, which I noticed thanks to the answers below, involves these two hashes:

a = {a: "a", b: "b"}
b = {a: "b", b: "a"}

When the hashes are flattened and sorted, these two produce the same output, even though a == b is false.
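A minimal sketch that reproduces the collision, assuming the flatten, stringify, sort and join steps described above (the helper name `naive_string` is hypothetical):

```ruby
require 'digest/sha1'

# Hypothetical helper mirroring the idea from the question:
# flatten the key/value pairs, stringify them, sort, and join.
def naive_string(hsh)
  hsh.to_a.flatten.map(&:to_s).sort.join
end

a = {a: "a", b: "b"}
b = {a: "b", b: "a"}

a == b           # => false
naive_string(a)  # => "aabb"
naive_string(b)  # => "aabb"
Digest::SHA1.hexdigest(naive_string(a)) == Digest::SHA1.hexdigest(naive_string(b)) # => true
```

Sorting throws away which value belonged to which key, so any digest computed from the sorted list cannot tell these two hashes apart.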

EDIT 2

The use case for this whole thing is product data comparison. The product data is stored inside a hash, then serialized and sent to a service that creates / updates the product data.

I want to check whether anything has changed in the product data, so I generate a hash from the product content and store it in a database. The next time the same product is loaded, I calculate the hash again, compare it to the one in the DB, and decide whether the product needs an update.
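A minimal sketch of that workflow, under some assumptions: `canonical`, `product_digest` and `product_changed?` are hypothetical helpers, a plain Ruby Hash stands in for the database table, and the keys inside each hash are of one mutually comparable type (for example all Symbols) so they can be sorted. JSON is used as the serialization here only as one possible choice; it also means Symbol and String keys with the same name produce the same digest.

```ruby
require 'digest/sha1'
require 'json'

# Hypothetical helper: rebuild hashes with sorted keys (also inside arrays),
# so that key order does not change the serialized form.
def canonical(obj)
  case obj
  when Hash  then obj.sort_by { |k, _| k }.map { |k, v| [k, canonical(v)] }.to_h
  when Array then obj.map { |e| canonical(e) }
  else obj
  end
end

def product_digest(product)
  Digest::SHA1.hexdigest(JSON.generate(canonical(product)))
end

# Returns true and records the new digest when the product data changed.
# `store` is a stand-in for the database table (product id => digest).
def product_changed?(store, id, product)
  digest = product_digest(product)
  return false if store[id] == digest
  store[id] = digest
  true
end

stored_digests = {}
product_changed?(stored_digests, 42, {name: "Shirt", sizes: ["S", "M"]}) # => true (first load)
product_changed?(stored_digests, 42, {sizes: ["S", "M"], name: "Shirt"}) # => false (same data)
product_changed?(stored_digests, 42, {name: "Shirt", sizes: ["S", "L"]}) # => true
```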

23tux
  • This is an X/Y problem. While MRI Ruby hashes are ordered, you can't make guarantees about hash ordering. You have to compare specific key/value pairs, or rely on serialization order (perhaps after sorting). You may want to rethink the representation of your data. – Todd A. Jacobs Apr 18 '16 at 15:48
  • Please add a comma after `bar: [1,2,3]`. I can't understand why anyone giving an answer did not mention that omission. – Cary Swoveland Apr 18 '16 at 19:23
  • Your hash doesn't match the array you wish to produce from it. Again, why didn't those giving answers mention that? "Sloppy" is the word I would use to describe this question. – Cary Swoveland Apr 18 '16 at 20:07

5 Answers


EDIT: As you described, two hashes whose keys are in a different order should give the same string. I would reopen the Hash class to add a new custom flatten method:

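class Hash
  # Recursively flattens the hash into one array, sorting the keys and
  # prefixing each key with "key: " to help distinguish keys from values.
  def custom_flatten
    self.sort.map{|pair| ["key: #{pair[0]}", pair[1]]}.flatten.map{ |elem| elem.is_a?(Hash) ? elem.custom_flatten : elem }.flatten
  end
end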

Explanation:

  • sort converts the hash into a sorted array of pairs, so that hashes whose keys are in a different order give the same result
  • .map{|pair| ["key: #{pair[0]}", pair[1]]} is a trick to differentiate keys from values in the final flattened array, to avoid the problem of {a: {b: {c: :d}}}.custom_flatten == {a: :b, c: :d}.custom_flatten
  • flatten converts an array of arrays into a single array of values
  • map{ |elem| elem.is_a?(Hash) ? elem.custom_flatten : elem } calls custom_flatten recursively on any remaining sub-hash.

Then you just need to use:

require 'digest/sha1'
Digest::SHA1.hexdigest hash.custom_flatten.to_s
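A quick check of the key-order independence described above. The method is repeated so the snippet runs on its own:

```ruby
require 'digest/sha1'

class Hash
  # Same method as above: sort, tag keys with "key: ", flatten recursively.
  def custom_flatten
    sort.map { |pair| ["key: #{pair[0]}", pair[1]] }.flatten.map { |elem| elem.is_a?(Hash) ? elem.custom_flatten : elem }.flatten
  end
end

a = {foo: "foo", bar: "bar"}
b = {bar: "bar", foo: "foo"}

a.custom_flatten # => ["key: bar", "bar", "key: foo", "foo"]
Digest::SHA1.hexdigest(a.custom_flatten.to_s) == Digest::SHA1.hexdigest(b.custom_flatten.to_s) # => true
```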
Caillou
  • The problem with this approach is that the hash `a = {foo: "foo", bar: "bar"}` creates `"{:foo=>\"foo\", :bar=>\"bar\"}"` and `b = {bar: "bar", foo: "foo"}` creates a different representation `"{:bar=>\"bar\", :foo=>\"foo\"}"`, although `a == b => true` – 23tux Apr 18 '16 at 15:04
  • Right. I'll edit my answer to try to solve your comparison problem. – Caillou Apr 18 '16 at 15:33
  • Is this `Hash#fully_flatten()` function what you need ? – Caillou Apr 18 '16 at 15:41
  • this looks promising, thank you! I'll test it a bit with different hashes – 23tux Apr 18 '16 at 15:49
  • You can make this new method for `Hash` available only in the scope of a specific class with `class_eval` if, for some reason, you don't want it available across your whole project. – Caillou Apr 18 '16 at 15:53
  • This does not work: `{ foo: { foo: :bar }, bar: :bar }.fully_flatten == { bar: { bar: :foo }, foo: :bar }.fully_flatten`. – beauby Apr 18 '16 at 15:57
  • Because you're not comparing to equivalent hashes, am I right ? the first one has a symbol `:bar` for `hash[:bar]`, the second one has a hash `{bar: :foo}` for `hash[:bar]`. – Caillou Apr 18 '16 at 16:01
  • @Caillou Yes, that's the whole point: with your proposed solution, two different hashes will have the same digest. – beauby Apr 18 '16 at 16:02
  • Two different hashes, yes, but two different hashes with the same values for the same keys. The only difference allowed is the order of the keys, which is the behavior I expect from two hashes. – Caillou Apr 18 '16 at 16:03
  • See my above comment: `a = { foo: { foo: :bar }, bar: :bar }; b = { bar: { bar: :foo }, foo: :bar }`. We have `a != b` and `a.fully_flatten == b.fully_flatten`. – beauby Apr 18 '16 at 16:04
  • More precisely, `a.fully_flatten == b.fully_flatten == [:bar, :bar, :foo, :foo, :bar]`. – beauby Apr 18 '16 at 16:06
  • Yep, I get it... we might need a trick to differentiate keys from values. I'll update my answer. – Caillou Apr 18 '16 at 16:07
  • I updated it with something more to identify keys. Do you have any more use cases that need something more? – Caillou Apr 18 '16 at 16:11
  • Now it fails for `a = { bar: "key: foo", foo: :bar }` `b = { bar: { foo: { foo: :bar } } }`. It is unlikely to happen, but still :) [My answer](http://stackoverflow.com/a/36698457/580231) is slightly simpler and shouldn't have any edge case. – beauby Apr 18 '16 at 17:08
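The edge case from the last comment can be reproduced directly. This repeats the `custom_flatten` method from the answer so it runs on its own:

```ruby
class Hash
  # Same method as in the answer: sort, tag keys with "key: ", flatten recursively.
  def custom_flatten
    sort.map { |pair| ["key: #{pair[0]}", pair[1]] }.flatten.map { |elem| elem.is_a?(Hash) ? elem.custom_flatten : elem }.flatten
  end
end

a = { bar: "key: foo", foo: :bar }
b = { bar: { foo: { foo: :bar } } }

a == b                               # => false
a.custom_flatten                     # => ["key: bar", "key: foo", "key: foo", :bar]
a.custom_flatten == b.custom_flatten # => true, so the digests would collide
```

Once everything is flattened, the string value "key: foo" is indistinguishable from a tagged key, so the tagging trick makes collisions less likely but does not rule them out.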

I am not aware of a gem that does something like what you are looking for. There is a Hash#flatten method in Ruby, but it does not flatten nested hashes recursively. Here is a straightforward recursive function that flattens the hash the way you requested in your question:

def completely_flatten(hsh)
  hsh.flatten(-1).map{|el| el.is_a?(Hash) ? completely_flatten(el) : el}.flatten
end

This will yield

hash = {
  foo: "test",
  bar: [1,2,3],
  hello: {
    world: "earth",
    arrays: [
      {my: "example"}
    ]
  }
}

completely_flatten(hash) 
#=> [:foo, "test", :bar, 1, 2, 3, :hello, :world, "earth", :arrays, :my, "example"]

To get the string representation you are looking for (before computing the SHA-1), convert every element of the array to a string before sorting. Otherwise sort raises an error, because symbols, strings and integers cannot be compared with one another:

hash_string = completely_flatten(hash).map(&:to_s).sort.join
#=> "123arraysbarearthexamplefoohellomytestworld"
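To see why the `map(&:to_s)` step matters, here is a small check on a mixed array (the values are illustrative):

```ruby
flat = [:foo, "test", :bar, 1, 2, 3]

# Sorting mixed Symbols, Strings and Integers raises ArgumentError,
# because e.g. Symbol#<=> returns nil when given a String.
sort_failed =
  begin
    flat.sort
    false
  rescue ArgumentError
    true
  end

sort_failed                 # => true
flat.map(&:to_s).sort.join  # => "123barfootest"
```

Converting to strings also means that :foo and "foo" become the same element, which may or may not be acceptable for your data.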
dave_slash_null
  • Thanks for the answer. As I commented below, this also does not solve the problem with the two hashes `{a: "a", b: "b"}` and `{a: "b", b: "a"}`. I've updated my question – 23tux Apr 18 '16 at 15:17
  • @23tux Can you provide any more information in your question about your use case? How are you using the sha1 hash in the end? What are you trying to validate? – dave_slash_null Apr 18 '16 at 15:21
  • I've added my use case to my question – 23tux Apr 18 '16 at 15:30

The question is how to "flatten" a hash. There is a second, implicit, question concerning sha1, but, by SO rules, that needs to be addressed in a separate question. You can "flatten" any hash or array as follows.

Code

def crush(obj)
  recurse(obj).flatten
end

def recurse(obj)
  case obj
  when Array then obj.map { |e| recurse e }
  when Hash  then obj.map { |k,v| [k, recurse(v)] }
  else obj
  end
end

Example

crush({
  foo: "test",
  bar: [1,2,3],
  hello: {
    world: "earth",
    arrays: [{my: "example"}]
  }
})
  #=> [:foo, "test", :bar, 1, 2, 3, :hello, :world, "earth", :arrays, :my, "example"]

crush([[{ a:1, b:2 }, "cat", [3,4]], "dog", { c: [5,6] }])
  #=> [:a, 1, :b, 2, "cat", 3, 4, "dog", :c, 5, 6]
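Because the result is a flat list, it no longer records where each nested structure started and ended. A quick check, reusing `crush` and `recurse` from above (repeated so the snippet runs on its own), shows two different inputs producing the same array, which matters if the flattened array is later fed into a digest:

```ruby
def crush(obj)
  recurse(obj).flatten
end

def recurse(obj)
  case obj
  when Array then obj.map { |e| recurse e }
  when Hash  then obj.map { |k,v| [k, recurse(v)] }
  else obj
  end
end

x = { a: [:b, :c] }   # array value
y = { a: { b: :c } }  # nested hash value

x == y               # => false
crush(x)             # => [:a, :b, :c]
crush(x) == crush(y) # => true
```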
Cary Swoveland

Use Marshal for Fast Serialization

You haven't articulated a useful reason to change your data structure before hashing. Therefore, you should consider marshaling for speed unless your data structures contain unsupported objects like bindings or procs. For example, using your hash variable with the syntax corrected:

require 'digest/sha1'

hash = {
  foo: "test",
  bar: [1,2,3],
  hello: {
    world: "world",
    arrays: [
      {foo: "bar"}
    ]
  }
}
Digest::SHA1.hexdigest Marshal.dump(hash)
#=> "f50bc3ceb514ae074a5ab9672ae5081251ae00ca"

Marshal is generally faster than other serialization options. If all you need is speed, that will be your best bet. However, you may find that JSON, YAML, or a simple #to_s or #inspect meet your needs better for other reasons. As long as you are comparing similar representations of your object, the internal format of the hashed object is largely irrelevant to ensuring you have a unique or unmodified object.
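The comment below about `{a: "a", b: "b"}` and `{b: "b", a: "a"}` is easy to confirm: Marshal.dump writes hash entries in insertion order, so hashes that are == can still serialize differently. A minimal check, with a hypothetical `sorted_dump` lambda that only sorts the top level:

```ruby
require 'digest/sha1'

a = {a: "a", b: "b"}
b = {b: "b", a: "a"}

a == b                              # => true
Marshal.dump(a) == Marshal.dump(b)  # => false, entries are dumped in insertion order

# Sorting the top-level keys first makes these two dumps identical
# (nested hashes would need the same treatment recursively).
sorted_dump = ->(h) { Marshal.dump(h.sort.to_h) }
sorted_dump.call(a) == sorted_dump.call(b) # => true
Digest::SHA1.hexdigest(sorted_dump.call(a)) == Digest::SHA1.hexdigest(sorted_dump.call(b)) # => true
```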

Todd A. Jacobs
  • This approach also has the problem that these two hashes `{a: "a", b: "b"}` and `{b: "b", a: "a"}` produce different results, even though `a == b => true` – 23tux Apr 18 '16 at 15:31
  • @23tux That's a feature, not a bug. While the two objects may have object-level equality, the serialized objects *are* different. That's why you need to compare serialized objects, rather than unordered ones. If you're deliberately trying to hash unordered objects for comparison at the object level, the whole question is likely an X/Y problem that you haven't articulated well in your OP. – Todd A. Jacobs Apr 18 '16 at 15:42

Any solution based on flattening the hash will fail for nested hashes. A robust solution is to explicitly sort the keys of each hash recursively (from Ruby 1.9 onwards, hash key insertion order is preserved), and then serialize the result as a string and digest it.

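  require 'digest/sha1'

  # Rebuilds the hash with its keys sorted, recursing into nested hashes,
  # so hashes that differ only in key order produce the same #to_s output.
  def canonize_hash(h)
    r = h.map { |k, v| [k, v.is_a?(Hash) ? canonize_hash(v) : v] }
    Hash[r.sort]
  end

  def digest_hash(hash)
    Digest::SHA1.hexdigest canonize_hash(hash).to_s
  end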

  digest_hash({ foo: "foo", bar: "bar" })
  # => "ea1154f35b34c518fda993e8bb0fe4dbb54ae74a"
  digest_hash({ bar: "bar", foo: "foo" })
  # => "ea1154f35b34c518fda993e8bb0fe4dbb54ae74a"
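One gap to be aware of for the question's data: `canonize_hash` only recurses into values that are themselves hashes, so hashes inside arrays (like `arrays: [{foo: "bar"}]` in the question) keep their original key order. A minimal sketch of a variant that also walks arrays (the names `canonize` and `digest_obj` are hypothetical):

```ruby
require 'digest/sha1'

# Hypothetical variant of canonize_hash that also walks arrays, so hashes
# nested inside arrays get sorted keys too. Array order itself is kept.
def canonize(obj)
  case obj
  when Hash  then obj.sort_by { |k, _| k }.map { |k, v| [k, canonize(v)] }.to_h
  when Array then obj.map { |e| canonize(e) }
  else obj
  end
end

def digest_obj(obj)
  Digest::SHA1.hexdigest(canonize(obj).to_s)
end

x = { arrays: [{ foo: "bar", baz: 1 }] }
y = { arrays: [{ baz: 1, foo: "bar" }] }

digest_obj(x) == digest_obj(y) # => true
```

Because this relies on #to_s, the digest also depends on how the running Ruby formats hashes. Ruby 3.4 changed Hash#inspect output (for example `{foo: "bar"}` instead of `{:foo=>"bar"}`), so digests stored in a database under an older Ruby would no longer match after an upgrade; a serialization whose format you control (for example JSON) may be a safer basis for stored digests.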
beauby