
I have an object which calculates a (long) path. Two objects are equal if they calculate the same path. I previously tested whether two objects were equal by doing something like:

obj1.calculatePath() == obj2.calculatePath()

However, now this has become a performance bottleneck. I tried storing the path inside the object but since I have a lot of objects this became a memory issue instead.

I have estimated that a 64-bit hash should be enough to avoid collisions, assuming the hash is good (bijective).
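The estimate can be checked with the birthday bound. The sketch below assumes an ideal hash with uniformly distributed outputs and, purely for illustration, about one million distinct paths; `collision_probability` is a hypothetical helper name, not part of any library.

```python
import math

def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance of at least one collision among n_items
    uniformly random hash values of hash_bits bits each.

    Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1)).
    """
    exponent = -n_items * (n_items - 1) / 2.0 ** (hash_bits + 1)
    # -expm1(x) equals 1 - exp(x) and stays accurate when x is tiny.
    return -math.expm1(exponent)

if __name__ == "__main__":
    # Assumed workload: one million objects (illustrative figure).
    for bits in (32, 64, 128):
        print(bits, collision_probability(1_000_000, bits))
```

Under these assumptions a 64-bit hash gives a collision probability on the order of 10^-8, while a 32-bit hash would almost certainly collide.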

The usual fast hashes (Murmur etc.) do have collisions, and handling them sounds like a headache when I could just use a hash like SHA-2. It is much nicer if I can simply trust the hash instead of doing additional checks whenever the hashes of two objects match.

However, SHA is also "slow" compared to older hash functions (like the MD family), so I wonder if it would be better to use something like MD5 or maybe even MD4.

So my question is: assuming there is no evil hacker with a motive to create collisions using specially crafted input, only benign (random) inputs, which hash function should I choose for a performance-critical part of my code, where I would like to avoid the added complexity of using an "insecure" hash like Murmur?

Markus
  • I'm not sure your question makes sense. *All* hashes have collisions, by definition. You say that "this has become a performance bottleneck", but calculating a hash and comparing that is by definition going to be *slower* than comparing the original value. You've said that you have enough objects that a list of paths became a memory issue instead - what constitutes a memory issue? Are you running on a machine with 256MB of memory, or do you genuinely have millions of objects with 1000-character paths? – Dan Puzey Nov 04 '14 at 10:10
  • @DanPuzey Thank you for your time. I know all hashes have collisions; that's why the title says "(practically)". Maybe the rest of the post was unclear. However, Murmur even has collisions within words in the English language. In MD5, collisions are rare enough that there was a bounty just to find one. I think there is a distinction. I do not have many millions of paths, but maybe 1 million, and I do not know their lengths. However, I do run in a custom speed- and memory-restricted environment. – Markus Nov 04 '14 at 15:24
  • So, if you have a restricted environment, you should perhaps share those restrictions. As it is, your question is very hard to answer. It's also worth noting that "Murmur even have collisions within words in the English language" is of no relevance when you're hashing a file path. Have you tried hashing a large sample of your data to see if there are collisions? – Dan Puzey Nov 04 '14 at 15:27
  • Also worth noting: there is no such thing as a bijective hash *by definition*. You could say that a hash is bijective for a carefully selected subset of available inputs, but there's not going to be any way of working out which hash would be bijective against your inputs without *actually running the hash against your inputs*, which nobody on SO is likely to do for you. My difficulty with your question is that, as written, it's asking a question that cannot objectively be answered, and so the best answer is likely to be: go test some hash algorithms and find out. – Dan Puzey Nov 04 '14 at 15:30
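The suggestion in the comments to hash a sample of the real data and look for collisions can be sketched as follows. This is a minimal illustration, not a benchmark: it assumes each path can be serialized to `bytes`, and `hash64` and `find_collisions` are hypothetical helper names. The 64-bit candidate here is BLAKE2b with an 8-byte digest from Python's standard `hashlib`; any other function mapping `bytes` to an integer can be passed in instead.

```python
import hashlib
from collections import defaultdict

def hash64(data: bytes) -> int:
    """64-bit hash built from BLAKE2b with an 8-byte digest."""
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big")

def find_collisions(paths, hash_fn=hash64):
    """Group distinct serialized paths by hash value and return only
    the groups where different paths share the same hash."""
    buckets = defaultdict(set)
    for path in paths:
        buckets[hash_fn(path)].add(path)
    return {h: group for h, group in buckets.items() if len(group) > 1}

if __name__ == "__main__":
    # Stand-in data; replace with serialized paths from the real workload.
    sample = [f"path-{i}".encode() for i in range(100_000)]
    print(len(find_collisions(sample)), "colliding hash values")
```

Running this over a representative sample for each candidate hash gives direct evidence for the actual inputs, which is what the comments above are asking for.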

1 Answer


It's difficult to help without more information. As it stands, all anyone can recommend is a generic hash function. There's an element of just giving a few a go!

FNV-1a (http://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function) is usually a not-too-shabby starting point. It is (a) easy to implement, (b) not usually 'bad', and (c) computationally cheap, and so applicable to your 'long' path issue.
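For reference, a minimal sketch of 64-bit FNV-1a, using the standard offset basis and prime for the 64-bit variant. It assumes the path has already been serialized to `bytes`; how to do that depends on what the paths are. In a performance-critical setting the same loop would be written in the application's own language, and the hash could be updated incrementally as the path is generated, so the full path never needs to be stored.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF

def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte into the state, then multiply
    by the FNV prime, keeping only the low 64 bits."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h

if __name__ == "__main__":
    print(hex(fnv1a_64(b"a")))
```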

However what I want to know is:

What space are these paths in? Are they in (x,y,z,t) 'real' space-time (i.e. trajectories)? Are they paths through some graph? Are they file paths? Something else?

It's difficult to say more without more context.

Persixty