
Is there a case when a hashcode collision would be beneficial?

(Other than when the objects are identical, of course.)

EDIT: by beneficial I mean calculating the hash code in fewer CPU cycles, or using less memory in the calculation.

I guess a clarification would be: if a certain GetHashCode() implementation is 10 times faster, but it also causes (for example) twice as many collisions, is it worth it?

Protiguous
  • It depends on your definition of beneficial – cost Aug 06 '14 at 21:21
  • Maybe if you wanted to test your hash table? – Mike Christensen Aug 06 '14 at 21:22
  • I'm pretty sure this is not the right Stack Exchange website for this. Either way, this is too vague and doesn't have to do with programming. – Cullub Aug 06 '14 at 21:34
  • Even with your most recent edit, it's still very vague. "Is it worth it" is specific to each case. Do you hash a billion things a second? Do you hash 12 things a second? If you drive your car twice as fast to work, yet get twice as many collisions, is that worth it? Do you work for an emergency response team and HAVE to drive your car fast? – cost Aug 06 '14 at 21:46

2 Answers


'Beneficial' is a difficult term to quantify, especially in this case; it depends entirely on what you mean by it.

If you're checking for object equality and they collide but the objects are not the same, then that would not be beneficial.

If you're building a hashmap, then you might have specific mechanisms built into your implementation to handle these cases. I'm fairly certain most (if not all) modern hashmap implementations do this.
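For illustration, here is a minimal sketch of one common such mechanism, separate chaining. The class and its members are invented for this example; the real .NET Dictionary&lt;TKey,TValue&gt; is considerably more sophisticated.

```csharp
using System.Collections.Generic;

// Minimal illustrative sketch of separate chaining; not how Dictionary<TKey,TValue>
// is actually implemented.
class TinyChainedMap<TKey, TValue>
{
    private readonly List<KeyValuePair<TKey, TValue>>[] buckets =
        new List<KeyValuePair<TKey, TValue>>[16];

    public void Add(TKey key, TValue value)
    {
        int index = (key.GetHashCode() & 0x7FFFFFFF) % buckets.Length;
        if (buckets[index] == null)
            buckets[index] = new List<KeyValuePair<TKey, TValue>>();

        // Colliding keys simply share a bucket; lookups fall back to Equals
        // to pick the right entry out of the chain.
        buckets[index].Add(new KeyValuePair<TKey, TValue>(key, value));
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        int index = (key.GetHashCode() & 0x7FFFFFFF) % buckets.Length;
        var chain = buckets[index];
        if (chain != null)
        {
            foreach (var pair in chain)
            {
                if (EqualityComparer<TKey>.Default.Equals(pair.Key, key))
                {
                    value = pair.Value;
                    return true;
                }
            }
        }
        value = default(TValue);
        return false;
    }
}
```

Collisions cost you a longer chain to walk, not incorrect results, which is why the question becomes one of performance rather than correctness.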

You could also argue there are a bunch of fringe benefits: maybe you're a mathematician or a security researcher looking to show the strength (or lack thereof) of the algorithm used in GetHashCode(). Or maybe you want to give an excellent proof-of-concept for why Microsoft should hire you for the .NET team.

Overall, your question is pretty vague. If there's something specific you're wondering about, you should rethink/edit your question.

cost

To answer your question, you first need to understand what a hash code is used for: it is a fast "pre-test" for checking whether two objects are equal.
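As a rough sketch (not the actual framework code, and the names are made up), the pre-test pattern a hash-based container follows looks roughly like this: the full Equals call only happens when the hash codes already match.

```csharp
static class HashPreTest
{
    // Illustrative sketch only: the hash code acts as a cheap pre-test, and the
    // potentially expensive Equals is only called when the codes match.
    public static bool KeysMatch(object candidate, object storedKey)
    {
        if (candidate.GetHashCode() != storedKey.GetHashCode())
            return false;           // different hash codes => definitely not equal

        // Same hash code: could be equal, could be a collision, so do the full check.
        return candidate.Equals(storedKey);
    }
}
```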

So is there a case where a collision is beneficial?

Yes. If generating a more unique hash code costs a relatively large amount of time, the overhead of that generation may outweigh the benefit you get from having fewer collisions.
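To make that trade-off concrete, here is a hypothetical type (the name and fields are invented for illustration, and neither method is a recommendation):

```csharp
// Hypothetical example type illustrating the cycles-vs-collisions trade-off.
struct Point3
{
    public int X, Y, Z;

    // Cheap: a handful of CPU cycles, but every point sharing an X value collides.
    public int CheapHash()
    {
        return X;
    }

    // More thorough: costs a few extra multiplies and adds, but mixes all three
    // fields, so far fewer collisions for typical data.
    public int ThoroughHash()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + X;
            hash = hash * 31 + Y;
            hash = hash * 31 + Z;
            return hash;
        }
    }
}
```

Whether the cheap version wins depends entirely on how often its extra collisions force additional Equals calls in your actual workload.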


To address your latest edit: the only way to tell whether it is worth it is to try both methods in place with your real data and see how the two compare. An artificial head-to-head benchmark will not give you meaningful information; things like hash-code lookups depend too heavily on the data they are working with.
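As a rough sketch of what such a comparison might look like (the class, method, and the idea of packaging each hashing strategy as an IEqualityComparer&lt;T&gt; are all illustrative assumptions), you could time inserts and lookups against your real keys, once per strategy:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Illustrative benchmark sketch only: run it with your real keys and your real
// access pattern, once per hashing strategy.
static class HashStrategyComparison
{
    public static TimeSpan Measure<T>(IReadOnlyList<T> realKeys,
                                      IEqualityComparer<T> hashStrategy)
    {
        var stopwatch = Stopwatch.StartNew();

        var dictionary = new Dictionary<T, int>(hashStrategy);
        for (int i = 0; i < realKeys.Count; i++)
            dictionary[realKeys[i]] = i;        // insert phase

        int hits = 0;
        foreach (var key in realKeys)
            if (dictionary.ContainsKey(key))    // lookup phase
                hits++;

        stopwatch.Stop();
        Console.WriteLine("{0}: {1} ms ({2} hits)",
            hashStrategy.GetType().Name, stopwatch.ElapsedMilliseconds, hits);
        return stopwatch.Elapsed;
    }
}
```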

Scott Chamberlain