Besides being a unique integer, are there any performance considerations when choosing a hashValue for a Swift Hashable type that may be inserted into a Set? For instance, will the size of the integer values I choose affect the size of the backing array? That is, if I assign a hashValue of 4000 to a Hashable type and insert it into a Set, will the backing array need to be at least 4000 elements long?

gloo
  • It's not clear what you mean here by "backing array" or why `hashValue` would have any relationship to it. (I suspect the answer is "no" but I don't really understand the question.) – Rob Napier Oct 21 '16 at 22:55
  • I'm thinking of `Set` in particular. Sorry for the omission – gloo Oct 21 '16 at 22:56
  • I assume you think Sets are implemented as an array where the hash value is the index? That's not how they're implemented. They're hash tables: https://en.wikipedia.org/wiki/Hash_table. The hash value itself is fairly irrelevant. Ideally it is random across the entire Int space so there aren't collisions and the table is balanced (can't remember if they use a binary tree or not; you can look here: https://github.com/apple/swift/blob/master/stdlib/public/core/HashedCollections.swift.gyb) – Rob Napier Oct 21 '16 at 23:00
  • But doesn't a hash table need a dynamic backing array? Maybe I have a misunderstanding of how a hash table works. The wiki article uses the hash values to index into "buckets" which I assume was an array. – gloo Oct 21 '16 at 23:04
  • At the end of the day, all of computer memory is one long Array, so sure, and most data structures have some kind of linearly indexed memory, but dig into how hash tables are built. It has nothing to do with the size of the value you return for the hash. – Rob Napier Oct 22 '16 at 00:47

1 Answer

hashValue does not have to be unique. In the vast majority of cases it can't be unique (any type with more than 64 bits of state necessarily has more possible values than there are hashes). You don't choose the size of the integer either: it is always Int, which is the machine word size.
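
To make the question's scenario concrete, here is a minimal sketch in which every value hashes to the constant 4000. The `Token` type is hypothetical and exists only for illustration. It assumes Swift 4.2 or later, where `hash(into:)` is the recommended way to conform to Hashable; when this answer was written, the same idea was expressed with `var hashValue`.

```swift
// Hypothetical type for illustration; it does not appear in the question.
struct Token: Hashable {
    let name: String

    static func == (lhs: Token, rhs: Token) -> Bool {
        return lhs.name == rhs.name
    }

    // Every Token feeds the same constant, 4000, into the hasher,
    // so all Tokens collide with one another.
    func hash(into hasher: inout Hasher) {
        hasher.combine(4000)
    }
}

var tokens = Set<Token>()
tokens.insert(Token(name: "a"))
tokens.insert(Token(name: "b"))

// Both elements are stored even though their hashes are identical.
print(tokens.count)
// The set's storage follows the number of elements, not the hash magnitude.
print(tokens.capacity < 4000)
```

The set holds two distinct elements with identical hashes, and its capacity stays far below 4000, which suggests the hash is never used directly as an array length.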

hashValue should be fast, however, and ideally O(1). Hashed collections use it to narrow down the candidates before running a full equality check (which may be very slow).
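
A toy hash set shows how a hash can be reduced to a bucket index and how equality checks are then limited to a single bucket. This is only a sketch under simplifying assumptions: `ToySet` is a hypothetical name, it uses chained buckets with a fixed bucket count, and it is not how Swift's standard library implements Set.

```swift
// A toy chained hash set, for illustration only.
struct ToySet<Element: Hashable> {
    private var buckets: [[Element]]

    init(bucketCount: Int = 8) {
        precondition(bucketCount > 0, "need at least one bucket")
        buckets = [[Element]](repeating: [], count: bucketCount)
    }

    var bucketCount: Int { return buckets.count }

    // Reduce an arbitrary Int hash to a valid index into `buckets`.
    // Taking the magnitude first keeps negative hashes in range.
    private func bucketIndex(for element: Element) -> Int {
        return Int(element.hashValue.magnitude % UInt(buckets.count))
    }

    // Only elements in the matching bucket are compared with ==.
    func contains(_ element: Element) -> Bool {
        return buckets[bucketIndex(for: element)].contains(element)
    }

    // Returns true if the element was newly inserted.
    @discardableResult
    mutating func insert(_ element: Element) -> Bool {
        let i = bucketIndex(for: element)
        if buckets[i].contains(element) { return false }
        buckets[i].append(element)
        return true
    }
}

var toy = ToySet<Int>(bucketCount: 8)
toy.insert(4000)
toy.insert(-7)
print(toy.contains(4000))
print(toy.bucketCount)
```

However large the hash is, the remainder operation maps it into the existing buckets, so the bucket array stays at eight entries here. A production table would also grow and rehash as it fills, which this sketch leaves out.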

The simplest implementation of hashValue is:

var hashValue: Int { return 1 }

This is a perfectly valid hash. It's not a particularly good hash, because every value collides with every other and lookups degrade toward a linear search, but it meets all the requirements. It is fast to calculate, and all equal objects will have equal hashes (which is a requirement; the converse is not required: equal hashes need not imply equal objects).
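
For contrast, a more useful hash can be sketched by hashing exactly the properties that take part in equality. The `Point` type and its `label` property are hypothetical, and the code assumes Swift 4.2 or later for `hash(into:)`.

```swift
// Hypothetical type for illustration.
struct Point: Hashable {
    let x: Int
    let y: Int
    let label: String   // deliberately ignored by both == and hashing

    static func == (lhs: Point, rhs: Point) -> Bool {
        return lhs.x == rhs.x && lhs.y == rhs.y
    }

    // Hash only what == compares, so equal values always hash equally.
    func hash(into hasher: inout Hasher) {
        hasher.combine(x)
        hasher.combine(y)
    }
}

let a = Point(x: 1, y: 2, label: "first")
let b = Point(x: 1, y: 2, label: "second")

print(a == b)                      // equal, because label is ignored
print(a.hashValue == b.hashValue)  // equal values must produce equal hashes

let points = Set([a, b])
print(points.count)                // the duplicate is dropped
```

Leaving `label` out of both `==` and `hash(into:)` keeps the two consistent. Hashing a property that equality ignores would break the requirement that equal objects have equal hashes.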

Rob Napier