
We found out that you cannot distinguish two Decimals by their hashValue if one is the negative of the other. We use a Decimal as a field in a struct, and that struct implements Hashable so that it can be put in a Set. Our business logic requires all fields to be unique, so all fields are combined into the hashValue. This means that if two structs are equal in every other field and their decimal fields are negatives of each other, the whole structs are considered equal, which is not what we want.

Playground code:

import Foundation

// Compare the hash of a random Decimal with the hash of its negation,
// and do the same for plain Int as a reference.
for _ in 0..<10 {
    let randomNumber: Int = Int.random(in: 0..<10000000)

    let lhs = Decimal(integerLiteral: randomNumber)
    let rhs = Decimal(integerLiteral: -randomNumber)

    print("Are \(lhs) and \(rhs)'s hashValues equal? \(lhs.hashValue == rhs.hashValue)")
    print("Are \(randomNumber) and \(-randomNumber)'s hashValues equal? \(randomNumber.hashValue == (-randomNumber).hashValue)\n")
}

The same happens when testing with floatLiteral instead of integerLiteral.

The workaround is to compare the Decimals directly, and optionally include them in the hashValue if other parts of the code require it.

Is this behaviour intended? The mantissa is the same, so I guess the reason their hash values are equal is that the sign is not included in the Decimal's hashValue?

Simon
  • By definition, equal hash values are a necessary but not a sufficient condition for items to be equal. So what is the problem? – Marek R Jun 14 '19 at 11:55
  • @MarekR The problem is as I described in the post, that we have a decimal in our struct with other fields. And when expected to be different, they are not. It was when writing a unit test that I found one of them failing as I could not insert two structs into the same Set when this field we call `delta` had this relationship between the two structs – Simon Jun 14 '19 at 12:00
  • @Simon: That means that the implementation of `==` in your “other struct” is wrong. – Martin R Jun 14 '19 at 12:01
  • @MartinR In the case of Decimal, sure. But this is the first time I've come across this behaviour in a built-in struct/class/primitive. I'd like to know why it works this way, and whether it's a bug or intended. As you said, mapping all the values wouldn't work, but why was the decision taken to make `X.hashValue == (-X).hashValue`? Given the code above, I could add `print("Are \(randomNumber) and \(-randomNumber)'s hashValues equal? \(randomNumber.hashValue == (-randomNumber).hashValue)")`, and this prints false. So it has to do with Decimal, not with mapping numbers as you say. – Simon Jun 14 '19 at 12:04
  • A hash value is used for performance optimization. It's a quick and dirty check for equality. If the hashes are different ==> not equal. If the hashes are the same, dang, you now have to go the extra mile and do the exhaustive comparison in ==. Your code should not be using the hash value for anything except performance. Your code should work correctly (if slowly) if all values had the exact same hash value. – vacawama Jun 14 '19 at 12:05
  • @Simon: I don't know the implementation of Decimal, and cannot say if that particular collision is “intentional” or not. The point is simply that there can be hash collisions, and you should not rely on distinct hash values for distinct values. – Martin R Jun 14 '19 at 12:09
  • @Simon: Btw, with Xcode 11 beta I get different hash values for the decimals X and -X. – Martin R Jun 14 '19 at 12:28
  • @MartinR Given that change, I'd assume it was unintentional. And yes, I understand that it is for optimization, and that's what we use it for. The struct also has a timestamp, so the likelihood of this happening is very small, but in my unit tests we do try to find edge cases, and this was one of them. What I was getting at is that its hash function was basically useless. Either way, it seems I'll have to fall back to comparing the decimals directly. – Simon Jun 14 '19 at 12:38
  • Code your equality (`==`) function for accuracy. Provide a good hashing function for performance. A set will only call `==` if the hashes match, but if you've written your `==` function correctly, a struct with X and another with -X will not pass your `==` test, and so will both fit into the same set even if they return the same hash value. – vacawama Jun 14 '19 at 12:39

1 Answer


Equal values must have the same hash value, but not the other way around: distinct values can have the same hash value. Testing for equality must be done with ==, never by relying on the hash value alone.
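A minimal sketch of that point, assuming nothing beyond the standard library: the `Token` type below is hypothetical and exists only for this illustration. Its `hash(into:)` deliberately feeds the same constant to the hasher, so every instance has the same hash value, yet `Set` still keeps distinct values apart because it falls back to `==` on a collision.

```swift
// Hypothetical type used only to demonstrate hash collisions.
struct Token: Hashable {
    let value: Int

    // Equality compares the actual stored data.
    static func == (lhs: Token, rhs: Token) -> Bool {
        return lhs.value == rhs.value
    }

    // Deliberately terrible hash: every Token feeds the same input to the hasher.
    func hash(into hasher: inout Hasher) {
        hasher.combine(-1234)
    }
}

let a = Token(value: 42)
let b = Token(value: -42)

// Every Token hashes identically, so these collide...
print(a.hashValue == b.hashValue)  // true
// ...but == still tells them apart.
print(a == b)                      // false

var tokens = Set<Token>()
tokens.insert(a)
tokens.insert(b)
// Inserting a value equal to `a` is rejected because == finds the match.
let (inserted, _) = tokens.insert(Token(value: 42))
print(tokens.count)  // 2
print(inserted)      // false
```

The only cost of the bad hash is performance: every lookup has to compare against every colliding element with `==`.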

In this particular case note that there are more than 2^64 Decimal values, so that it would actually be impossible to assign different hash values to all of them. (Similarly for strings, arrays, dictionaries, ...).

If you have a custom struct containing Decimal (and possibly other) properties then the implementation of the Equatable and Hashable protocol should look like this:

import Foundation

struct Foo: Hashable {

    let value: Decimal
    let otherValue: Int

    static func == (lhs: Foo, rhs: Foo) -> Bool {
        return lhs.value == rhs.value && lhs.otherValue == rhs.otherValue
    }

    func hash(into hasher: inout Hasher) {
        hasher.combine(value)
        hasher.combine(otherValue)
    }
}

Note that if all stored properties are Hashable then the compiler can synthesize these methods automatically, and it is sufficient to declare conformance:

struct Foo: Hashable {
    let value: Decimal
    let otherValue: Int
}
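As a usage sketch of the synthesized version, applied to the situation in the question (the field values `123`, `-123` and `7` are arbitrary): two `Foo` instances that differ only in the sign of the Decimal field compare unequal, so both end up in the Set, whatever their hash values happen to be.

```swift
import Foundation

// Same declaration as above; == and hash(into:) are synthesized.
struct Foo: Hashable {
    let value: Decimal
    let otherValue: Int
}

// Two instances that differ only in the sign of the Decimal field.
let positive = Foo(value: Decimal(123), otherValue: 7)
let negative = Foo(value: Decimal(-123), otherValue: 7)

// Synthesized == compares every stored property, including the sign of `value`.
print(positive == negative)  // false

var foos = Set<Foo>()
foos.insert(positive)
foos.insert(negative)
print(foos.count)            // 2
```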

Remark: I assume that the behaviour is inherited from the Foundation type NSDecimalNumber. With Xcode 11 beta (Swift 5.1) x and -x have different hash values as Decimal, but the same hash value as NSDecimalNumber:

let d1: Decimal = 123
let d2: Decimal = -123

print(d1.hashValue) // 1891002061093723710
print(d2.hashValue) // -6669334682005615919

print(NSDecimalNumber(decimal: d1).hashValue) // 326495598603
print(NSDecimalNumber(decimal: d2).hashValue) // 326495598603

(Your values may vary since hash values are randomized as of Swift 4.2.) But the above still applies: There can always be collisions, and one cannot rely on different values having different hashes.

Martin R
  • It is impossible for all numbers to have unique hash values, but the issue described by OP is not a matter of hash collision, but rather a faulty hashing algorithm. A random hash collision is one thing, but a number _always_ having the same hash as its negative is another thing. That being said, relying on hashes being unique is a bad design decision. – mag_zbc Jun 14 '19 at 11:56
  • Note that floating-point numbers have `-0` and `+0`, and still these values are equal. – Marek R Jun 14 '19 at 12:00
  • @mag_zbc: Some Foundation types have horribly bad hash values. As an example, `NSArray` returns the number of elements :) – Martin R Jun 14 '19 at 12:05
  • @MartinR see my edit for doing the same to Integer. I do not think your second paragraph answers what I've asked. It is systematically producing the same hash for value X and its negation, which is different from not possibly assigning a unique hash for each Decimal value. – Simon Jun 14 '19 at 12:11
  • @Simon: Even assigning `-1234` as the hash value for every Decimal would be a valid (though impractical) implementation. – There are always collisions and it is futile to discuss if they are “intentional” or not. – Btw, the native Swift types (like integers) have much better hash value implementations. – Martin R Jun 14 '19 at 12:13
  • @MartinR what does the synthesized version of `==` do? Is it a field by field comparison? – Simon Jun 14 '19 at 12:40
  • @Simon: All stored properties are compared for equality. You can find the details in https://github.com/apple/swift-evolution/blob/master/proposals/0185-synthesize-equatable-hashable.md. – Martin R Jun 14 '19 at 12:44
  • @MartinR I accepted your answer, but I'd like you to move the Xcode 11 comment into it. It would be nice for it to be mentioned in the answer. – Simon Jun 14 '19 at 12:48
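Marek R's remark about `-0` and `+0` shows the opposite direction of the hash rule: two values with different bit patterns that compare equal must hash identically. A small sketch with the standard library's `Double`, where the negative zero is produced by negating a variable:

```swift
// Floating-point zero has two representations.
let plusZero: Double = 0.0
let minusZero: Double = -plusZero  // runtime negation yields -0.0

// The representations differ in sign...
print(minusZero.sign == .minus)                   // true
// ...but the values are equal, so they must hash the same.
print(plusZero == minusZero)                      // true
print(plusZero.hashValue == minusZero.hashValue)  // true

// A Set treats them as a single element.
let zeros = Set([plusZero, minusZero])
print(zeros.count)                                // 1
```

Swift's `Double.hash(into:)` normalizes zero before hashing for exactly this reason; the Decimal case in the question is just a collision in the other direction, which the hashing contract permits.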