
I'm looking to implement an IEqualityComparer class that stores and compares floating point keys that are rounded to the nearest 0.01. In particular, I want to make sure I implement the GetHashCode method correctly. I would like to make this as efficient as possible. Can I just use the float value itself as its own hash?

I could multiply by 100, cast to int and use an int as a key, but I'm curious if this can be done with float keys.

Note: I would wrap the dictionary in a class to ensure that only values rounded to the nearest 0.01 are ever added or compared.

Follow-up question: If I used decimal (with every value guaranteed to be rounded to 0.01), could I just use the default comparer for decimal keys in a Dictionary?
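For the decimal follow-up, here is a minimal sketch, assuming every key goes through the same Math.Round(value, 2) call before it is stored or looked up (RoundedDecimalKeys and Key are hypothetical names). Because decimal stores base-10 digits, a value such as 1.31m is represented exactly, so in this sketch the default comparer used by Dictionary<decimal, TValue> is enough:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: normalizes every key to 2 decimal places before use.
static class RoundedDecimalKeys
{
    // decimal is base 10, so rounding to 2 places yields an exact value such as 1.31m.
    public static decimal Key(decimal value) => Math.Round(value, 2);

    public static Dictionary<decimal, string> Build()
    {
        // No custom comparer: Dictionary uses the default EqualityComparer<decimal>.
        var prices = new Dictionary<decimal, string>();
        prices[Key(1.314m)] = "first";
        prices[Key(1.306m)] = "second"; // also rounds to 1.31m, so it overwrites "first"
        prices[Key(2.5m)] = "third";
        return prices;
    }
}
```

Here 1.314m and 1.306m both round to 1.31m and share one entry, so the dictionary ends up with two keys.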

My first thought is to try this implementation. Any pitfalls?

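using System.Collections.Generic;

class FloatEqualityComparer : IEqualityComparer<float>
{
    public bool Equals(float b1, float b2)
    {
        // Scale to hundredths and truncate toward zero before comparing
        int i1 = (int)(b1 * 100);
        int i2 = (int)(b2 * 100);
        return i1 == i2;
    }

    public int GetHashCode(float x)
    {
        // Use the float value itself as the hash
        return x.GetHashCode();
    }
}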
Aluan Haddad
Roci

3 Answers


The problem is the GetHashCode implementation. If two values might be considered equal, they must yield the same hash code. Values which yield different hash codes are assumed to be unequal.
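To see the problem concretely, here is a small sketch using the question's comparer (with GetHashCode changed to return int so it compiles; QuestionComparer and HashMismatchDemo are hypothetical names). The two keys are equal according to Equals, yet a Dictionary cannot find one from the other:

```csharp
using System;
using System.Collections.Generic;

// The question's comparer: Equals buckets values by truncating x * 100,
// but GetHashCode hashes the raw float, so "equal" keys usually hash differently.
sealed class QuestionComparer : IEqualityComparer<float>
{
    public bool Equals(float b1, float b2) => (int)(b1 * 100) == (int)(b2 * 100);
    public int GetHashCode(float x) => x.GetHashCode();
}

static class HashMismatchDemo
{
    public static (bool equal, bool sameHash, bool found) Run()
    {
        var comparer = new QuestionComparer();
        float a = 1.311f, b = 1.312f;
        var dict = new Dictionary<float, string>(comparer) { [a] = "stored under 1.311" };
        return (comparer.Equals(a, b),                                // both truncate to 131
                comparer.GetHashCode(a) == comparer.GetHashCode(b),  // different bit patterns
                dict.ContainsKey(b));                                 // lookup by the "equal" key
    }
}
```

Run() reports that 1.311f and 1.312f are equal but have different hash codes, so ContainsKey(1.312f) returns false even though the comparer considers the keys equal.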

Why not something like this:

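using System;
using System.Collections.Generic;

sealed class FloatEqualityComparer : IEqualityComparer<float>
{
    // Math.Round has no float overload, so values are widened to double and rounded to 2 places (0.01)
    public bool Equals(float x, float y) => Math.Round(x, 2) == Math.Round(y, 2);

    public int GetHashCode(float f) => Math.Round(f, 2).GetHashCode();
}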

The reason for this is that the equality test is not performed if two hash codes are different. This is what makes hashing efficient, dramatically improving performance, as the Equals method only has to be called for pairs of elements with identical hash codes. Otherwise, each value would need to be compared to every other value, resulting in a computational complexity of O(N²).

Another way of putting this: if two elements are to be considered equal, their hash codes must collide.
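A rough way to observe this, assuming the rounding comparer from this answer (rounding to 2 places); CountingComparer and EqualsCallDemo are hypothetical instrumentation. It relies on HashSet<T> comparing stored hash codes before it calls Equals, which is how the current .NET implementation behaves rather than a documented contract:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical instrumentation: the rounding comparer, counting Equals calls.
sealed class CountingComparer : IEqualityComparer<float>
{
    public int EqualsCalls;

    public bool Equals(float x, float y)
    {
        EqualsCalls++;
        return Math.Round(x, 2) == Math.Round(y, 2);
    }

    public int GetHashCode(float f) => Math.Round(f, 2).GetHashCode();
}

static class EqualsCallDemo
{
    // Adds n distinct keys (0.00, 0.01, ...) to a HashSet and to a List that
    // deduplicates by linear search; returns how often each path called Equals.
    public static (int hashed, int linear) Run(int n)
    {
        var hashedComparer = new CountingComparer();
        var set = new HashSet<float>(hashedComparer);
        var linearComparer = new CountingComparer();
        var list = new List<float>();

        for (int i = 0; i < n; i++)
        {
            float key = i / 100f;
            set.Add(key);

            // Without hashing, each new key is compared with every stored key.
            bool seen = false;
            foreach (float existing in list)
            {
                if (linearComparer.Equals(existing, key)) { seen = true; break; }
            }
            if (!seen) list.Add(key);
        }
        return (hashedComparer.EqualsCalls, linearComparer.EqualsCalls);
    }
}
```

For 1000 distinct keys, the linear search makes 499,500 Equals calls (N(N-1)/2), while the hash set makes essentially none, because keys with different hash codes are never compared.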

Finally, we'll clean up our implementation to remove duplicate code and follow Microsoft's recommended practices for providing custom equality comparers.

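using System;
using System.Collections.Generic;

sealed class FloatEqualityComparer : EqualityComparer<float>
{
    public override bool Equals(float x, float y) => GetEquatable(x) == GetEquatable(y);

    public override int GetHashCode(float f) => GetEquatable(f).GetHashCode();

    // Single place that defines equality: round to the nearest 0.01.
    // Returns double because Math.Round(double, int) returns double.
    private static double GetEquatable(float f) => Math.Round(f, 2);
}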

This removes duplicate code, preventing the equality and hashing logic from drifting apart if either is revised. It also follows Microsoft's recommendation to prefer extending EqualityComparer<T> over implementing IEqualityComparer<T> directly. This latter point is peculiar to the equality comparison API exposed by the BCL, is by no means a general guideline, and is documented in the remarks for IEqualityComparer<T>. Note that the interface is still implemented under this approach, since the implementation is inherited from the base class.
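A usage sketch, assuming rounding to 2 places (the nearest 0.01, as the question asks); the comparer is repeated so the snippet compiles on its own, and ComparerUsage is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;

// The refactored comparer, rounding to the nearest 0.01.
sealed class FloatEqualityComparer : EqualityComparer<float>
{
    public override bool Equals(float x, float y) => GetEquatable(x) == GetEquatable(y);
    public override int GetHashCode(float f) => GetEquatable(f).GetHashCode();
    private static double GetEquatable(float f) => Math.Round(f, 2);
}

static class ComparerUsage
{
    public static Dictionary<float, string> Build()
    {
        var dict = new Dictionary<float, string>(new FloatEqualityComparer());
        dict[1.311f] = "a";
        dict[1.309f] = "b"; // also rounds to 1.31, so it replaces "a"
        dict[1.32f] = "c";
        return dict;
    }
}
```

1.311f and 1.309f both round to 1.31, so the second assignment overwrites the first and the dictionary holds two entries; a lookup with 1.31f finds "b".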

Aluan Haddad
  • This improves my limited knowledge of hash codes in dictionaries. One question about the Equals implementation: Will Math.Round() always generate a precisely rounded float? e.g. Could I end up with 1.31 != 1.31 because of the imprecision of floats? Maybe multiplying then casting to int is more expensive, but it is guaranteed to be precise. – Roci Feb 07 '18 at 02:52
  • Not after they have been rounded. It will work fine. – Aluan Haddad Feb 07 '18 at 02:52
  • @Roci a good way to test this stuff out is to open up the C# interactive window in Visual Studio or use LINQPad. – Aluan Haddad Feb 07 '18 at 03:03
  • I would not use `Math.Round(f, 3)` as a guaranteed way to get identical results (as `Round` does not guarantee a particular representation for non-integer rounding, as far as I understand)... Also, it is mostly a theoretical comment, as it is unlikely to be what the OP actually needs anyway: rounding + hashing will always have corner cases where almost the same values fall into different buckets... – Alexei Levenkov Feb 07 '18 at 03:35
  • So you would suggest multiply then truncate? – Aluan Haddad Feb 07 '18 at 03:36
  • @AluanHaddad - I do not see how running this code would give me insight into the implementation details about Math.Round and floating point precision. Just because it passes a few test cases in an interactive window doesn't mean it will pass every case. – Roci Feb 07 '18 at 03:59
  • @Roci I just meant in general since you seem to be new to the framework interactive coding is a good way to learn. – Aluan Haddad Feb 07 '18 at 04:04

Floating point equality is messy. Just trying to define what it actually means is messy.

First, let's consider what happens when you round the numbers.

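float x = 0.4999999f;
float y = 0.5000000f;
float z = 1.4999999f;
// AwayFromZero: with the default MidpointRounding.ToEven, 0.5 would round down to 0
Assert.AreEqual(false, Math.Round(x, MidpointRounding.AwayFromZero) == Math.Round(y, MidpointRounding.AwayFromZero));
Assert.AreEqual(true, Math.Round(y, MidpointRounding.AwayFromZero) == Math.Round(z, MidpointRounding.AwayFromZero));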

If you're trying to model a real world process, I would expect that x and y would be a lot more equal than y and z. But rounding forces y and z into the same bucket, and x into a different one.

No matter what scale you choose for your rounding, there will always be numbers which are arbitrarily close together but are considered different, and numbers which are at opposite ends of a rounding interval but are considered the same. If your numbers are generated by some arbitrary process, you never know whether two numbers which should be considered equal will fall on the same side or on opposite sides of a boundary. If you choose to round to the nearest 0.01, essentially the same example works if you just multiply x, y, and z in the example by 0.01.
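A small sketch of the same effect at the 0.01 scale, assuming rounding with Math.Round(v, 2) (BoundaryDemo and its constants are hypothetical). The values avoid exact midpoints so that float representation error does not decide the outcome:

```csharp
using System;

static class BoundaryDemo
{
    // Rounds to the nearest 0.01, as the question intends.
    public static double Bucket(float v) => Math.Round(v, 2);

    public const float X = 1.004999f; // just below the 1.005 boundary
    public const float Y = 1.005001f; // just above it, only 0.000002 away from X
    public const float Z = 1.014999f; // almost 0.01 away from Y
}
```

X and Y are only 0.000002 apart but land in different buckets (1.00 and 1.01), while Y and Z are almost 0.01 apart and share the 1.01 bucket.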

Let's say you instead define equality by the distance between two numbers.

float x = 4.6f;
float y = 5.0f;
float z = 5.4f;
Assert.AreEqual(true, Math.Abs(x - y) < 0.5);
Assert.AreEqual(true, Math.Abs(y - z) < 0.5);
Assert.AreEqual(false, Math.Abs(x - z) < 0.5);

Now numbers which are close together are always considered equal, but you've given up the transitive property of equality: x and y are considered equal, and y and z are considered equal, but x and z are considered not equal. Obviously you can't build a hash set without transitive equality.
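The consequence for a hash set can be sketched with a hypothetical tolerance comparer using the same 0.5 threshold (ToleranceComparer and OrderDemo are illustrative names). The only GetHashCode consistent with such an Equals is a constant, because a chain of close values links every number to every other one, and even then the result depends on insertion order:

```csharp
using System;
using System.Collections.Generic;

// "Equal" means closer than 0.5, as in the example above.
sealed class ToleranceComparer : IEqualityComparer<float>
{
    public bool Equals(float x, float y) => Math.Abs(x - y) < 0.5f;

    // Every key lands in the same bucket: the only hash consistent with Equals.
    public int GetHashCode(float f) => 0;
}

static class OrderDemo
{
    // Number of "distinct" values the set keeps for a given insertion order.
    public static int CountDistinct(params float[] values) =>
        new HashSet<float>(values, new ToleranceComparer()).Count;
}
```

Inserting 4.6, 5.0, 5.4 keeps two values (5.0 matches 4.6, but 5.4 does not), while inserting 5.0 first keeps only one, since both 4.6 and 5.4 are within 0.5 of it.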

The next thing to consider is that if you're doing calculations, floating point numbers can have different precision depending on how they're stored. It's up to the compiler to decide where they will be stored, and it can convert them back and forth whenever it wants to. Calculations will be done in registers, and it can make a difference when those registers get copied to main memory, and when they lose that precision. This is harder to demonstrate in code, because it's really up to how it compiles, so let's use a hypothetical example to illustrate.

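float x = 4.49f;
// AwayFromZero so that 4.5 rounds up; the default MidpointRounding.ToEven would give 4
float y = (float)Math.Round(x, 1, MidpointRounding.AwayFromZero); // equals 4.5
float z1 = (float)Math.Round(x, MidpointRounding.AwayFromZero);   // 4.49 rounds to 4
float z2 = (float)Math.Round(y, MidpointRounding.AwayFromZero);   // 4.5 rounds to 5
Assert.AreEqual(false, z1 == z2);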

Depending on whether the intermediate result got rounded or not, I get a different result on the final rounding. Obviously going from register to memory isn't rounding to 1 decimal digit, but this illustrates the principle that when you choose to round can affect your result. If you pass 2 numbers that are supposed to be the same to the equality function, and one came from memory and the other from a register, they could potentially round 2 different ways.
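A concrete case of intermediate precision changing a multiply-and-truncate result: 0.29f is actually stored as 0.28999999165534973. Multiplying by 100 in double precision keeps that error, while rounding the product back to float lands exactly on 29. PrecisionDemo is a hypothetical name; the explicit (float) cast is what forces the product to float precision:

```csharp
using System;

static class PrecisionDemo
{
    // Product rounded to float precision before truncating.
    public static int TruncateViaFloat(float f) => (int)(float)(f * 100f);

    // Product kept in double precision before truncating.
    public static int TruncateViaDouble(float f) => (int)((double)f * 100.0);
}
```

TruncateViaFloat(0.29f) returns 29 and TruncateViaDouble(0.29f) returns 28, so whether the product was held at float or double precision decides which bucket the value falls into.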

EDIT: Another point to consider, which may not make a difference in this case, is that a float only has 24 bits of mantissa. That means that once you get past 2 to the 24th power, or 16,777,216, numbers that you would expect to be different will come back as equal, no matter what precision you thought you were rounding them to.

float x = 17000000;
float y = 17000001;
Assert.AreEqual(true, x == y);
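The same limit bites at 0.01 resolution much earlier. From 131,072 (2^17) upward, adjacent floats are at least 1/64 = 0.015625 apart, which is more than 0.01, so neighbouring hundredths can map to the same float (MagnitudeDemo is a hypothetical name):

```csharp
static class MagnitudeDemo
{
    // Both literals are rounded to the nearest representable float.
    public const float A = 200000.01f;
    public const float B = 200000.02f;

    public static bool SameFloat => A == B;
}
```

Both literals round to 200000.015625, so SameFloat is true before any comparer is involved.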

So if you're fine with all those caveats because all you want is something that works most of the time, you can probably get away with trying to hash on floating point numbers. But no matter how you try to define floating point equality, you'll always end up with unexpected behavior.

Bryce Wagner

There is nothing in the .NET documentation to say that floating point values returned from Math.Round() will pass the equality comparison when they should; e.g. 2.32 should always equal 2.32, but if either value is off by plus or minus float.Epsilon, the equality could be false. This risks creating 2 keys for the same value shifted by only float.Epsilon. I'm solving this unlikely (but still buggy) issue by multiplying by 100 and casting to int instead of calling Math.Round().

using System.Collections.Generic;

sealed class FloatEqualityComparer : IEqualityComparer<float>
{
    // Scale to hundredths and truncate toward zero (truncation, not rounding to nearest)
    private static int GetPreciseInt(float f) => (int)(f * 100);

    public bool Equals(float f1, float f2) => GetPreciseInt(f1) == GetPreciseInt(f2);
    public int GetHashCode(float f) => GetPreciseInt(f).GetHashCode();
}

Note: I am not concerned about the edge cases in rounding floating point numbers of finite precision; rather, I am concerned about using those rounded, imprecise floats as keys in a dictionary.
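One more pitfall of multiply-and-truncate worth keeping in mind: casting to int truncates toward zero, so it groups values by the 0.01 below them rather than rounding to the nearest 0.01. A sketch using the comparer from this answer, renamed TruncatingComparer here (TruncationDemo is also a hypothetical name):

```csharp
using System;
using System.Collections.Generic;

// The comparer from this answer, renamed: truncation toward zero, not rounding.
sealed class TruncatingComparer : IEqualityComparer<float>
{
    private static int GetPreciseInt(float f) => (int)(f * 100);
    public bool Equals(float f1, float f2) => GetPreciseInt(f1) == GetPreciseInt(f2);
    public int GetHashCode(float f) => GetPreciseInt(f).GetHashCode();
}

static class TruncationDemo
{
    // Both values truncate to 131.
    public static bool TruncatedEqual => new TruncatingComparer().Equals(1.311f, 1.319f);

    // Rounding to the nearest 0.01 gives 1.31 and 1.32.
    public static bool RoundedEqual => Math.Round(1.311f, 2) == Math.Round(1.319f, 2);
}
```

TruncatedEqual is true because 1.311 and 1.319 both truncate to 131, while RoundedEqual is false because rounding gives 1.31 and 1.32.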

Roci
  • This is pointless. The entire purpose of the equality comparer protocol that all collections in the BCL conform to is that you don't have to do this. – Aluan Haddad Feb 10 '18 at 00:09
  • This is not pointless. Rather, it is only applicable to a very specific use case. I am abstracting the user from converting between floats and ints. This does NOT change (or say anything at all) about the purpose of the equality comparer protocol. The answer to my original question was: "Don't implement IEqualityComparer for floats to solve this problem. Use int keys instead." I discovered this while playing around with the interactive console, ty ... jk – Roci Feb 11 '18 at 00:48
  • That's not what I said. Also, I directly answered your question and you have not done so here. Rather, you've decided to sidestep your original question even though it is answerable. – Aluan Haddad Feb 11 '18 at 03:45
  • I gave you kudos for your GetHashCode implementation and explanation. Can you provide some confirmation that Math.Round() is guaranteed to give precise doubles for the equality comparison? Re-read the comments underneath your answer. When I asked you about Math.Round's precision, you replied that I should test things in an interactive window (a non sequitur, according to the later comment you made). – Roci Feb 11 '18 at 06:09
  • fair enough, it seems there's a misunderstanding, possibly due to the reordering of the comments after they were voted on. A commenter mentioned that it might not be reliable, so I asked him to follow up and he did not as yet. The remark I made about using interactive coding was that it's a good way to write quick, unit style tests when you want to see what the results of an algorithm are. It wasn't specific to this question. – Aluan Haddad Feb 11 '18 at 06:24
  • My problem with this answer is that it doesn't answer your question which is about equality comparers. I understand that you opted for the multiplication and truncation approach, which is fine, but you can still do that in a comparer which would enable you to use other collections such as sets to locate elements and also works with the System.Linq.Enumerable extension methods. In other words, your basic approach was correct, Round being an implementation, but the reason it wasn't working for you was because of the hashing. – Aluan Haddad Feb 11 '18 at 06:31
  • Looks like you didn't test your code - `GetPreciseInt()` doesn't compile. – ClickRick Mar 28 '21 at 17:03