-1

I have a list of IPv4s as byte[]'s from page visits. To get a unique list, I want to add them to a HashSet. Using the following code puts duplicate entries into the HashSet, which I used to believe was impossible.

byte[] a = new byte[] {0,0,0,1};
byte[] b = new byte[] {0,0,0,1};

HashSet<byte[]> hashBytes = new HashSet<byte[]>();
hashBytes.Add(a);
hashBytes.Add(b);

I expect the hashset to only contain a, but it contains both a and b.

EDIT: As per a commenter I added in the additional code:


    public class IPComparer : IEqualityComparer<byte[]>
    {
        public bool Equals(byte[] a, byte[] b)
        {
            //They are not an IP address
            if (a.Length != 4 && b.Length != 4)
            {
                return false;
            }
            for (int i = 0; i < a.Length; i++)
            {
                if (a[i] != b[i])
                {
                    return false;
                }
            }
            return true;
        }
        public int GetHashCode(byte[] a)
        {
            uint b = 0;
            for (int i = 0; i < a.Length; i++)
            {
                b = ((b << 23) | (b >> 9)) ^ a[i];
            }
            return unchecked((int)b);
        }

I also made the HashSet with new HashSet(IPComparer) instead of the regular comparer. The results did not change.

  • 1
    theres a builtin ipaddress type @EdPlunkett – Daniel A. White Sep 11 '19 at 18:47
  • 1
    just fyi, it appears you are using your equals method as some kind of validation whether something is an ip address and returning false if not. That is incorrect, Firslty one of the byte arrays passed in will be a byte array already in your hashset, secondly thats just not what this interface or method is about, it merely asks are they equal, thirdly the fact your return false, means it's not equal and so might get added to your hashset in the end anyway – Dave Sep 11 '19 at 20:25
  • So it took me a bit to see what you were trying to say. The first if statement in the Equals function should return true and not false to avoid being added into the HashSet. Changing that did not make the function work as expected regardless. – SunriseInternational Sep 11 '19 at 22:00

1 Answers1

10

To be used in a hash set, your class must implement GetHashCode and Equals properly. This isn't true of byte[] - arrays in general do not override those methods, so the hash set compares based on reference equality, not value equality.

The simplest solution would be to use IPAddress instead of byte[]. If that's not an option, you'll have to write your own comparer and pass it to the hash set. You can use SequenceEquals to compare arrays by value; a good hash value is a bit trickier, since without caching the value, this is going to be very slow - you'll probably want to make your own class that holds the byte[] and implements Equals and GetHashCode properly (which kind of brings you back to IPAddress :)).

Luaan
  • 62,244
  • 7
  • 97
  • 116
  • I created the additional code as seen in my edit and wrote an Equals function with some assumptions. Adding to the set did not appear to call my custom Equals function despite being called when initialized. I tried using IPAddress and it worked, but slowly. So I discovered that converting to an int32 (4x8 bits) worked and only took 5 additional ticks per row, which solves the problem at a business level but not on a computer science level. Fortunately this is good enough as each billion rows only takes an extra 50 billion ticks (0.8 mins), which is negligible from my end. – SunriseInternational Sep 11 '19 at 19:12
  • 1
    @SunriseInternational `Equals` is only used for resolving collisions between hashes - if you have two objects with the same hash, they are compared using `Equals`; otherwise you know they're different objects. If you don't need to support IPv6, int works fine. – Luaan Sep 11 '19 at 19:14
  • This comment got me to the actual computer science solution, taking that into account writing the rest of the code to get the byte arrays to compare correctly was trivial. Thank you for the insight. – SunriseInternational Sep 11 '19 at 19:25