I want to implement a vector class for doubles in C# and need to override Equals and GetHashCode so that I can use my vector class as a key in a Dictionary or store it in a HashSet. Since I need a certain tolerance for equality, I know there is no way to implement a transitive Equals method together with a corresponding GetHashCode method.
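To see why a tolerance rules out transitivity, here is a minimal sketch assuming a per-component comparison with a tolerance of 1e-10 (`ToleranceDemo`, `Tol` and `ApproxEqual` are hypothetical names used only for illustration). With a = 0, b = 0.6e-10 and c = 1.2e-10, a equals b and b equals c, but a does not equal c. A GetHashCode consistent with such an Equals would have to give a and b the same code and b and c the same code; chaining such small steps forces whole ranges of values to share a single hash code.

```csharp
using System;

// Hypothetical helper: tolerance-based equality for doubles, as a vector
// class would apply it to each component.
public static class ToleranceDemo
{
    public const double Tol = 1e-10;

    // "Equal" means the values differ by less than Tol. This relation is
    // reflexive and symmetric, but not transitive.
    public static bool ApproxEqual(double a, double b) => Math.Abs(a - b) < Tol;
}
```

For example, `ApproxEqual(0.0, 0.6e-10)` and `ApproxEqual(0.6e-10, 1.2e-10)` are both true, while `ApproxEqual(0.0, 1.2e-10)` is false.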

I stumbled upon an answer on a similar thread: https://stackoverflow.com/a/580972/5333340

And I would like to know: is there a way to change the lookup behaviour of HashSet / Dictionary in C# so that it checks several buckets instead of just one?

Or is there an existing C# class that behaves this way?

asked by Kjara
  • `Equals` might need some tolerance, but surely you can have values in a certain range return the same hashcode? – Charles Mager Jun 30 '16 at 15:30
  • I don't think this is going to work; the bucket selection is not linear - there is no mechanism to determine "nearby buckets" - you'd have to check all of the buckets - and check each element inside the bucket via `Equals`, and at that point you might as well have used linear search in the first place (`List` etc). I don't think a dictionary/hashset can help you here. Maybe using a sorted list is the best option, then you just need to scan a range by key. – Marc Gravell Jun 30 '16 at 15:30
  • @CharlesMager not really, because by definition nearby ranges (based on tolerance) will overlap each other – Marc Gravell Jun 30 '16 at 15:31
  • @MarcGravell fair point! It's just occurred to me that I've even come across this and implemented `GetHashCode` as `return 0` for exactly that reason (I had no requirement to use the object in a hash table and it was the simplest implementation that didn't break the contract). – Charles Mager Jun 30 '16 at 17:06
  • Can anyone tell me why I got -1 for the question? – Kjara Jul 04 '16 at 13:32
  • @MarcGravell I don't need a pre-built mechanism to determine "nearby buckets". I want to implement it on my own. Like: Dear HashSet, if you are asked whether `elem` is contained in you, do the following: 1. Calculate the HashCode of `elem`. 2. Use `myfunc` on the HashCode of `elem` to get a set of neighbouring HashCodes. 3. Search the bucket (via `Equals`) of the HashCode of `elem` as well as the buckets of the HashCodes just calculated via `myfunc`. And my question is, is it possible to do such a three-step thing in C# for HashSets? – Kjara Jul 04 '16 at 13:38
  • @Kjara that makes no sense; again, there is no concept of "neighbouring", and hash-codes are not expected to be linear with values. The only tests you have available to you are "probably equal vs definitely not equal" (hash-code) and "definitely equal / not equal" (equals). The question is IMO malformed; the **concept** is impossible for this data structure. To repeat: the only way that could possibly be implemented is with a full scan, which defeats any purpose of using this data structure. – Marc Gravell Jul 04 '16 at 14:15
  • I see. So the answer to my question is "No, it is not possible to customize the bucket searching mechanism for `HashSet` in C# aside from providing different implementations of `Equals` / `GetHashCode`". Thanks for the clarification. I still don't see why my question is bad - or "malformed", as you call it. Is a question bad just because its answer is "no"? – Kjara Jul 04 '16 at 14:25
  • @MarcGravell I have the feeling that you don't understand what I mean by "implement my own mechanism to determine nearby buckets". It is just a function that I define: given an integer x (the HashCode of some element), `myfunc(x)` returns a set of integers. Those integers are defined as being the "neighbouring ones" of x - by me, via the function `myfunc` (by whatever criteria)! Not by any built-in notion of neighbourhood. Of course, my implementation of `GetHashCode` must be in line with `myfunc` - but again, that's my responsibility. – Kjara Jul 04 '16 at 14:57
  • @Kjara and I don't think you understand what I mean when I say that that simply doesn't make sense in the general case. It *only* works for discrete values; you cannot, for example, sensibly list all the `double` values between any two points - there are a vast vast number of them; and for things like tuples, it explodes n-dimensionally. If you want to check nearby buckets for discrete values, then sure: you could do that manually, but it isn't a scenario that any of the built in data structures target, because it simply doesn't make sense. – Marc Gravell Jul 04 '16 at 15:39
  • It does make sense in the general case. The general case is that `GetHashCode` induces an equivalence relation on the set of all objects of a given type. There are at most 2^32 equivalence classes (one per possible HashCode). That looks discrete enough to me - and this is ALWAYS the case. – Kjara Jul 04 '16 at 16:51
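Marc Gravell's alternative, a sorted list scanned over a key range, can be sketched as follows, assuming a simple 3-component value type. `Vec3`, `SortedTolerantSet` and their members are hypothetical names, not an existing API. The list is kept sorted by X, so only elements whose X lies in [q.X - tol, q.X + tol] need the full tolerant comparison.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical 3-component point type for the sketch.
public readonly struct Vec3
{
    public readonly double X, Y, Z;
    public Vec3(double x, double y, double z) { X = x; Y = y; Z = z; }

    // Tolerant comparison on all three components.
    public bool ApproxEquals(Vec3 o, double tol) =>
        Math.Abs(X - o.X) < tol && Math.Abs(Y - o.Y) < tol && Math.Abs(Z - o.Z) < tol;
}

// Hypothetical "tolerant set" backed by a list sorted by X.
public class SortedTolerantSet
{
    private readonly List<Vec3> items = new List<Vec3>(); // sorted by X
    private readonly double tol;

    public SortedTolerantSet(double tol) { this.tol = tol; }

    public int Count => items.Count;

    // Binary search: index of the first element whose X is >= x.
    private int LowerBound(double x)
    {
        int lo = 0, hi = items.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (items[mid].X < x) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Scans only the X-window around q; everything outside it differs by at least tol in X.
    public bool Contains(Vec3 q)
    {
        for (int i = LowerBound(q.X - tol); i < items.Count && items[i].X <= q.X + tol; i++)
            if (items[i].ApproxEquals(q, tol))
                return true;
        return false;
    }

    // Inserts q at its sorted position unless an approximately equal element exists.
    public bool Add(Vec3 q)
    {
        if (Contains(q)) return false;
        items.Insert(LowerBound(q.X), q);
        return true;
    }
}
```

Restricting on X alone can still leave many candidates when points share similar X values, and `List<T>.Insert` is O(n), so this only illustrates the range-scan idea.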

1 Answer

Since HashSet provides no way to customize how buckets are searched, I wrote a custom class that searches several buckets. A real-life use case is included: a 3-dimensional vector class.

```csharp
using System;
using System.Collections.Generic;

// Implementing this interface introduces the concept of neighbouring buckets.
public interface IHasNeighbourConcept
{
    // The returned array must at least contain the return value of GetHashCode.
    int[] GetSeveralHashCodes();
}

// Custom HashSet-like class that can search in several buckets.
public class NeighbourSearchHashSet<T> where T : IHasNeighbourConcept
{
    // Internal data storage: hash code -> elements with that hash code.
    private readonly Dictionary<int, List<T>> buckets = new Dictionary<int, List<T>>();

    // Classic implementation utilizing GetHashCode: only the element's own
    // bucket is checked for an equal element.
    public bool Add(T elem)
    {
        int hash = elem.GetHashCode();

        if (!buckets.TryGetValue(hash, out List<T> bucket))
        {
            bucket = new List<T>();
            buckets[hash] = bucket;
        }

        foreach (T t in bucket)
        {
            if (elem.Equals(t))
                return false;
        }

        bucket.Add(elem);
        return true;
    }

    // Nonclassic implementation utilizing GetSeveralHashCodes.
    public bool Contains(T elem)
    {
        foreach (int h in elem.GetSeveralHashCodes())
        {
            // Neighbouring buckets may not exist; skip them instead of
            // throwing KeyNotFoundException.
            if (!buckets.TryGetValue(h, out List<T> bucket))
                continue;

            foreach (T t in bucket)
                if (elem.Equals(t))
                    return true;
        }
        return false;
    }
}

// A 3-dimensional vector class. Since its Equals method is not transitive,
// there can be vectors that are considered equal but have different HashCodes.
// So the Contains method of HashSet<Vector> does not work as expected.
public class Vector : IHasNeighbourConcept
{
    // Tolerance for considering two doubles as equal.
    private const double TOL = 1E-10;

    private readonly double[] coords;

    public Vector(double x, double y, double z)
    {
        if (double.IsNaN(x) || double.IsInfinity(x) ||
            double.IsNaN(y) || double.IsInfinity(y) ||
            double.IsNaN(z) || double.IsInfinity(z))
            throw new NotFiniteNumberException("All input must be finite!");

        coords = new double[] { x, y, z };
    }

    // Two vectors are equal iff each pair of corresponding
    // components differs by less than TOL.
    public override bool Equals(object obj)
    {
        // Equals must not throw: anything that is not a Vector is simply not equal.
        if (!(obj is Vector other))
            return false;

        for (int i = 0; i < 3; i++)   // compare all three components (x, y, z)
            if (Math.Abs(coords[i] - other.coords[i]) >= TOL)
                return false;

        return true;
    }

    // Cell index of one coordinate. The set of all Vectors with the same three
    // cell indices is a cube with side length TOL. Two Vectors considered equal
    // may lie in different cubes, but their cell indices differ by at most 1 per
    // coordinate (ignoring floating-point rounding right at a cube boundary).
    // long instead of int: coords[i] / TOL exceeds the int range for |coords[i]| > ~0.21.
    private static long Cell(double c) => (long)Math.Floor(c / TOL);

    // Combines three cell indices into a hash code. The multipliers make
    // permuting the coordinates result in different HashCodes; unchecked
    // lets the arithmetic wrap instead of throwing.
    private static int CombineCells(long x, long y, long z)
    {
        unchecked
        {
            long h = x * 961 + y * 31 + z;
            return (int)(h ^ (h >> 32));
        }
    }

    public override int GetHashCode()
    {
        return CombineCells(Cell(coords[0]), Cell(coords[1]), Cell(coords[2]));
    }

    // Gets the HashCode of the given Vector as well as the 26
    // HashCodes of the surrounding cubes.
    public int[] GetSeveralHashCodes()
    {
        long x = Cell(coords[0]);
        long y = Cell(coords[1]);
        long z = Cell(coords[2]);

        int[] hashes = new int[27];
        for (int i = -1; i <= 1; i++)
            for (int j = -1; j <= 1; j++)
                for (int k = -1; k <= 1; k++)
                    hashes[(i + 1) + 3 * (j + 1) + 9 * (k + 1)] = CombineCells(x + i, y + j, z + k);
        return hashes;
    }
}
```

EDIT:

The above implementation extends the concept of HashSet such that, even without a transitive Equals method, the Contains method of the set works correctly. It works because Contains only has to find some stored element that is equal to the sought one; it does not need to know which exact equivalence class (hash bucket) the sought element belongs to.
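The same idea in one dimension, as a minimal sketch assuming plain doubles instead of vectors. `TolerantDoubleSet`, `ContainsSingleBucket` and the cell width of 2 * tol are illustrative choices, not part of the answer's code. A value just below a cell boundary and a query just above it are equal within tol, yet only the lookup that also scans the adjacent cells finds the stored value.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical 1-D analogue of NeighbourSearchHashSet, for illustration only.
public class TolerantDoubleSet
{
    private readonly Dictionary<long, List<double>> buckets = new Dictionary<long, List<double>>();
    private readonly double tol;
    private readonly double width;

    public TolerantDoubleSet(double tol)
    {
        this.tol = tol;
        // Cells are 2 * tol wide, so two values within tol of each other are less
        // than half a cell apart; this leaves slack for rounding in x / width.
        width = 2 * tol;
    }

    // Assumes finite x with x / width inside the range of long.
    private long Cell(double x) => (long)Math.Floor(x / width);

    public void Add(double x)
    {
        long c = Cell(x);
        if (!buckets.TryGetValue(c, out List<double> list))
        {
            list = new List<double>();
            buckets[c] = list;
        }
        list.Add(x);
    }

    // Classic lookup: scans only the query's own cell.
    public bool ContainsSingleBucket(double x) => Search(Cell(x), x);

    // Neighbour lookup: scans the query's cell and both adjacent cells.
    public bool Contains(double x)
    {
        long c = Cell(x);
        return Search(c - 1, x) || Search(c, x) || Search(c + 1, x);
    }

    private bool Search(long cell, double x)
    {
        if (!buckets.TryGetValue(cell, out List<double> list))
            return false;
        foreach (double y in list)
            if (Math.Abs(x - y) < tol)
                return true;
        return false;
    }
}
```

With tol = 1.0, after adding 1.9, `ContainsSingleBucket(2.1)` is false because 2.1 falls into the next cell, while `Contains(2.1)` is true.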

However, for Dictionaries it is different. A lookup has to identify the one stored key that the sought key belongs to, i.e. the correct equivalence class (hash code); otherwise we may get a different image (value). Thus, for a Dictionary, different HashCodes MUST imply that the elements are not equal.
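The Dictionary problem can be shown with a minimal, hypothetical tolerant map (`TolerantDoubleMap`, `TryAdd` and `FindAll` are illustrative names; a linear scan replaces buckets to keep the sketch short). With tol = 1.0, the keys 0.0 and 1.5 are not equal to each other, so both can be stored, but the query 0.8 is equal to both of them, so it has two candidate images.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tolerant key/value store, used only to show why a map with a
// non-transitive key equality is ill-defined.
public class TolerantDoubleMap<TValue>
{
    private readonly List<KeyValuePair<double, TValue>> entries = new List<KeyValuePair<double, TValue>>();
    private readonly double tol;

    public TolerantDoubleMap(double tol) { this.tol = tol; }

    // Refuses a key that is already within tol of a stored key.
    public bool TryAdd(double key, TValue value)
    {
        if (FindAll(key).Count > 0) return false;
        entries.Add(new KeyValuePair<double, TValue>(key, value));
        return true;
    }

    // Every value whose key is "equal" (within tol) to the query key.
    public List<TValue> FindAll(double key)
    {
        var result = new List<TValue>();
        foreach (var e in entries)
            if (Math.Abs(e.Key - key) < tol)
                result.Add(e.Value);
        return result;
    }
}
```

After storing 0.0 → "A" and 1.5 → "B", `FindAll(0.8)` returns both "A" and "B"; a bucketed implementation would return whichever it happens to find first.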

answered by Kjara