7

I have some classes that contain several fields. I need to compare them by value, i.e. two instances of a class are equal if their fields contain the same data. I have overridden the GetHashCode and Equals methods for that.

It can happen that these classes contain circular references.

Example: We want to model institutions (like government, sports clubs, whatever). An institution has a name. A Club is an institution that has a name and a list of members. Each member is a Person that has a name and a favourite institution. If a member of a certain club has this club as his favourite institution, we have a circular reference.

But circular references, in conjunction with value equality, lead to infinite recursion. Here is a code example:

interface IInstitution { string Name { get; } }

class Club : IInstitution
{
    public string Name { get; set; }
    public HashSet<Person> Members { get; set; }

    public override int GetHashCode() { return Name.GetHashCode() + Members.Count; }

    public override bool Equals(object obj)
    {
        Club other = obj as Club;
        if (other == null)
            return false;

        return Name.Equals(other.Name) && Members.SetEquals(other.Members);
    }
}

class Person
{
    public string Name { get; set; }
    public IInstitution FavouriteInstitution { get; set; }

    public override int GetHashCode() { return Name.GetHashCode(); }

    public override bool Equals(object obj)
    {
        Person other = obj as Person;
        if (other == null)
            return false;

        return Name.Equals(other.Name)
            && FavouriteInstitution.Equals(other.FavouriteInstitution);
    }
}

class Program
{
    public static void Main()
    {
        Club c1 = new Club { Name = "myClub", Members = new HashSet<Person>() };
        Person p1 = new Person { Name = "Johnny", FavouriteInstitution = c1 }
        c1.Members.Add(p1);

        Club c2 = new Club { Name = "myClub", Members = new HashSet<Person>() };
        Person p2 = new Person { Name = "Johnny", FavouriteInstitution = c2 }
        c2.Members.Add(p2);

        bool c1_and_c2_equal = c1.Equals(c2); // StackOverflowException!
            // c1.Equals(c2) calls Members.SetEquals(other.Members)
            // Members.SetEquals(other.Members) calls p1.Equals(p2)
            // p1.Equals(p2) calls c1.Equals(c2) 
    }
}

c1_and_c2_equal should return true, and in fact we (humans) can see that they are value-equal with a little bit of thinking, without running into infinite recursion. However, I can't really say how we figure that out. But since it is possible, I hope that there is a way to resolve this problem in code as well!

So the question is: How can I check for value equality without running into infinite recursions?

Note that I need to resolve circular references in general, not only the case from above. I'll call it a 2-circle since c1 references p1, and p1 references c1. There can be other n-circles, e.g. if a club A has a member M whose favourite is club B which has member N whose favourite club is A. That would be a 4-circle. Other object models might also allow n-circles with odd numbers n. I am looking for a way to resolve all these problems at once, since I won't know in advance which value n can have.

Kjara
  • 2,504
  • 15
  • 42
  • 1
    Like you said, a infinite recursion can happen in circle of X length (example: after the 10 iteration it's pointing to the 2nd instituion and then the circle starts again). If the recursion is from navigation from 1 to 1 and not 1 to many, then I would include a list of "visited nodes" and verify if the current one was already processed, if yes, return. – Dryadwoods Oct 05 '17 at 13:12
  • 1
    As suggested above pass/keep a hashset or something for the reference already tested/called. It can be an optional argument set to null which gets created and passed after the first call or something. – Rob Oct 05 '17 at 13:16
  • In my opionion your `Equals` implementations are too strict. Why are two persons only equal if they have the same favorite institutuon? Why is a person different if he goes to a new club at this weekend? A simple workaround would be to introduce an `Id`-property – Tim Schmelter Oct 05 '17 at 13:20
  • 1
    @TimSchmelter The example with institutions and persons is artificial. It was just the best I could come up with that is neither too abstract nor too complicated. The stuff I am actually doing is not fit for a little code sample. – Kjara Oct 05 '17 at 13:23
  • In addition to caching the intermediate comparison result (so you can continue the comparison further in recursive calls) like other suggested, you can use a simple trick to cut this circular-referencing itself. You can use an `Id` (e.g. `GUID`) to represent a person or the Institution or both. This way, instead of comparing all the members of the `Institute`, you can simply compare the `Id`. Another way to implement the same strategy is to keep a `hashcode` (say MD5) which can uniquely represent the data instead of user provided `Id`, now it is auto-generated. – Vikhram Oct 05 '17 at 13:23
  • @Dryadwoods "If the recursion is from navigation from 1 to 1 and not 1 to many" - can you explain why you make this distinction? Why wouldn't the approach of storing visited nodes work for 1 to many navigation (as is the case here with `Members.SetEquals(other.Members)`)? – Kjara Oct 06 '17 at 07:57
  • @Kjara lets assumer this example: Institutions (I) from 1 to 7 https://i.imgur.com/p56cNzw.png Is the connection from I4 to I5 the same as the I3 to I5 ??? Well, the I5 was already visited when you reach I3, so should you go to I5 again? Well, it all depends on how "comparison" business logic you want to implement. – Dryadwoods Oct 06 '17 at 08:41

3 Answers3

2

An easy workaround (used in RDBMS) is to use a unique Id to identify a Person(any type). Then you don't need to compare every other property and you never run into such cuircular references.

Another way is to compare differently in Equals, so provide the deep check only for the type of the Equals and not for the referenced types. You could use a custom comparer:

public class PersonNameComparer : IEqualityComparer<Person>
{
    public bool Equals(Person x, Person y)
    {
        if (x == null && y == null) return true;
        if (x == null || y == null) return false;
        if(object.ReferenceEquals(x, y)) return true;
        return x.Name == y.Name;
    }

    public int GetHashCode(Person obj)
    {
        return obj?.Name?.GetHashCode() ?? int.MinValue;
    }
}

Now you can change the Equals implementation of Club to avoid that the Members(Persons) will use their deep check which includes the institution but only their Name:

public override bool Equals(object obj)
{
    if (Object.ReferenceEquals(this, obj))
        return true;

    Club other = obj as Club;
    if (other == null)
        return false;

    var personNameComparer = new PersonNameComparer();
    return Name.Equals(other.Name) 
        && Members.Count == other.Members.Count 
        && !Members.Except(other.Members, personNameComparer).Any();
}

You notice that i can't use SetEquals because there is no overload for my custom comparer.

Tim Schmelter
  • 450,073
  • 74
  • 686
  • 939
1

Following the suggestion of Dryadwoods, I changed the Equals methods so that I can keep track of the items that were already compared.

First we need an equality comparer that checks reference equality for corresponding elements of pairs:

public class ValuePairRefEqualityComparer<T> : IEqualityComparer<(T,T)> where T : class
{
    public static ValuePairRefEqualityComparer<T> Instance
        = new ValuePairRefEqualityComparer<T>();
    private ValuePairRefEqualityComparer() { }

    public bool Equals((T,T) x, (T,T) y)
    {
        return ReferenceEquals(x.Item1, y.Item1)
            && ReferenceEquals(x.Item2, y.Item2);
    }

    public int GetHashCode((T,T) obj)
    {
        return RuntimeHelpers.GetHashCode(obj.Item1)
            + 2 * RuntimeHelpers.GetHashCode(obj.Item2);
    }
}

And here is the modified Equals method of Club:

static HashSet<(Club,Club)> checkedPairs
    = new HashSet<(Club,Club)>(ValuePairRefEqualityComparer<Club>.Instance);

public override bool Equals(object obj)
{
    Club other = obj as Club;
    if (other == null)
        return false;

    if (!Name.Equals(other.Name))
        return;

    if (checkedPairs.Contains((this,other)) || checkedPairs.Contains((other,this)))
        return true;

    checkedPairs.Add((this,other));

    bool membersEqual = Members.SetEquals(other.Members);
    checkedPairs.Clear();
    return membersEqual;
}

The version for Person is analogous. Note that I add (this,other) to checkedPairs and check if either (this,other) or (other,this) is contained because it might happen that after the first call of c1.Equals(c2), we end up with a call of c2.Equals(c1) instead of c1.Equals(c2). I am not sure if this actually happens, but since I can't see the implementation of SetEquals, I believe it is a possibility.

Since I am not happy with using a static field for the already checked pairs (it will not work if the program is concurrent!), I asked another question: make a variable last for a call stack.

Kjara
  • 2,504
  • 15
  • 42
0

For the general case that I am interested in

-- where we have classes C1, ..., Cn where each of these classes can have any number of VALUES (like int, string, ...) as well as any number of REFERENCES to any other classes of C1, ..., Cn (e.g. by having for each type Ci a field ICollection<Ci>) --

the question "Are two objects A and B equal?", in the sense of equality that I described here,

seems to be EQUIVALENT to

the question "For two finite, directed, connected, colored graphs G and H, does there exist an isomorphism from G to H?".

Here is the equivalence:

  • graph vertices correspond to objects (class instances)
  • graph edges correspond to references to objects
  • color corresponds to the conglomerate of values and the type itself (i.e. colors of two vertices are the same if their corresponding objects have the same type and the same values)

That's an NP-hard question, so I think I'm going to discard my plan to implement this and go with a circular-reference-free approach instead.

Kjara
  • 2,504
  • 15
  • 42