32

I'm investigating an exception that a colleague just got while running an application through Visual Studio 2010:

System.NullReferenceException was unhandled by user code
  Message=Object reference not set to an instance of an object.
  Source=mscorlib
  StackTrace:
       at System.Collections.Generic.GenericEqualityComparer`1.Equals(T x, T y)
       at System.Collections.Concurrent.ConcurrentDictionary`2.TryGetValue(TKey key, TValue& value)
       at xxxxxxx.xxxxxxx.xxxxxxx.RepositoryBase`2.GetFromCache(TIdentity id) 

Using .NET Reflector, I have looked at the code for
GenericEqualityComparer<T>.Equals(T x, T y), and I can't see any possible cause for a NullReferenceException.

//GenericEqualityComparer<T>.Equals(T x, T y) from mscorlib 4.0.30319.269
public override bool Equals(T x, T y)
{
    if (x != null)
    {
        return ((y != null) && x.Equals(y));
    }
    if (y != null)
    {
        return false;
    }
    return true;
}

The type of T, TKey and TIdentity are all the same type in this stack trace.

The type is a custom type called Identity that implements IEquatable<Identity>. It is immutable and cannot be constructed with null values for the fields that it uses in its implementation of Equals(Identity other). It also overrides Equals(object obj) like this:

public override bool Equals(object obj)
{
    if ((object)this == obj)
    {
        return true;
    }
    return Equals(obj as Identity);
}

public bool Equals(Identity other)
{
    if ((object)this == (object)other)
    {
        return true;
    }
    if ((object)other == null)
    {
        return false;
    }
    if (!FieldA.Equals(other.FieldA))
    {
        return false;
    }
    return FieldB.Equals(other.FieldB);
}

I have a fairly exhaustive set of unit tests around the Equals implementations. So, it will happily accept a value of null for other/obj and return false as expected.

The type does not either override the == operators nor != operators.

Even so, I would expect to see my class on top of the stack trace if the exception was being thrown from the implementation of Equals(Identity other) in my Identity class, but it says the NullReferenceException is coming from mscorlib.

I'm running on .NET Framework version 4.0.30319.269.

I don't have a memory dump, and I have not seen this before and have not reproduced it since. Still, I'm obliged to investigate and to be absolutely certain that it is not being caused by our code and that it won't happen in production.

So, the real question is: What caused this exception?

  • Bug in mscorlib (seems highly unlikely)
  • Transient memory corruption on the machine (possible, hard to back up with evidence)
  • Other?

* Updates in response to Jordão *

Is it possible to call the method with an object that is not an Identity?

The ConcurrentDictionary<TKey, TValue> is typed such that TKey = Identity and nothing subclasses Identity. So, I can't see how it could be possible.

Is it possible to call the method with null?

Unit tests cover the scenario of calling all of the Equals implementations with null.

What version of the code is the stack trace from? Maybe some older version susceptible to the exception?

I'm analyzing the same code that generated the exception. I have checked that the version of the .NET Framework running on my colleagues computer is also 4.0.30319.269.

Any multithreaded scenario could cause the exception? These are usually hard to reproduce, but might be worth investigating.

Yes, the code is multi-threaded and intended to be. So, that is why I'm using a ConcurrentDictionary.

* Followup related to response from Jalal Aldeen Saa'd *

I would have thought that a race condition where some other thread sets x to null could only be the cause if the parameter x was passed by reference using the 'ref' keyword. I set out to validate that theory with the following code:

ManualResetEvent TestForNull = new ManualResetEvent(false);
ManualResetEvent SetToNull = new ManualResetEvent(false);

[TestMethod]
public void Test()
{
    var x = new object();
    var y = new object();

    var t = Task.Factory.StartNew(() =>
    {
        return Equals(x, y);
    });
    TestForNull.WaitOne(); //wait until x has been tested for null value
    x = null;
    SetToNull.Set(); //signal that x has now been set to null
    var result = t.Result;
    Assert.IsFalse(result);
}

public bool Equals<T>(T x, T y)
{
    if (x != null)
    {
        TestForNull.Set(); //signal that we have determined that x was not null
        SetToNull.WaitOne(); //wait for original x value to be set to null
        //would fail here if setting the outer scope x to null affected
        //the value of x in this scope
        return ((y != null) && x.Equals(y)); 
    }
    if (y != null)
    {
        return false;
    }
    return true;
}

and the test completes without errors.

I can force that behavior if I change the signature to pass x and y by reference (that is, public bool Equals<T>(ref T x, ref T y) then the test fails with aNullReferenceException, but this does not match the method signature ofGenericEqualityComparer.Equals(T x, T y)`.

Picrofo Software
  • 5,475
  • 3
  • 23
  • 37
Eamon
  • 1,829
  • 1
  • 20
  • 21
  • In your overridden `Equals` implementation, perform a check for null on `obj` (and return false) and see if the error is still thrown. – keyboardP Oct 25 '12 at 00:58
  • Problem is this exception has only been observed once and I can't easily reproduce it so I am left trying to diagnose the cause essentially through static analysis of the code. – Eamon Oct 25 '12 at 01:13
  • Since you say it is multi-threaded, could it be that x is being set to null elsewhere, after the if check but before the equals. An easy way to check would be to add a sleep to the Equals override and set the x value to null in another thread. – John Koerner Oct 25 '12 at 01:56
  • 1
    Do you have unit tests that exercise your Equals methods in a multi-threaded environment? If not, I'd add some. – Mike Parkhill Oct 25 '12 at 02:06
  • Don't have any unit tests to explicitly test Equals in a multi-threaded way, but the objects are immutable and only compare private fields which are set in the construtor and cannot be null or the constructor would fail. Besides, ther error does not appear to come from my Equals method, but the GenericEqualityComparer. – Eamon Oct 25 '12 at 03:20
  • Are `WeakReference`-objects involved in this? – Michael Stum Oct 25 '12 at 19:46
  • no weak references; besides the local variable x would keep a weak reference alive. – Eamon Oct 31 '12 at 02:43

2 Answers2

4

I'll lay out my hypothesis here.

The stack is leading you to believe that this is where the crash occurs, but it occurs elsewhere. We're looking at the wrong thread.

I don't know if this would be practical, but sometimes good old "printf debugging" helps. What if you print out the value you're looking for before calling TryGetValue? You would see whether you strike a null or not.

MPelletier
  • 16,256
  • 15
  • 86
  • 137
  • Lack of reproducability prevents anything like good old "printf debugging". As a debugger was attached (the app was running through visual studio at the time) I guess maybe that has somehow caused it to report an incorrect stack or corrupt the x variable but to me that just feels wrong because in general [Select Isn't Broken](http://www.codinghorror.com/blog/2008/03/the-first-rule-of-programming-its-always-your-fault.html) as it's not like only a handful of people use Visual studio and .NET4, still at this point I'm ready to call this an anomaly and move on. – Eamon Oct 25 '12 at 21:58
  • @Eamon what you're saying is that at this point you can't reproduce the error? – MPelletier Oct 25 '12 at 22:19
  • 2
    Yes, that was stated in the original question. This is a one time error that I have not seen before nor since, hence the need to debug through static analysis of the code using the stacktrace given in the exception. As such I think it may be fair to class it as an anomaly due to some kind of transient issue (memory corruption etc.) beyond my control and move on to more important things. – Eamon Oct 26 '12 at 01:28
1

I ran into a null reference exception in Equals about a couple years ago (not sure if it was in 3.5 or 4.0, or if it was ever fixed). It's not clear to me what types are being compared in your case, but in my situation it would occur whenever comparing a MethodInfo reflection object for a generic method declaration to ANY non-MethodInfo object... Ka-boom! So, if you're comparing reflection objects, this could be it. If you're not, well at least I can testify to the fact that there is at least one Equals implementation in the BCL that can throw null reference exceptions for no good reason in certain situations, so there might well be others. Even the sacred .NET BCL is still software, and ALL software has bugs.

Ken Beckett
  • 1,273
  • 11
  • 13