How does the debugger get type information about an object initialized to null?

Question

If an object is initialized to null, it is not possible to get the type information because the reference doesn't point to anything.

However, when I debug and I hover over a variable, it shows the type information. Only the static methods are shown, but still, it seems to know the type. Even in release builds.

Does the debugger use other information than just reflection of some sort to find out the datatype? How come it knows more than I? And if it knows this, why isn't it capable of showing the datatype in a NullReferenceException?

JaredPar · Accepted Answer · 2012-01-12T16:58:47.210

score 10

It seems like you're confusing the type of the reference with the type of the value that it points to. The type of the reference is embedded in the DLL metadata and is readily accessible to the debugger. There is also additional information stored in the associated PDB that the debugger leverages to provide a better experience. Hence even for null references a debugger can determine information like type and name.
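To make the distinction concrete, here is a minimal sketch using a field rather than a local. `Fruit`, `Apple`, `DeclaredTypeDemo` and `Bowl` are made-up names for illustration. It reads the declared type of the field through reflection while the field still holds null, then compares that with the runtime type of whatever the field points to.

```csharp
using System;
using System.Reflection;

class Fruit { }
class Apple : Fruit { }

static class DeclaredTypeDemo
{
    // Hypothetical field: the reference is null, but its declared type is in metadata.
    public static Fruit Bowl = null;

    // Returns the declared (static) type of a public static field on this class.
    public static Type DeclaredTypeOf(string fieldName)
    {
        FieldInfo field = typeof(DeclaredTypeDemo).GetField(
            fieldName, BindingFlags.Static | BindingFlags.Public);
        return field.FieldType;
    }

    // Returns the runtime type of the object, or null when there is no object.
    public static Type RuntimeTypeOf(object value)
    {
        return value == null ? null : value.GetType();
    }

    static void Main()
    {
        // The type of the reference is known even though Bowl is null.
        Console.WriteLine(DeclaredTypeOf("Bowl"));
        // There is no object yet, so there is no runtime type to report.
        Console.WriteLine(RuntimeTypeOf(Bowl) == null);
        // Once an object is stored, the two types can differ.
        Bowl = new Apple();
        Console.WriteLine(RuntimeTypeOf(Bowl));
    }
}
```

Calling `Bowl.GetType()` while `Bowl` is null would throw, which is exactly the asker's point; `FieldInfo.FieldType` never touches the value at all.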

As for NullReferenceException: could it also tell you the type on which it was querying a field or method? Possibly. I'm not familiar with the internals of this part of the CLR, but there doesn't seem to be an inherent reason why it couldn't.

But I'm not sure the added cost to the CLR would be worth the benefit. I share the frustration about the lack of information in a null reference exception. But more than the type involved, I want names! I don't care that it was an IComparable; I want to know it was leftCustomer.

Names are something the CLR doesn't always have access to, as a good portion of them live in the PDB and not in metadata. Hence it can't provide them with great reliability (or speed).

edited Jan 12 '12 at 16:58

answered Jan 12 '12 at 16:49

Ah, I forgot about the PDB, of course. But, for a local variable, how would that be stored in the DLL metadata? Or do you mean the ILASM? – Abel Jan 12 '12 at 17:00
@Abel for a local the name is stored in the PDB and the type in metadata (IIRC all constants are stored in the PDB). If you're ever curious about the breakdown, compile some code, delete the PDB and bring up the disassembly in Reflector. It's a good indication of what is available to the CLR. – JaredPar Jan 12 '12 at 17:01
I actually use Reflector quite often and indeed, the type info is there, but that's ILASM decompiled, which contains the types (but since you say "metadata", I guess it's time to study the CLI Annotated Standard a bit more thoroughly ;). – Abel Jan 12 '12 at 17:05
@Abel hmm, misstatement on my part. Metadata is probably the wrong word. The types of locals, though, are embedded in the IL of the method bodies (else the CLR wouldn't be able to verify their usage). Note: Reflector reads PDB information, so its default presentation is much closer to the debugger than to the CLR. – JaredPar Jan 12 '12 at 17:07
So, if I would try this myself, I would have to use something like `Assembly.GetExecutingAssembly`, get the IL, etc etc and get the necessary info. Possibly a useless exercise, but I was just wondering. – Abel Jan 12 '12 at 17:13
@Abel truthfully I'm not sure if the declarations are stored in the IL itself or in the method header (I believe the header though). – JaredPar Jan 12 '12 at 17:19
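For anyone who wants to try the exercise Abel describes without parsing IL by hand, here is a rough sketch using `MethodBase.GetMethodBody()` and `MethodBody.LocalVariables` from `System.Reflection`. `LocalTypesDemo`, `Sample` and `LocalsOf` are made-up names. Each `LocalVariableInfo` exposes a slot index and a type but no name, which fits the claim above that local names live in the PDB. The assumption is that the method has an IL body available at runtime, and an optimizing compiler may drop or merge locals, so the slots listed need not match the source one-to-one.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

static class LocalTypesDemo
{
    // Hypothetical sample method with a local that stays null in practice.
    public static int Sample()
    {
        string text = null;
        if (DateTime.Now.Year < 0) text = "unreachable";
        return text == null ? 0 : text.Length;
    }

    // Returns the local variable descriptions recorded for a public static method.
    public static IList<LocalVariableInfo> LocalsOf(string methodName)
    {
        MethodInfo method = typeof(LocalTypesDemo).GetMethod(methodName);
        MethodBody body = method.GetMethodBody();
        return body.LocalVariables;
    }

    static void Main()
    {
        foreach (LocalVariableInfo local in LocalsOf("Sample"))
        {
            // Only a slot index and a declared type are exposed here, not a name.
            Console.WriteLine("slot " + local.LocalIndex + ": " + local.LocalType);
        }
    }
}
```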

score 9 · Answer 2 · answered Jan 12 '12 at 17:36

Jared's answer is of course correct. Just to add a little to it:

> when I debug and I hover over a variable, it shows the type information

Right. You have a bowl. The bowl is labelled "FRUIT". The bowl is empty. What is the type of the fruit in the bowl? You cannot say, because there isn't any fruit in the bowl. But that does not mean that you know nothing about the bowl. You know that the bowl can contain any fruit.

When you hover over a variable then the debugger can tell you about the variable itself or about its contents.

> Does the debugger use other information than just reflection of some sort to find out the datatype?

Absolutely. The debugger needs to know not just the type of the thing referred to by this reference but also what restrictions are placed on what can be stored in this variable. All the information about what restrictions are placed on particular storage locations is known to the runtime, and the runtime can pass that information to the debugger.

> How come it knows more than I?

I reject the premise of the question. The debugger is running on your behalf; it cannot do anything that you cannot do yourself. If you don't know what the type restriction on a particular variable is, it's not because you lack the ability to find out. You just haven't looked yet.

> if it knows this, why isn't it capable of showing the datatype in a NullReferenceException?

Think about what is actually happening when you dereference null. Suppose for example you do this:

```csharp
Fruit f = null;
string s = f.ToString();
```

ToString might be overridden in Fruit, so the call has to go through the virtual function table. What code must the jitter generate? Let's suppose that local variable f is stored in a stack location. The jitter says:

1. Copy the contents of the memory address at the stack pointer offset associated with f to register 1.
2. The virtual function table is going to be, let's say, eight bytes from the top of that pointer, and ToString is going to be, let's say, four bytes from the top of that table. (I am just making these numbers up; I don't know what the real offsets are off the top of my head.) So, start by adding eight to the current contents of register 1.
3. Now dereference the current contents of register 1 to get the address of the vtable into register 2.
4. Now add four bytes to register 2.
5. Now we have a pointer to the ToString method...

But hold on a minute, let's follow that logic again. The first step puts zero into register 1, because f contains null. The second step adds eight to that. The third step dereferences pointer 0x00000008, and the virtual memory system issues an exception stating that an illegal memory page has just been touched. The CLR handles the exception, determines that the faulting address lies in the first 64 K of memory, and guesses that someone has just dereferenced a null pointer. It therefore creates a null reference exception and throws it.
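You can see the end result of that sequence from managed code. The sketch below (with the made-up names `NullDereferenceDemo` and `Provoke`) triggers the same dereference and hands back the exception object. `NullReferenceException` only offers the usual `Exception` members such as `Message` and `StackTrace`; it has no property that names the variable or its declared type.

```csharp
using System;

class Fruit { }

static class NullDereferenceDemo
{
    // Dereferences a null Fruit and returns the exception the runtime raised.
    public static Exception Provoke()
    {
        Fruit f = null;
        try
        {
            // Virtual call on a null reference: the runtime faults here.
            string s = f.ToString();
            return null;
        }
        catch (NullReferenceException e)
        {
            return e;
        }
    }

    static void Main()
    {
        Exception e = Provoke();
        // Exception type and a generic message; "f" and "Fruit" are not recorded on it.
        Console.WriteLine(e.GetType());
        Console.WriteLine(e.Message);
        // The stack trace points at the method (and the line, if a PDB is present).
        Console.WriteLine(e.StackTrace);
    }
}
```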

The virtual memory system surely does not know that the reason it dereferenced pointer 0x00000008 was because someone was trying to call f.ToString(). That information is lost in the past; the memory manager's job is to tell you when you touched something you don't have any right to touch; why you tried to touch memory you don't own is not its job to figure out.

The CLR could maintain a separate side data structure such that every time you accessed memory, it made a note of why you were attempting to do so. That way, the exception could have more information in it, describing what you were doing when the exception happened. Imagine the cost of maintaining such a data structure for every access to memory! Managed code could easily be ten times slower than it is today, and that cost is borne just as heavily by correct code as by broken code. And for what? To tell you what you can easily figure out yourself: which null variable you dereferenced.

The feature isn't worth the cost, so the CLR does not do it. There's no technical reason why it could not; it's just not practical.

Hi Eric, thanks for taking the time to respond so thoroughly. The vtable story kind-of always reminds me of COM, which worked largely in the same way (as does C++). I thought that an NRE was _on_ the 0x0 address, not close by. But if I follow your line of logic, are you saying that if I have, say, 16K methods, and I access the last one, I would run into undefined behavior (because I'd be trying to dereference something above the 64K boundary)? // I know you worked on the CLR, and surely this is a design choice, but if you'd chosen differently, you might have come up with an efficient algorithm ;). – Abel, Jan 12 '12 at 17:54
@Abel: In the example I gave, the crash was while attempting to find the pointer to the vtable. Whether there are 16K entries in that vtable or not is irrelevant; if we crash before we even find the vtable we're crashed. An interesting question is what happens when no vtable is involved; what if an object has more than 64K worth of fields? — Eric Lippert, Jan 12 '12 at 19:47
@Abel: But yes, a null ref exception is any exception in the bottom 64 K of virtual address space. Most null ref exceptions involve doing arithmetic on null before the dereference, so any small pointer dereference is likely to be caused by a null pointer dereference. — Eric Lippert, Jan 12 '12 at 19:49
