23

Extract from CLR via C# on Boxing / Unboxing value types ...

On Boxing: If the nullable instance is not null, the CLR takes the value out of the nullable instance and boxes it. In other words, a Nullable<Int32> with a value of 5 is boxed into a boxed-Int32 with a value of 5.

On Unboxing: Unboxing is simply the act of obtaining a reference to the unboxed portion of a boxed object. The problem is that a boxed value type cannot be simply unboxed into a nullable version of that value type because the boxed value doesn't have the boolean hasValue field in it. So, when unboxing a value type into a nullable version, the CLR must allocate a Nullable<T> object, initialize the hasValue field to true, and set the value field to the same value that is in the boxed value type. This impacts your application performance (memory allocation during unboxing).

Why did the CLR team go through so much trouble for Nullable types? Why was it not simply boxed into a Nullable<Int32> in the first place?
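For reference, the behavior the excerpt describes can be observed directly. This is a minimal sketch, written as C# 9+ top-level statements; the variable names are illustrative and not from the book.

```csharp
using System;

int? five = 5;
int? none = null;

object boxedFive = five;   // boxes the underlying Int32, not the Nullable<Int32>
object boxedNone = none;   // an empty nullable boxes to a null reference

Console.WriteLine(boxedFive.GetType() == typeof(int)); // True
Console.WriteLine(boxedNone == null);                  // True

// A boxed Int32 can be unboxed to either int or int?.
int plain = (int)boxedFive;
int? wrapped = (int?)boxedFive;
Console.WriteLine(plain == 5 && wrapped == 5);         // True

// A null reference unboxes to an empty nullable.
int? fromNull = (int?)boxedNone;
Console.WriteLine(fromNull.HasValue);                  // False
```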

Jon Seigel
Preets
  • "Memory allocation during unboxing" What??? The book is definitely wrong on that regard. – Ben Voigt Nov 03 '14 at 22:45
  • @BenVoigt Unboxing into a nullable type is about 20× slower than unboxing into a normal type, but only with a regular cast; the `as` operator is not slower than regular unboxing. – IS4 Nov 04 '14 at 00:12
  • Behavior is currently described on https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/nullable-types/using-nullable-types#boxing-and-unboxing – Alexei Levenkov Apr 11 '19 at 01:20

5 Answers

24

I remember this behavior being a last-minute change. In early betas of .NET 2.0, Nullable<T> was a "normal" value type. Boxing a null-valued int? turned it into a boxed int? with a boolean flag. I think they chose the current approach for consistency. Say:

int? test = null;
object obj = test;
if (test != null)
   Console.WriteLine("test is not null");
if (obj != null)
   Console.WriteLine("obj is not null"); 

Under the former approach (box null -> boxed Nullable<T>), you wouldn't get "test is not null" but you would get "obj is not null", which is weird.

Additionally, if they had boxed a nullable value to a boxed-Nullable<T>:

int? val = 42;
object obj = val;

if (obj != null) {
   // Our object is not null, so intuitively it's an `int` value:
   int x = (int)obj; // ...but this would have failed. 
}

Beside that, I believe the current behavior makes perfect sense for scenarios like nullable database values (think SQL-CLR...)


Clarification:

The whole point of providing nullable types is to make it easy to deal with variables that have no meaningful value. They didn't want to provide two distinct, unrelated types. An int? should behave more or less like a simple int. That's why C# provides lifted operators.
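As a rough illustration of lifted operators, here is a minimal sketch written as C# 9+ top-level statements. The variable names are made up for the example.

```csharp
using System;

int? a = 5;
int? b = null;

int? sum = a + b;        // lifted +: the result is null if either operand is null
bool less = a < b;       // lifted <: false if either operand is null
bool equal = b == null;  // lifted ==: true, because b has no value

Console.WriteLine(sum.HasValue); // False
Console.WriteLine(less);         // False
Console.WriteLine(equal);        // True

int? doubled = a * 2;    // the int literal is implicitly converted to int?
Console.WriteLine(doubled == 10); // True
```

The compiler generates the null checks around the underlying int operators, so int? can be used in arithmetic and comparisons much like int.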

"So, when unboxing a value type into a nullable version, the CLR must allocate a Nullable<T> object, initialize the hasValue field to true, and set the value field to the same value that is in the boxed value type. This impacts your application performance (memory allocation during unboxing)."

This is not true. The CLR has to allocate memory on the stack to hold the variable whether or not it's nullable. Allocating space for an extra boolean field is not a performance issue.

Community
Mehrdad Afshari
  • From what I understand, in the current implementation (Step 1) if test is null, the CLR does not box anything and returns null. (Step 2) If the nullable instance is not null, it boxes it into a boxed-int32. Doesn't Step 1 solve the "obj is not null" problem? Why did they have to do Step 2? Sorry, but I seem to be missing something. – Preets Sep 07 '09 at 05:13
  • Preets: You mean they would box `null` to a null reference and box `int? x = 4;` to `boxed-Nullable`? – Mehrdad Afshari Sep 07 '09 at 05:17
  • Umm... yeah... is that not possible? – Preets Sep 07 '09 at 05:19
  • Preets: If they had done that, you couldn't unbox it directly to an `int`. – Mehrdad Afshari Sep 07 '09 at 05:25
  • What's the point though? Boxing a non-null nullable as a Nullable<...> is simply wasting a boolean value, thus (slightly) increasing GC pressure and reducing processor cache for no good reason. The whole idea behind Nullable<...> is that it represents a value type that happens to be able to be null - but that entire extra step is unnecessary for boxed values which can inherently be null anyhow. – Eamon Nerbonne Sep 07 '09 at 05:27
  • While I can sort of understand the processor cache argument (I don't think it matters most of the time), I am not sure about the GC pressure argument. Whether you box an int or box a Nullable, the GC still handles it as a single block. Creating it is just an allocation (near-free with the GC), and deleting means mark/sweep/compact is still going to mark, sweep, and compact a block regardless. I can't see any difference in GC load... just a difference in the size of the allocated block. I think the crux of the matter is what Mehrdad stated: "unbox directly to an int". – jrista Sep 07 '09 at 05:49
  • @EamonNerbonne: Keeping the identities of possibly-nested nullable types makes it possible for a collection of arbitrary type T to distinguish between a return that means "Key X has no associated value", and "Key X has been associated with a null value". – supercat Dec 23 '12 at 17:28
  • @supercat: that's still possible. First of all, the collection itself internally can presumably distinguish between an unstored value (not present in whatever data structure is used) and a stored null value. So then it's a question of API: most .NET collection APIs (and certainly the newer ones) make it possible to distinguish present null values from the absence of a value. To store a value or a fallback such as null in a structure with an API that doesn't distinguish null from absence, simply wrap the value in your own `Maybe` structure (a common pattern in functional programming anyhow). – Eamon Nerbonne Jan 02 '13 at 14:24
  • @supercat: in short: you can do whatever a boxed `Nullable<>` could do easily yourself, but you can't emulate the current behaviour of auto-unwrapping yourself without VM support. Thus the current behavior is better (and probably faster). Not to mention the fact that usually you can let the typesystem deduce the type of a null-placeholder so that the need to do so at runtime is fairly rare. Optimizing for such a corner case (you can't statically determine the type, and you do actually care about the type of the "null" even though no value is present) isn't likely to be useful. – Eamon Nerbonne Jan 02 '13 at 14:30
  • @EamonNerbonne: I don't really see much advantage to the unusual behavior of `Nullable`. While there's nothing preventing user implementation of a `Maybe` type, there's substantial value in having different libraries that need the same things in a type use *the same* type. If Moe writes an interface with a method that returns a `Moe.Maybe`, and Larry writes one which uses a `Larry.Maybe`, someone trying to implement both interfaces will have to use different methods to deal with the different types. It would be cleaner if there were one standard `Maybe`... – supercat Jan 02 '13 at 15:59
  • ...which simply contained `public T Value; public bool IsValid;` along with the natural constructors. The struct would not imply any particular semantics for those two fields--the semantics would be specified by any methods which return or accept an instance of the structure. Some people may dislike the notion of exposed fields, but they can greatly reduce the performance cost of using a composite type. For example, one saying `Result.IsValid = MyDict.TryGetValue(ref Result.Value);` would eliminate the need to have `TryGetValue` put the value into a temp before it's copied to `Result`. – supercat Jan 02 '13 at 16:04
  • I agree - I'd love to see a standard maybe, or better yet, real support for discriminated unions. Nevertheless, `Nullable<>` isn't that type - it just adds `null` to value types. It was never going to be able to fit both needs perfectly. – Eamon Nerbonne Jan 03 '13 at 10:00
  • What's more interesting are the lifted operators. They are handled by the compiler and share some similarities with `NaN` as error-indicators. – IS4 Nov 04 '14 at 00:27
9

I think it makes sense to box a null value to a null reference. Having a boxed value saying "I know I would be an Int32 if I had a value, but I don't" seems unintuitive to me. Better to go from the value type version of "not a value" (a value with HasValue as false) to the reference type version of "not a value" (a null reference).

I believe this change was made on the feedback of the community, btw.

This also allows an interesting use of `as` even for value types:

object mightBeADouble = GetMyValue();

double? unboxed = mightBeADouble as double?;
if (unboxed != null)
{
    ...
}

This is more consistent with the way "uncertain conversions" are handled with reference types, than the previous:

object mightBeADouble = GetMyValue();

if (mightBeADouble is double)
{
    double unboxed = (double) mightBeADouble;
    ...
}

(It may also perform better, as there's only a single execution time type check.)
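A runnable variant of the `as` pattern, as a minimal sketch in C# 9+ top-level statements. The `GetMyValue` local function here is a hypothetical stand-in for the call in the snippets above, with a made-up parameter to pick what it returns.

```csharp
using System;

// Hypothetical stand-in for GetMyValue(): returns a boxed double or a string.
static object GetMyValue(bool giveDouble) =>
    giveDouble ? (object)1.5 : "not a double";

object first = GetMyValue(true);
object second = GetMyValue(false);
object boxedInt = 3;

double? a = first as double?;     // boxed double -> nullable with a value
double? b = second as double?;    // wrong type -> empty nullable, no exception
double? c = boxedInt as double?;  // boxed int is not a double, so also empty

Console.WriteLine(a.HasValue);    // True
Console.WriteLine(b.HasValue);    // False
Console.WriteLine(c.HasValue);    // False
```

Note that `as` only checks the boxed type; it does not apply numeric conversions, which is why the boxed int does not become a double.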

Jon Skeet
  • The normal use I'm aware of for `Nullable` is in situations where a function may or may not have a `T` to return. Conceptually, that pattern is just as applicable when `T` is a reference type or nullable type, as when it is a non-nullable value type. If one had a collection of `Nullable`, a logical return type for a `TryGetXXX` method would be a `Nullable`. If the return's `HasValue` is false, that meant it couldn't get a `Nullable`. If the outer one is true, the return value is whatever was stored in the collection (which could be null). – supercat Dec 23 '12 at 17:32
5

A thing that you gain via this behavior is that the boxed version implements all interfaces supported by the underlying type. (The goal is to make Nullable<int> appear the same as int for all practical purposes.) Boxing to a boxed-Nullable<int> instead of a boxed-int would prevent this behavior.

From the MSDN Page,

double? d = 44.4;
object iBoxed = d;
// Access IConvertible interface implemented by double.
IConvertible ic = (IConvertible)iBoxed;
int i = ic.ToInt32(null);
string str = ic.ToString();

Also, getting the int from a boxed version of a Nullable<int> is straightforward. Usually you can't unbox to a type other than the original source type.

float f = 1.5f;
object boxed_float = f;
int int_value = (int) boxed_float; // will blow up. Cannot unbox a float to an int, you *must* unbox to a float first.

float? nullableFloat = 1.4f;
boxed_float = nullableFloat;
float fValue = (float) boxed_float;  // can unbox a float? to a float
Console.WriteLine(fValue);

Here you do not have to know whether the original version was an int or a Nullable version of it. (You also gain some performance: the boxed object does not need to store the hasValue boolean.)

Gishu
  • I can understand this as a rationale, but I would think it would be better to have either used some CLR magic to create wrapper methods for a Nullable to implement T's interfaces, or else restricted the odd boxing behavior to interface casts. As it is, Nullable ends up in a really weird limbo. – supercat Sep 12 '12 at 23:31
0

I guess that is basically what it does. The description given includes your suggestion (i.e. boxing into a Nullable<T>).

The extra is that it sets the hasValue field after boxing.

André Chalella
0

I would posit that the reason for the behavior stems from the behavior of Object.Equals, most notably the fact that if the first object is null and the second object is not, Object.Equals returns false rather than calling the Equals method on the second object.

If Object.Equals had called the Equals method on the second object when the first object was null but the second was not, then an object which was a null-valued Nullable<T> could have returned True when compared to null. Personally, I think the proper remedy would have been to make the HasValue property of a Nullable<T> have nothing to do with the concept of a null reference. With regard to the overhead involved with storing a boolean flag on the heap, one could have provided that for every type Nullable<T> there would be a static boxed empty version, and then provide that unboxing the static boxed empty copy would yield an empty Nullable<T>, and unboxing any other instance would yield a populated one.
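To make the Equals behavior concrete, here is a minimal sketch in C# 9+ top-level statements showing how the instance and static methods treat null today. It only exercises the shipped behavior; it does not model the alternative design proposed above.

```csharp
using System;

int? empty = null;

// Nullable<T>.Equals(object) treats an empty instance as equal to null.
bool instanceSaysEqual = empty.Equals(null);
Console.WriteLine(instanceSaysEqual);        // True

// Boxing the empty instance yields a null reference, so the static
// Object.Equals is comparing null with null.
object boxed = empty;
bool staticBothNull = object.Equals(null, boxed);
Console.WriteLine(staticBothNull);           // True

// With a null first argument and a non-null second argument,
// the static method returns false.
bool staticNullVsValue = object.Equals(null, 5);
Console.WriteLine(staticNullVsValue);        // False
```

Because an empty nullable boxes to a null reference, the first two results agree. Had it boxed to a non-null object, the static call would have returned false even though the instance method reports equality with null.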

supercat