How are the "primitive" types defined non-recursively?

Question

Since a struct in C# consists of the bits of its members, you cannot have a value type T which includes any T fields:

// Struct member 'T.m_field' of type 'T' causes a cycle in the struct layout
struct T { T m_field; }

My understanding is that an instance of the above type could never be instantiated*, because any attempt to do so would result in an infinite loop of instantiation/allocation (which I guess would cause a stack overflow?**). Alternatively, another way of looking at it might be that the definition itself just doesn't make sense; perhaps it's a self-defeating entity, sort of like "This statement is false."

Curiously, though, if you run this code:

using System;
using System.Reflection;

BindingFlags privateInstance = BindingFlags.NonPublic | BindingFlags.Instance;

// Give me all the private instance fields of the int type.
FieldInfo[] int32Fields = typeof(int).GetFields(privateInstance);

foreach (FieldInfo field in int32Fields)
{
    Console.WriteLine("{0} ({1})", field.Name, field.FieldType);
}

...you will get the following output:

m_value (System.Int32)

It seems we are being "lied" to here***. Obviously I understand that the primitive types like int, double, etc. must be defined in some special way deep down in the bowels of C# (you cannot define every possible unit within a system in terms of that system... can you?—different topic, regardless!); I'm just interested to know what's going on here.

How does the System.Int32 type (for example) actually account for the storage of a 32-bit integer? More generally, how can a value type (as a definition of a kind of value) include a field whose type is itself? It just seems like turtles all the way down.

Black magic?

* On a separate note: is this the right word for a value type ("instantiated")? I feel like it carries "reference-like" connotations; but maybe that's just me. Also, I feel like I may have asked this question before; if so, I forget what people answered.

** Both Martin v. Löwis and Eric Lippert have pointed out that this is neither entirely accurate nor an appropriate perspective on the issue. See their answers for more info.

*** OK, I realize nobody's actually lying. I didn't mean to imply that I thought this was false; my suspicion had been that it was somehow an oversimplification. After coming to understand (I think) thecoop's answer, it makes a lot more sense to me.

@djacobson - looks like your wand works. Can I borrow it? I have a couple of things I would like to summon, and they are not Eric Lippert... — Oded, Jan 20 '11 at 22:33
To understand recursion, you must first understand recursion. — Jwosty, Aug 08 '15 at 19:14

thecoop · Accepted Answer · 2011-01-21T00:26:41.123

11

As far as I know, within a field signature that is stored in an assembly, there are certain hardcoded byte patterns representing the 'core' primitive types: the signed/unsigned integers and floats (as well as strings, which are reference types and a special case). The CLR knows natively how to deal with those. Check out Partition II, section 23.2.12 of the CLI spec (ECMA-335) for the bit patterns of the signatures.

Within each primitive struct ([mscorlib]System.Int32, [mscorlib]System.Single etc) in the BCL is a single field of that native type, and because a struct is exactly the same size as its constituent fields, each primitive struct is the same bit pattern as its native type in memory, and so can be interpreted as either, by the CLR, C# compiler, or libraries using those types.
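To make the "same size, same bit pattern" point concrete, here is a minimal sketch. `WrappedInt` and `LayoutDemo` are hypothetical names invented for illustration; they are not how the BCL defines `System.Int32`. The sketch assumes a runtime where `System.Runtime.CompilerServices.Unsafe` is available (for example .NET 5 or later).

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for a primitive struct: a struct whose only
// field is a native 32-bit integer.
public struct WrappedInt
{
    public int Value;
}

public static class LayoutDemo
{
    // Size in bytes of the wrapper struct as the runtime lays it out.
    public static int SizeOfWrapper() => Unsafe.SizeOf<WrappedInt>();

    // Size in bytes of a plain int.
    public static int SizeOfInt() => Unsafe.SizeOf<int>();

    // Reinterpret the wrapper's storage as an int without copying fields:
    // the same bytes are simply read as a different type.
    public static int Reinterpret(WrappedInt w) => Unsafe.As<WrappedInt, int>(ref w);
}
```

Calling `LayoutDemo.Reinterpret(new WrappedInt { Value = 42 })` reads the wrapper's bytes back as a plain `int`, which is the sense in which the two representations can be "interpreted as either".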

From C#, int, double etc are synonyms of the mscorlib structs, which each have their primitive field of a type that is natively recognised by the CLR.
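The aliasing can be checked from C# itself. This is a small sketch; `AliasDemo` is a hypothetical helper class, while `Type.IsPrimitive` and `Type.IsValueType` are standard reflection properties.

```csharp
using System;

public static class AliasDemo
{
    // 'int' is only a C# keyword for System.Int32, so both typeof
    // expressions yield the same Type object.
    public static bool IntIsInt32() => typeof(int) == typeof(System.Int32);

    // True for the types the CLR treats as primitive value types
    // (Int32, Double, Boolean, ...), false for ordinary structs.
    public static bool IsPrimitiveValueType(Type t) => t.IsPrimitive && t.IsValueType;
}
```

For instance, `AliasDemo.IsPrimitiveValueType(typeof(double))` is true, whereas `typeof(decimal)` and `typeof(string)` are not reported as primitive.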

(There's an extra complication here: the CLR spec specifies that any type that has a 'short form' (the native CLR types) must always be encoded as that short form (int32), rather than as valuetype [mscorlib]System.Int32. So the C# compiler knows about the primitive types as well, but I'm not sure of the exact semantics and special-casing that goes on in the C# compiler and CLR for, say, method calls on primitive structs.)
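As a rough illustration of what "hardcoded byte patterns" means, the sketch below decodes the first type byte of a simple field signature blob. The constant values are taken from the ECMA-335 element type table; `FieldSigSketch` and `Describe` are hypothetical names, and the helper ignores custom modifiers and every element type not listed, so treat it as a toy rather than a real signature parser.

```csharp
using System;

public static class FieldSigSketch
{
    public const byte FieldProlog = 0x06;          // marks a blob as a field signature
    public const byte ElementTypeI4 = 0x08;        // short form for int32
    public const byte ElementTypeR8 = 0x0d;        // short form for float64
    public const byte ElementTypeString = 0x0e;    // short form for string
    public const byte ElementTypeValueType = 0x11; // general form: a type token follows

    // Describe the type encoded in a minimal field signature
    // (prolog byte followed directly by the type, no custom modifiers).
    public static string Describe(byte[] sig)
    {
        if (sig == null || sig.Length < 2 || sig[0] != FieldProlog)
            throw new ArgumentException("Not a field signature", nameof(sig));

        switch (sig[1])
        {
            case ElementTypeI4: return "int32";
            case ElementTypeR8: return "float64";
            case ElementTypeString: return "string";
            case ElementTypeValueType: return "valuetype <token follows>";
            default: return "other (0x" + sig[1].ToString("x2") + ")";
        }
    }
}
```

Under these assumptions, the `m_value` field of `System.Int32` would be encoded with the one-byte short form `0x08`, not with `0x11` plus a token pointing back at `System.Int32`, which is where the apparent recursion ends.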

So, loosely in the spirit of Gödel's Incompleteness Theorem, there has to be something 'outside' the system by which it can be defined. This is the Magic that lets the CLR interpret 4 bytes as a native int32 or an instance of [mscorlib]System.Int32, which is aliased from C#.

edited Jan 21 '11 at 00:26

answered Jan 20 '11 at 20:01

thecoop


So basically you're saying that, e.g., the `System.Int32` type consists of a non-standard *field* whose (native) type lies outside the BCL; but this field is represented in the corresponding `Type` object as being of type `Int32` (even though it's not). Is that accurate? – Dan Tao Jan 20 '11 at 20:04
Sort of. It's not a non-standard field, it's just a normal `int32`. The CLR knows to interpret fields of that type as a 32-bit int. The `[mscorlib]System.Int32` struct is binary-compatible with a native `int32` due to how structs work. Essentially, it's Magic that stops the recursion. – thecoop Jan 20 '11 at 20:09
I think I get what you're saying. In other words, based on the actual *bits* making up the field, an `Int32` (from mscorlib) is indistinguishable from a native 32-bit integer. So an `Int32` value can contain a field that is itself an `Int32` in terms of its bits; but there is no "recursion" because the CLR recognizes this as bits representing a native type. Please correct me if I'm still getting something wrong here as I want to be sure I have an accurate understanding of what the heck I'm saying! – Dan Tao Jan 20 '11 at 20:15
@Andras: this is very much AFAIK (and understand), I'm not a CLR dev! It would need someone who's worked near the code to confirm or refute the semantics as I understand them here. – thecoop Jan 20 '11 at 20:25
@thecoop: In that case, I *think* I get it ;) Thanks, this answer was quite helpful. I also really appreciate that you dragged Gödel's Incompleteness Theorem into this! – Dan Tao Jan 20 '11 at 21:34

score 7 · Answer 2 · answered Jan 20 '11 at 22:27

> My understanding is that an instance of the above type could never be instantiated; any attempt to do so would result in an infinite loop of instantiation/allocation (which I guess would cause a stack overflow?), or, alternately, another way of looking at it might be that the definition itself just doesn't make sense;

That's not the best way of characterizing the situation. A better way to look at it is that the size of every struct must be well-defined. An attempt to determine the size of T goes into an infinite loop, and therefore the size of T is not well-defined. Therefore, it's not a legal struct because every struct must have a well-defined size.
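The "well-defined size" rule can be illustrated with a struct that refers to its own type only through a reference. This is a sketch; `TreeNode` and `SizeDemo` are hypothetical names. The array field is a fixed-size reference, so the struct's size can be computed, whereas the directly self-containing struct from the question (shown as a comment) is rejected with compiler error CS0523.

```csharp
using System;

// Legal: Children is a reference to an array, which has a known,
// pointer-sized footprint no matter how many nodes the array holds.
public struct TreeNode
{
    public int Payload;
    public TreeNode[] Children;
}

// Illegal: the size of Bad would depend on the size of Bad.
// struct Bad { Bad m_field; }   // error CS0523: cycle in the struct layout

public static class SizeDemo
{
    // Count a node plus all of its descendants.
    public static int CountNodes(TreeNode node)
    {
        int count = 1;
        if (node.Children != null)
        {
            foreach (TreeNode child in node.Children)
            {
                count += CountNodes(child);
            }
        }
        return count;
    }
}
```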

> It seems we are being lied to here

There's no lie. An int is a struct that contains a field of type int. An int is of known size; it is by definition four bytes. Therefore it is a legal struct, because the size of all its fields is known.

> How does the System.Int32 type (for example) actually store a 32-bit integer value

The type doesn't do anything. The type is just an abstract concept. The thing that does the storage is the CLR, and it does so by allocating four bytes of space on the heap, on the stack, or in registers. How else do you suppose a four-byte integer would be stored, if not in four bytes of memory?

> how does the System.Type object referenced with typeof(int) present itself as though this value is itself an everyday instance field typed as System.Int32?

That's just an object, written in code like any other object. There's nothing special about it. You call methods on it, it returns more objects, just like every other object in the world. Why do you think there's something special about it?

Eric Lippert


You do a good job of making this sound very straightforward; but, for me, your answer falls short of really clarifying the issue. Maybe (probably?) I'm just dense. You say: "An int is a struct that contains a field of type int"; to me this sounds circular, like saying, "An X is a box that contains an X." You can tack "...and an X also has a defined size" to the end of that statement, but it still feels circular and confusing (to me). I realize a type is a concept (poor word choice on my part); what confused me was that, in the case of `int`, it seemed this concept was used to define itself... – Dan Tao Jan 20 '11 at 23:36
...Personally, I found thecoop's answer extremely helpful because it made me realize that an object of the `System.Int32` type is identical (in bits) to a native 32-bit integer and thus can be treated as such. So rather than thinking of it as analogous to "An X is a box that contains an X," it can be thought of as "An X is a box that contains a Y, while also being physically identical to a Y." Does that sound right, or am I still not getting something? – Dan Tao Jan 20 '11 at 23:37
@Dan: There is no box around the data. The idea that a struct is anything *more* than its bits is I think where you were conceptually going wrong. The point of a value type is that an instance of the value type *is the value*, period, no more, no less. A reference type has all kinds of mung surrounding its data -- it's got sync fields and vtables and type discriminators and blah blah blah. A struct has none of that stuff. The "int" struct contains *an int*, and that's all. The *variable* that refers to the storage location might have other stuff, but *the value* is just *the value*. – Eric Lippert Jan 21 '11 at 00:02
Gah, shortly after posting that comment I *knew* you were going to correct my faulty "box" analogy. I **do** know that an instance of a value type is the value and no more—though I don't blame you for probably doubting me at this point. Again, what confused me (initially) was the *circularity* of it. Forget about boxes; it just seemed like saying "A chair is a chair" or "Red means red." – Dan Tao Jan 21 '11 at 00:18

Martin v. Löwis · Answer 3 · 2011-01-20T20:36:32.660

Three remarks, in addition to thecoop's answer:

1. Your assertion that recursive structs inherently couldn't work is not entirely correct. It's more like a statement "this statement is true": which is true if it is. It's plausible to have a type T whose only member is of type T: such an instance might consume 0 bytes, for example (since its only member consumes 0 bytes). Recursive value types only stop working if you have a second member (which is why they are disallowed).
2. Take a look at Mono's definition of Int32. As you can see: it actually is a type containing itself (since int is just an alias for Int32 in C#). There is certainly "black magic" involved (i.e. special-casing), as the comments explain: the runtime will look up the field by name, and just expect that it's there - I also assume that the C# compiler will special-case the presence of int here.
3. In PE assemblies, type information is represented through "type signature blobs". These are sequences of type declarations, e.g. for method signatures, but also for fields. The list of available primitive types in such a signature is defined in section 22.1.15 of the CLR specification; a copy of the allowed values is in the CorElementType enumeration. Apparently, the reflection API maps these primitive types to their corresponding System.XYZ valuetypes.
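The mapping mentioned in the last remark can be observed through reflection. The sketch below asks whether a type declares an instance field whose reported type is the declaring type itself. `SelfFieldDemo` is a hypothetical helper, and the result for `int` depends on how the runtime's core library defines `System.Int32`, so it is an observation about mainstream runtimes, not a guarantee.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class SelfFieldDemo
{
    // True if reflection reports an instance field of type t declared on t.
    public static bool HasFieldOfOwnType(Type t)
    {
        const BindingFlags flags =
            BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance;

        return t.GetFields(flags).Any(field => field.FieldType == t);
    }
}
```

On runtimes where `System.Int32` is declared with an `m_value` field, `SelfFieldDemo.HasFieldOfOwnType(typeof(int))` reports a self-typed field, while an ordinary struct such as `Guid` does not.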

can you elaborate on "Recursive value types only stop working if you have a second member"? struct T { T m_field; } definitely generates a compiler error — Robert Levy, Jan 20 '11 at 20:35
@Robert: right, it's ill-formed (in C#). However, there is no inherent reason for recursive valuetypes with a single member being ill-formed - it would be plausible (but useless) to allow the construct. It stops being plausible if you have a second data member (since it then would occupy an infinite amount of memory), hence it is disallowed. — Martin v. Löwis, Jan 20 '11 at 20:39
