Consider a simple .NET console app that contains only Visual Studio's auto-generated, empty Main method.

As per Jeffrey Richter's "CLR via C#" in the context of mscorlib.dll: "This assembly is automatically loaded when the CLR initializes". Quite understandable, given that this assembly contains - among others - the core types.

Running the application with a filter set to show only "mscorlib" file references yields hits for both mscorlib.dll and mscorlib.ni.dll:

[Screenshot: file accesses matching "mscorlib", showing both mscorlib.dll and mscorlib.ni.dll]

The "ni" stands for native image, and given that the mscorlib.dll assembly is so heavily used, it makes sense to generate a native image of it (mscorlib.ni.dll), which the CLR will load instead of the "classical" mscorlib.dll.

The fact that both files are being accessed is expected, since the CLR will only use the native image as long as it's still in sync with the original assembly it was generated from. If for example the source assembly changes, the native image is no longer used. Therefore both need to be checked before loading the native one. All clear so far.

The native images, as Microsoft states here, "are files containing compiled processor-specific machine code".

Assuming a native image only contains machine code, I went ahead and tried to open mscorlib.ni.dll with a disassembler (ILSpy), simply expecting the "PE file does not contain any managed metadata" error to be thrown. I was however surprised to see ILSpy correctly loading the assembly, complete with manifest, type members, as well as IL code displayed for methods:

[Screenshot: ILSpy with mscorlib.ni.dll loaded, showing the manifest, type members and IL code]

I suspected that a reference to mscorlib.dll was somehow being followed and that all the data was actually being extracted from there. However, Process Monitor filtered for ILSpy's image name proved me wrong: there's not a single attempt to open mscorlib.dll; there are only some attempts to get the .pdb file (most likely looking for symbols), which fail because that file doesn't exist.

In fact, both assemblies (mscorlib.dll and mscorlib.ni.dll) are strikingly similar, the only difference being IL_ONLY vs. IL_LIBRARY in the CorFlags structure.
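As a rough illustration, that flag difference can be decoded with a few lines of Python. The bit values are the COMIMAGE_FLAGS_* constants from corhdr.h; the function name `decode_corflags` and the sample values in the usage note are made up for this sketch and are not the actual values read from either file.

```python
# COMIMAGE_FLAGS_* bits of the CLI header's Flags field (values from corhdr.h).
COMIMAGE_FLAGS = {
    0x00000001: "ILONLY",
    0x00000002: "32BITREQUIRED",
    0x00000004: "IL_LIBRARY",
    0x00000008: "STRONGNAMESIGNED",
    0x00000010: "NATIVE_ENTRYPOINT",
    0x00010000: "TRACKDEBUGDATA",
    0x00020000: "32BITPREFERRED",
}


def decode_corflags(value):
    """Return the names of all known flag bits set in `value` (hypothetical helper)."""
    return [name for bit, name in COMIMAGE_FLAGS.items() if value & bit]
```

With illustrative inputs, `decode_corflags(0x9)` returns `["ILONLY", "STRONGNAMESIGNED"]`, while `decode_corflags(0xC)` returns `["IL_LIBRARY", "STRONGNAMESIGNED"]`.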

Trying to load other native images from within the NIC (Native Image Cache) resulted in a similar behavior.

I could find advanced details about how a native image is still deemed valid (here), but not so much about my actual question:

How come native images can be disassembled down to the IL code level? Are both the IL and the machine code stored within the native image?

Update 5/27/2019: It seems that Microsoft's definition is somewhat elusive. It should probably be interpreted as: native images "are files containing compiled processor-specific machine code", in addition to the regular assembly metadata and IL code. The word "native" in the term apparently suggests the wrong meaning.

The 2 good comments received lead to the next natural question: Where is the machine code actually stored inside the image file itself?

In terms of details about the image file content, CFF Explorer showed a bit more than JetBrains dotPeek: for the native image it highlighted 2 extra sections, a rather large Relocation Directory entry and a mysterious Native Header, but that was pretty much it.

[Screenshot: image file details in CFF Explorer]
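For poking at these fields without a GUI tool, here is a minimal Python sketch that reads the CLI header (IMAGE_COR20_HEADER, ECMA-335 II.25.3.3) straight from the file bytes and reports the section names plus where the metadata and the ManagedNativeHeader directory point. It assumes a well-formed PE file; `read_cli_header` and the dictionary keys are names chosen for this example. It is only my assumption that CFF Explorer's "Native Header" entry corresponds to the ManagedNativeHeader directory, which ECMA-335 specifies as always 0 for ordinary assemblies.

```python
import struct


def read_cli_header(data):
    """Return selected IMAGE_COR20_HEADER fields from the raw bytes of a PE file."""
    pe = struct.unpack_from("<I", data, 0x3C)[0]      # e_lfanew: offset of "PE\0\0"
    if data[pe:pe + 4] != b"PE\0\0":
        raise ValueError("not a PE file")
    # COFF header: NumberOfSections at +6, SizeOfOptionalHeader at +20.
    n_sections, opt_size = struct.unpack_from("<H12xH", data, pe + 6)
    opt = pe + 24                                     # start of the optional header
    magic = struct.unpack_from("<H", data, opt)[0]
    dirs = opt + (96 if magic == 0x10B else 112)      # PE32 vs. PE32+ layout
    # Data directory 14 is the CLI (COM descriptor) header.
    cli_rva, _ = struct.unpack_from("<2I", data, dirs + 14 * 8)
    if cli_rva == 0:
        raise ValueError("no CLI header: not a managed image")

    # Section table: (name, virtual address, mapped size, file offset) per entry.
    sections = []
    for i in range(n_sections):
        off = opt + opt_size + 40 * i
        name = bytes(data[off:off + 8]).rstrip(b"\0").decode("ascii", "replace")
        vsize, va, raw_size, raw_ptr = struct.unpack_from("<4I", data, off + 8)
        sections.append((name, va, max(vsize, raw_size), raw_ptr))

    def to_offset(rva):
        """Translate a relative virtual address into a file offset."""
        for _, va, size, raw_ptr in sections:
            if va <= rva < va + size:
                return rva - va + raw_ptr
        raise ValueError("RVA 0x%X is not inside any section" % rva)

    base = to_offset(cli_rva)
    # MetaData directory at +8, Flags at +16, ManagedNativeHeader directory at +64.
    meta_rva, meta_size, flags = struct.unpack_from("<3I", data, base + 8)
    native_rva, native_size = struct.unpack_from("<2I", data, base + 64)
    return {
        "sections": [s[0] for s in sections],
        "flags": flags,
        "metadata": (meta_rva, meta_size),
        "native_header": (native_rva, native_size),
    }
```

Calling `read_cli_header(open(path, "rb").read())` on an ordinary assembly and then on its counterpart from the Native Image Cache allows a side-by-side comparison; if the assumption above holds, only the native image should report a non-zero `native_header`.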

The ECMA-335 standard has but a few words about the physical layout of the image file concerning the code itself in section II.25.4. Aside from the fact that the method body immediately follows the method header, and that there's sometimes an extra method data section when dealing with exception handling, the standard doesn't really say anything about the location of the machine code itself.

Mihai Albert
  • That looks accurate, easiest way to implement it. What is hard to see is the rather large amount of machine code that is embedded in the file. Other than by the file size, 5.5 MB for mscorlib, 20 MB for mscorlib.ni.dll. I don't know of any tool that can reveal it. – Hans Passant May 21 '19 at 20:50
  • .NET have reflection. That means you should have managed metadata for it to work. – user4003407 May 22 '19 at 00:17
  • @HansPassant I've added what I could find, but just like you hinted, I haven't got myself very far. – Mihai Albert May 27 '19 at 21:03
  • @PetSerAl: You're absolutely right. Assuming the native images would contain nothing but native code, there wouldn't have been any bit of assembly metadata, let alone IL that could be retrieved by doing reflection. I didn't really connect the dots at the time. – Mihai Albert May 27 '19 at 21:05

1 Answer

If no usable native code is available for a particular method, the CLR falls back to JIT-compiling it. Native images must therefore still include the metadata and IL, in case the runtime needs to fall back to JIT compilation.