I have the following simple unit test:
- Create base.dll assembly in memory - get its byte array.
- Create main.dll assembly depending on base.dll in memory - get its byte array.
- Create a CSharpCompilation object from both DLLs.
I will post the complete unit test at the end, but for now here is the relevant fragment:
const string BASE_CODE = "public interface I {}";
const string MAIN_CODE = "public class T: I {}";
MetadataReference systemDllRef = MetadataReference.CreateFromFile(typeof(object).Assembly.Location);
var baseDllBytes = GetDllBytes("base", BASE_CODE);
MetadataReference baseDllRef = MetadataReference.CreateFromStream(new MemoryStream(baseDllBytes), filePath: "base.dll");
var mainDllBytes = GetDllBytes("main", MAIN_CODE, systemDllRef, baseDllRef);
MetadataReference mainDllRef = MetadataReference.CreateFromStream(new MemoryStream(mainDllBytes), filePath: "main.dll");
var compilation = CSharpCompilation.Create("temp", null, new[] { systemDllRef, baseDllRef, mainDllRef });
As you can see, main.dll defines a single type T, which implements the interface I defined in base.dll.
Next, I would like to obtain the type symbol for T and answer the following questions:
- What assembly owns it?
- What assembly owns its interface?
- Does it have any declaring syntax references?
- Does its interface have any declaring syntax references?
Here is the code:
var mainTypeSymbol = compilation.GetTypeByMetadataName("T");
Assert.AreEqual("main", mainTypeSymbol.ContainingAssembly.Name); // GOOD
Assert.AreEqual("base", mainTypeSymbol.Interfaces[0].ContainingAssembly.Name); // GOOD
CollectionAssert.IsEmpty(mainTypeSymbol.DeclaringSyntaxReferences); // BAD !!!
CollectionAssert.IsEmpty(mainTypeSymbol.Interfaces[0].DeclaringSyntaxReferences); // BAD !!!
Of course, the declaring syntax references are expected to be empty: after all, the compilation object contains no syntax trees at all. I am going to fix that now:
compilation = compilation.AddSyntaxTrees(CSharpSyntaxTree.ParseText(MAIN_CODE));
mainTypeSymbol = compilation.GetTypeByMetadataName("T");
Assert.AreEqual("temp", mainTypeSymbol.ContainingAssembly.Name); // BAD !!!
Assert.AreEqual("base", mainTypeSymbol.Interfaces[0].ContainingAssembly.Name); // GOOD
CollectionAssert.IsNotEmpty(mainTypeSymbol.DeclaringSyntaxReferences); // GOOD
CollectionAssert.IsEmpty(mainTypeSymbol.Interfaces[0].DeclaringSyntaxReferences); // BAD !!!
IBYP? Now I have the declaring syntax reference for T, but its assembly is reported as temp, not main! Now, if I add the syntax tree for I:
compilation = compilation.AddSyntaxTrees(CSharpSyntaxTree.ParseText(BASE_CODE));
mainTypeSymbol = compilation.GetTypeByMetadataName("T");
Assert.AreEqual("temp", mainTypeSymbol.ContainingAssembly.Name); // BAD !!!
Assert.AreEqual("temp", mainTypeSymbol.Interfaces[0].ContainingAssembly.Name); // BAD !!!
CollectionAssert.IsNotEmpty(mainTypeSymbol.DeclaringSyntaxReferences); // GOOD
CollectionAssert.IsNotEmpty(mainTypeSymbol.Interfaces[0].DeclaringSyntaxReferences); // GOOD
All the assembly results are now botched, but the declaring syntax references are returned.
The complete unit test code is:
[Test]
public void SymbolAssembly()
{
    const string BASE_CODE = "public interface I {}";
    const string MAIN_CODE = "public class T: I {}";

    MetadataReference systemDllRef = MetadataReference.CreateFromFile(typeof(object).Assembly.Location);
    var baseDllBytes = GetDllBytes("base", BASE_CODE);
    MetadataReference baseDllRef = MetadataReference.CreateFromStream(new MemoryStream(baseDllBytes), filePath: "base.dll");
    var mainDllBytes = GetDllBytes("main", MAIN_CODE, systemDllRef, baseDllRef);
    MetadataReference mainDllRef = MetadataReference.CreateFromStream(new MemoryStream(mainDllBytes), filePath: "main.dll");

    // Step 1: metadata references only, no syntax trees.
    var compilation = CSharpCompilation.Create("temp", null, new[] { systemDllRef, baseDllRef, mainDllRef });
    var mainTypeSymbol = compilation.GetTypeByMetadataName("T");
    Assert.AreEqual("main", mainTypeSymbol.ContainingAssembly.Name); // GOOD
    Assert.AreEqual("base", mainTypeSymbol.Interfaces[0].ContainingAssembly.Name); // GOOD
    CollectionAssert.IsEmpty(mainTypeSymbol.DeclaringSyntaxReferences); // BAD !!!
    CollectionAssert.IsEmpty(mainTypeSymbol.Interfaces[0].DeclaringSyntaxReferences); // BAD !!!

    // Step 2: add the syntax tree that declares T.
    compilation = compilation.AddSyntaxTrees(CSharpSyntaxTree.ParseText(MAIN_CODE));
    mainTypeSymbol = compilation.GetTypeByMetadataName("T");
    Assert.AreEqual("temp", mainTypeSymbol.ContainingAssembly.Name); // BAD !!!
    Assert.AreEqual("base", mainTypeSymbol.Interfaces[0].ContainingAssembly.Name); // GOOD
    CollectionAssert.IsNotEmpty(mainTypeSymbol.DeclaringSyntaxReferences); // GOOD
    CollectionAssert.IsEmpty(mainTypeSymbol.Interfaces[0].DeclaringSyntaxReferences); // BAD !!!

    // Step 3: add the syntax tree that declares I as well.
    compilation = compilation.AddSyntaxTrees(CSharpSyntaxTree.ParseText(BASE_CODE));
    mainTypeSymbol = compilation.GetTypeByMetadataName("T");
    Assert.AreEqual("temp", mainTypeSymbol.ContainingAssembly.Name); // BAD !!!
    Assert.AreEqual("temp", mainTypeSymbol.Interfaces[0].ContainingAssembly.Name); // BAD !!!
    CollectionAssert.IsNotEmpty(mainTypeSymbol.DeclaringSyntaxReferences); // GOOD
    CollectionAssert.IsNotEmpty(mainTypeSymbol.Interfaces[0].DeclaringSyntaxReferences); // GOOD
}
// Compiles the given code into an in-memory DLL and returns its bytes.
private static byte[] GetDllBytes(string name, string code, params MetadataReference[] metadataReferences)
{
    var syntaxTree = CSharpSyntaxTree.ParseText(code);
    var c = CSharpCompilation.Create(name, new[] { syntaxTree }, metadataReferences, new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));
    using (var stream = new MemoryStream())
    {
        var res = c.Emit(stream);
        Assert.IsTrue(res.Success);
        // ToArray copies only the bytes actually written, not the whole internal buffer.
        return stream.ToArray();
    }
}
I can explain the results like this:
- When there is no matching syntax tree:
  - There are no declaring syntax references. Understandably so.
  - The ISymbol.ContainingAssembly property returns the actual assembly represented by the respective MetadataReference object.
- When there is a matching syntax tree:
  - There are declaring syntax references. Makes sense too.
  - For some reason, finding the matching syntax tree in the Compilation object changes the result of the ISymbol.ContainingAssembly property: it now reports the name of the Compilation object.
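The explanation above can be probed with a small standalone sketch. It assumes the Microsoft.CodeAnalysis.CSharp package is referenced; SymbolProbe, Build, and Run are hypothetical names introduced only for this illustration. It asks the same compilation for T twice: once through Compilation.GetTypeByMetadataName, and once through the assembly symbol that Compilation.GetAssemblyOrModuleSymbol returns for the main.dll reference.

```csharp
using System;
using System.IO;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public static class SymbolProbe
{
    // Hypothetical helper: compiles `code` into an in-memory DLL and wraps it
    // in a MetadataReference, much like GetDllBytes plus CreateFromStream.
    private static MetadataReference Build(string name, string code, params MetadataReference[] references)
    {
        var compilation = CSharpCompilation.Create(
            name,
            new[] { CSharpSyntaxTree.ParseText(code) },
            references,
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        using (var stream = new MemoryStream())
        {
            if (!compilation.Emit(stream).Success)
                throw new InvalidOperationException("Failed to emit " + name);
            return MetadataReference.CreateFromStream(new MemoryStream(stream.ToArray()), filePath: name + ".dll");
        }
    }

    public static (string SourceAssembly, string MetadataAssembly, int SourceRefs, int MetadataRefs) Run()
    {
        const string mainCode = "public class T: I {}";
        MetadataReference systemRef = MetadataReference.CreateFromFile(typeof(object).Assembly.Location);
        MetadataReference baseRef = Build("base", "public interface I {}", systemRef);
        MetadataReference mainRef = Build("main", mainCode, systemRef, baseRef);

        // The compilation holds main.dll as metadata AND the source of T as a syntax tree.
        var compilation = CSharpCompilation.Create(
            "temp",
            new[] { CSharpSyntaxTree.ParseText(mainCode) },
            new[] { systemRef, baseRef, mainRef });

        // Lookup 1: through the compilation as a whole.
        var sourceT = compilation.GetTypeByMetadataName("T");

        // Lookup 2: through the assembly symbol that stands for main.dll.
        var mainAssembly = compilation.GetAssemblyOrModuleSymbol(mainRef) as IAssemblySymbol;
        if (sourceT == null || mainAssembly == null)
            throw new InvalidOperationException("Expected both lookups to succeed");
        var metadataT = mainAssembly.GetTypeByMetadataName("T");
        if (metadataT == null)
            throw new InvalidOperationException("T not found in main.dll");

        return (sourceT.ContainingAssembly.Name,
                metadataT.ContainingAssembly.Name,
                sourceT.DeclaringSyntaxReferences.Length,
                metadataT.DeclaringSyntaxReferences.Length);
    }

    public static void Main()
    {
        var r = Run();
        Console.WriteLine($"compilation lookup: assembly={r.SourceAssembly}, syntax refs={r.SourceRefs}");
        Console.WriteLine($"main.dll lookup:    assembly={r.MetadataAssembly}, syntax refs={r.MetadataRefs}");
    }
}
```

If the two lookups return different symbols for T, that would be consistent with the unit test results: the symbol declared in the added syntax tree and the symbol read from main.dll would be two separate symbols living in the same Compilation object, each with only half of the information I need.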
Now my question: how can I get both the containing assembly and the respective declaring syntax references from a Compilation object containing all the right MetadataReference and SyntaxTree objects?
Rationale
We are in the process of decomposing our monolithic application. This includes a lot of "dumb" refactoring. By "dumb" I mean refactorings that can be reasonably automated. For example, suppose there are two dependency-injected interfaces that are used very frequently and I want to move a method from one to the other. There are a lot of similar changes to be made in all the places where the moved method is used. 95% of them can be automated, so I wrote a tool that does it. But instead of trying to guess all the places where the code must be adjusted, it compiles the code and then resolves the build errors automatically. Maybe this is the wrong approach, but this is what I am currently doing:
- I map all the types in the code across all the solutions (we have many and refactoring is across all of them) including the source file paths and the types that are using and are used by the type in question. This is a preliminary operation before refactoring starts. It is quite smart as it knows to deal with "good" dynamic calls and known constants. The generated map (~100MB in size) is used subsequently.
- The code moves the method (it so happens there are very few dependencies to be moved in this particular case).
- The code starts a build-fix loop, where each error is parsed and the code is fixed accordingly.
The fix involves creating a Compilation object from all the relevant MetadataReference objects and adding SyntaxTree objects as deemed necessary for fixing the error. Right now the name of the Compilation object matches the name of the assembly being built, so as long as the fix is limited to that same assembly, all is working well. But if, in order to fix project X, I need to go back and update project Y, the Compilation object now has SyntaxTree objects from both X and Y, and that is no good, because it changes the ContainingAssembly property. So, right now I only have one Compilation object per error-fixing session, but it seems I cannot use this model anymore.
Maybe this is all a stupid idea, but it does work nicely and produces good results, again, as long as I do not have to reach back to other projects while fixing an error in the current project. The build-fix loop allows for manual intervention if it is unable to fix the code (because it does not know how), and it is capable of making about 95% of the changes automatically.
Clarification 1
When a compilation error occurs, I create a Compilation object with the following pieces:
1. The DLL (i.e. the MetadataReference created from it) from the last successful build of the project.
2. All the DLLs referenced by the DLL from (1).
3. The syntax tree of the file mentioned in the error.
Then I start working from there, adding syntax trees as needed. So I never add all the syntax trees for all the source files, only a few as needed, and sometimes some of them correspond to symbols from dependency projects. This is how it happens that, in order to fix an error in the project associated with that error, I need to go back and change something in a source file owned by some dependency project. During this process some syntax trees from that other project are added to the Compilation object, and this is how I end up with syntax trees from different projects in the same Compilation object.
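The setup described above can be sketched roughly as follows. This is a simplified illustration, not my actual tool: ErrorSession, its Create method, and all parameter names are hypothetical, and the sketch assumes the Microsoft.CodeAnalysis.CSharp package is referenced and that the DLL and source paths are already known.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public static class ErrorSession
{
    // Hypothetical helper: builds the initial Compilation for one error.
    //   lastBuiltDllPath   - DLL from the last successful build of the project (piece 1)
    //   referencedDllPaths - DLLs referenced by that DLL (piece 2)
    //   errorFilePath      - source file mentioned in the error (piece 3)
    public static CSharpCompilation Create(
        string assemblyName,
        string lastBuiltDllPath,
        IEnumerable<string> referencedDllPaths,
        string errorFilePath)
    {
        var references = new[] { lastBuiltDllPath }
            .Concat(referencedDllPaths)
            .Select(path => (MetadataReference)MetadataReference.CreateFromFile(path))
            .ToList();

        // Keep the file path on the tree so that fixes can be written back later.
        var tree = CSharpSyntaxTree.ParseText(File.ReadAllText(errorFilePath), path: errorFilePath);

        // The compilation is named after the assembly being built.
        return CSharpCompilation.Create(
            assemblyName,
            new[] { tree },
            references,
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));
    }

    public static void Main()
    {
        // Stand-in inputs for illustration only: the core library plays the role
        // of the last built DLL, and a temporary file plays the role of the error file.
        var sourcePath = Path.Combine(Path.GetTempPath(), "ErrorSessionSample.cs");
        File.WriteAllText(sourcePath, "public class Sample {}");

        var compilation = Create("X", typeof(object).Assembly.Location, new string[0], sourcePath);
        Console.WriteLine($"{compilation.AssemblyName}: {compilation.SyntaxTrees.Length} tree(s), {compilation.References.Count()} reference(s)");
    }
}
```

More syntax trees are then added to the returned object with AddSyntaxTrees as the fix requires, which is exactly the point where trees from a dependency project can end up in a Compilation named after a different assembly.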