For a project that adds mixins to C# through code weaving, I am cloning code from a source mixin type's parameterless instance constructor into the constructors of a target type. To do this, I divide a constructor into three conceptual parts, and identifying those parts reliably is what I need help with.

Here are the three parts:

  1. Field initialization that runs before the base or chained constructor call.
  2. Base or chained constructor call, including loading of arguments onto the stack.
  3. Actual constructor code compiled from source code written in the constructor body.

The basic idea is to multiplex the source constructor into these pieces. The multiplexing step also involves checking local variables (stloc* and ldloc*), so it's important that the instruction separation is correct. The target constructors that call into base constructors are the code cloning targets. Each one will have the source's section 1 cloned into its own section 1, and will have a method call added to its section 3 that invokes a new method in the target type containing the source constructor's section 3 code. (That code goes into its own method primarily because of the possibility of multiple exit points.)

I've read through the C# spec's instance constructor section, but other than confirming that the 3 sections I'm seeing are intentional, it hasn't helped. I've had a couple of promising false starts on this. Rather than try yet another bad strategy that passes my test cases and then chokes as soon as it hits something I didn't think of, I'm hoping to get input from somebody with more experience.

My current "next" thought is to cycle through instructions looking for ldarg.0, and then to detect the next method call. If that next method call is a base or chained constructor, then I can call this Section 2, with instructions before as Section 1 and instructions after as Section 3. I'm concerned, though, that the instructions may not always have such a clean separation, and I'm not sure how I could be certain of such a thing.
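As a rough sketch of that idea (not code from the project), the scan could look something like the following. The `Il` class is a hypothetical, simplified stand-in for Mono.Cecil's `Instruction`, and the names `ConstructorSections` and `TryFindSection2` are invented for illustration. It assumes that the arguments of a base or chained constructor call never load `this`, so the nearest `ldarg.0` before the call is the one that starts Section 2. It also assumes `ldarg.0` always appears in its short form.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a Mono.Cecil Instruction: an opcode name plus,
// for call instructions, the declaring type and name of the called method.
public sealed class Il
{
    public Il(string opCode, string declaringType = null, string method = null)
    {
        OpCode = opCode;
        DeclaringType = declaringType;
        Method = method;
    }

    public string OpCode { get; }
    public string DeclaringType { get; }
    public string Method { get; }
}

public static class ConstructorSections
{
    // Finds the inclusive index range of Section 2 in a constructor body.
    // Section 1 is everything before start; Section 3 is everything after end.
    public static bool TryFindSection2(
        IReadOnlyList<Il> body, string thisType, string baseType, out int start, out int end)
    {
        start = -1;
        end = -1;

        // The base or chained call is a plain `call` (not `newobj`) to a .ctor
        // declared on this type or on its direct base type.
        for (int i = 0; i < body.Count; i++)
        {
            var il = body[i];
            if (il.OpCode == "call" && il.Method == ".ctor" &&
                (il.DeclaringType == thisType || il.DeclaringType == baseType))
            {
                end = i;
                break;
            }
        }

        if (end < 0) { return false; }

        // Assumption: arguments to the call never load `this`, so the nearest
        // preceding ldarg.0 is the one that pushes `this` for the call.
        for (int i = end - 1; i >= 0; i--)
        {
            if (body[i].OpCode == "ldarg.0")
            {
                start = i;
                return true;
            }
        }

        end = -1;
        return false;
    }
}
```

Whether those assumptions hold for every constructor a compiler can emit is exactly the part I can't be certain of.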

Another thought is that because the spec specifically states that variable initializers run before the base or chained constructor call, it might be more reliable to look for the end of the instructions that set the type's own fields. Unfortunately, I'm not certain of the best way to go about that.

Here's an example of a target type, followed by the conceptual breakdown I'm looking for in each of its constructors.

public class MultipleConstructorsTarget : MultipleConstructorsTargetBase
{
    public MultipleConstructorsTarget()
    {
        var values = Tuple.Create(783535, "KNion wineofn oianweiof nqiognui ndf", new UriBuilder { Host = "j.k.l" });

        this.OriginalUninitializedInt = values.Item1;
        this.OriginalUninitializedString = values.Item2;
        this.OriginalUninitializedObject = values.Item3;
    }

    public MultipleConstructorsTarget(int i) : this(i, "A iuohiogfniouhe uihui iu.", new UriBuilder { Host = "g.h.i" }) { }

    public MultipleConstructorsTarget(int i, string j) : this(i, j, new UriBuilder { Host = "d.e.f" }) { }

    public MultipleConstructorsTarget(int i, string j, UriBuilder k)
        : base(i)
    {
        this.OriginalUninitializedInt = i;
        this.OriginalUninitializedString = j;
        this.OriginalUninitializedObject = k;
    }

    public int OriginalInitializedInt = 48685;
    public string OriginalInitializedString = "Tion3lao ehiuawh iuh buib ld";
    public UriBuilder OriginalInitializedObject = new UriBuilder { Host = "a.b.c" };

    public int OriginalUninitializedInt;
    public string OriginalUninitializedString;
    public UriBuilder OriginalUninitializedObject;
}

For MultipleConstructorsTarget()

Section 1

  IL_0000:  ldarg.0
  IL_0001:  ldc.i4     0xbe2d
  IL_0006:  stfld      int32 Bix.Mixers.Fody.TestMixinTargets.MultipleConstructorsTarget::OriginalInitializedInt
  IL_000b:  ldarg.0
  IL_000c:  ldstr      "Tion3lao ehiuawh iuh buib ld"
  IL_0011:  stfld      string Bix.Mixers.Fody.TestMixinTargets.MultipleConstructorsTarget::OriginalInitializedString
  IL_0016:  ldarg.0
  IL_0017:  newobj     instance void [System]System.UriBuilder::.ctor()
  IL_001c:  stloc.2
  IL_001d:  ldloc.2
  IL_001e:  ldstr      "a.b.c"
  IL_0023:  callvirt   instance void [System]System.UriBuilder::set_Host(string)
  IL_0028:  ldloc.2
  IL_0029:  stfld      class [System]System.UriBuilder Bix.Mixers.Fody.TestMixinTargets.MultipleConstructorsTarget::OriginalInitializedObject

Section 2

  IL_002e:  ldarg.0
  IL_002f:  call       instance void Bix.Mixers.Fody.TestMixinTargets.MultipleConstructorsTargetBase::.ctor()

Section 3

  IL_0034:  ldc.i4     0xbf4af
  IL_0039:  ldstr      "KNion wineofn oianweiof nqiognui ndf"
  IL_003e:  newobj     instance void [System]System.UriBuilder::.ctor()
  IL_0043:  stloc.1
  IL_0044:  ldloc.1
  IL_0045:  ldstr      "j.k.l"
  IL_004a:  callvirt   instance void [System]System.UriBuilder::set_Host(string)
  IL_004f:  ldloc.1
  IL_0050:  call       class [mscorlib]System.Tuple`3<!!0,!!1,!!2> [mscorlib]System.Tuple::Create<int32,string,class [System]System.UriBuilder>(!!0, !!1, !!2)
  IL_0055:  stloc.0
  IL_0056:  ldarg.0
  IL_0057:  ldloc.0
  IL_0058:  callvirt   instance !0 class [mscorlib]System.Tuple`3<int32,string,class [System]System.UriBuilder>::get_Item1()
  IL_005d:  stfld      int32 Bix.Mixers.Fody.TestMixinTargets.MultipleConstructorsTarget::OriginalUninitializedInt
  IL_0062:  ldarg.0
  IL_0063:  ldloc.0
  IL_0064:  callvirt   instance !1 class [mscorlib]System.Tuple`3<int32,string,class [System]System.UriBuilder>::get_Item2()
  IL_0069:  stfld      string Bix.Mixers.Fody.TestMixinTargets.MultipleConstructorsTarget::OriginalUninitializedString
  IL_006e:  ldarg.0
  IL_006f:  ldloc.0
  IL_0070:  callvirt   instance !2 class [mscorlib]System.Tuple`3<int32,string,class [System]System.UriBuilder>::get_Item3()
  IL_0075:  stfld      class [System]System.UriBuilder Bix.Mixers.Fody.TestMixinTargets.MultipleConstructorsTarget::OriginalUninitializedObject
  IL_007a:  ret

For MultipleConstructorsTarget(int i)

Section 1
(empty)

Section 2

  IL_0000:  ldarg.0
  IL_0001:  ldarg.1
  IL_0002:  ldstr      "A iuohiogfniouhe uihui iu."
  IL_0007:  newobj     instance void [System]System.UriBuilder::.ctor()
  IL_000c:  stloc.0
  IL_000d:  ldloc.0
  IL_000e:  ldstr      "g.h.i"
  IL_0013:  callvirt   instance void [System]System.UriBuilder::set_Host(string)
  IL_0018:  ldloc.0
  IL_0019:  call       instance void Bix.Mixers.Fody.TestMixinTargets.MultipleConstructorsTarget::.ctor(int32, string, class [System]System.UriBuilder)

Section 3

  IL_001e:  ret

For MultipleConstructorsTarget(int i, string j)

Section 1
(empty)

Section 2

  IL_0000:  ldarg.0
  IL_0001:  ldarg.1
  IL_0002:  ldarg.2
  IL_0003:  newobj     instance void [System]System.UriBuilder::.ctor()
  IL_0008:  stloc.0
  IL_0009:  ldloc.0
  IL_000a:  ldstr      "d.e.f"
  IL_000f:  callvirt   instance void [System]System.UriBuilder::set_Host(string)
  IL_0014:  ldloc.0
  IL_0015:  call       instance void Bix.Mixers.Fody.TestMixinTargets.MultipleConstructorsTarget::.ctor(int32, string, class [System]System.UriBuilder)

Section 3

  IL_001a:  ret

For MultipleConstructorsTarget(int i, string j, UriBuilder k)

Section 1

  IL_0000:  ldarg.0
  IL_0001:  ldc.i4     0xbe2d
  IL_0006:  stfld      int32 Bix.Mixers.Fody.TestMixinTargets.MultipleConstructorsTarget::OriginalInitializedInt
  IL_000b:  ldarg.0
  IL_000c:  ldstr      "Tion3lao ehiuawh iuh buib ld"
  IL_0011:  stfld      string Bix.Mixers.Fody.TestMixinTargets.MultipleConstructorsTarget::OriginalInitializedString
  IL_0016:  ldarg.0
  IL_0017:  newobj     instance void [System]System.UriBuilder::.ctor()
  IL_001c:  stloc.0
  IL_001d:  ldloc.0
  IL_001e:  ldstr      "a.b.c"
  IL_0023:  callvirt   instance void [System]System.UriBuilder::set_Host(string)
  IL_0028:  ldloc.0
  IL_0029:  stfld      class [System]System.UriBuilder Bix.Mixers.Fody.TestMixinTargets.MultipleConstructorsTarget::OriginalInitializedObject

Section 2

  IL_002e:  ldarg.0
  IL_002f:  ldarg.1
  IL_0030:  call       instance void Bix.Mixers.Fody.TestMixinTargets.MultipleConstructorsTargetBase::.ctor(int32)

Section 3

  IL_0035:  ldarg.0
  IL_0036:  ldarg.1
  IL_0037:  stfld      int32 Bix.Mixers.Fody.TestMixinTargets.MultipleConstructorsTarget::OriginalUninitializedInt
  IL_003c:  ldarg.0
  IL_003d:  ldarg.2
  IL_003e:  stfld      string Bix.Mixers.Fody.TestMixinTargets.MultipleConstructorsTarget::OriginalUninitializedString
  IL_0043:  ldarg.0
  IL_0044:  ldarg.3
  IL_0045:  stfld      class [System]System.UriBuilder Bix.Mixers.Fody.TestMixinTargets.MultipleConstructorsTarget::OriginalUninitializedObject
  IL_004a:  ret

I'm using Mono.Cecil for all of my IL reading and writing. You can find the Bix.Mixers project code at https://github.com/rileywhite/Bix.Mixers.Fody if you are interested. The specific file that this question concerns is at https://github.com/rileywhite/Bix.Mixers.Fody/blob/master/src/Bix.Mixers/Fody/ILCloning/ConstructorMultiplexer.cs.

rileywhite
  • Why is the `UriBuilder` created in section 2 for `MultipleConstructorsTarget()`, but in section 1 for `MultipleConstructorsTarget(int i)`? – svick Oct 30 '14 at 09:37
  • Oops! Mistake. It's fixed now. Thanks for catching that @svick :-) – rileywhite Oct 30 '14 at 10:17
  • Your sections still don't make much sense to me. In `MultipleConstructorsTarget(int i)`, why is loading `this` (`ldarg.0`) and the first two parameters (`ldarg.1`, `ldstr`) of the chained constructor call in section 1, while loading the third parameter (`ldloc.0`) is in section 2? I think that logically, all that code should be in section 2. – svick Oct 30 '14 at 13:51
  • You are right, @svick. I think you can see why I'm asking for fresh eyes on this :-/ – rileywhite Oct 30 '14 at 14:20
  • Beware that other code that also uses IL rewriting, notably including Microsoft's own Code Contracts, may cause this to become a completely unworkable approach. Contract validation takes place before the base constructor call, and there are no measures in place to prevent contracts from accessing and even freely modifying and exposing the current object instance. –  Nov 01 '14 at 20:03
  • That's a valid concern, @hvd, and I appreciate that you're raising it. The rewriting code itself makes heavy use of contracts, and Fody, which invokes the rewriting in VS as part of the build step, does not conflict. I haven't yet tested how my specific weaving interacts with Code Contracts rewriting. I can say, however, that the Requires section of Code Contracts doesn't allow use of `this`, and that's the only section that I'm aware of that runs before the base constructor call. It could still be a problem, but so far, so good. *fingers crossed* – rileywhite Nov 01 '14 at 21:50
  • Thank you for the information that Code Contracts does attempt to prevent this. I know for a fact that I have had a working sample at one time in which `this` did get exposed, but I indeed cannot manage to do so in the current version. The best I can manage is to get code that Code Contracts rewrites to something that fails at run-time with an exception, so the problem I was referring to does seem to be fixed already. –  Nov 01 '14 at 22:27
  • Let's hope it doesn't get reintroduced! Thanks for checking into that so thoroughly :-) If you happen to come across your previous sample, and if it indeed still does manage to expose `this`, then I hope you'll come back and share it. – rileywhite Nov 01 '14 at 22:29

1 Answer

A strategy that seems to work is to enumerate through the constructor instructions, grouping them together as follows:

  1. If an instruction is not ldarg.0, then it is put into a group by itself.
  2. If an instruction is ldarg.0, then all following instructions are grouped with it until one of the following conditions is met:
    • A call, callvirt, or calli instruction is encountered.
    • The instruction immediately precedes another ldarg.0.
    • All instructions have been enumerated.

Once a group has been identified, the last instruction in the group is examined. If it is a call instruction and the operand is a base or chained constructor, then the group is identified as Section 2, meaning that preceding instructions are Section 1 and remaining instructions are Section 3.

Here's the code that, given the index at which to start looking, identifies a group of instructions based on these rules.

public static bool TryGetNext(IList<Instruction> sourceInstructions, int firstIndex, out InstructionGroup instructionGroup)
{
    Contract.Requires(sourceInstructions != null);

    if (firstIndex < 0 || firstIndex >= sourceInstructions.Count)
    {
        instructionGroup = null;
        return false;
    }

    var instructions = new List<Instruction>();
    var instruction = sourceInstructions[firstIndex];
    instructions.Add(instruction);

    int lastIndex;
    if (instruction.OpCode.Code != Code.Ldarg_0) { lastIndex = firstIndex; }
    else
    {
        int i;
        // calls into base and chained constructors start like this
        // so we'll look for the next call instruction or any instruction where the next instruction is another ldarg.0
        // meaning that the stack was cleared at some point
        // there is no assumption that this grouping is generally useful, but the hope is that it will catch constructor calls in this specific case
        var isLastInstructionFound = false;
        for (i = firstIndex + 1; !isLastInstructionFound && i < sourceInstructions.Count; i++)
        {
            instruction = sourceInstructions[i];
            instructions.Add(instruction);
            if (instruction.OpCode.Code == Code.Call ||
                instruction.OpCode.Code == Code.Callvirt ||
                instruction.OpCode.Code == Code.Calli ||
                (instruction.Next != null && instruction.Next.OpCode.Code == Code.Ldarg_0))
            {
                isLastInstructionFound = true;
            }
        }

        lastIndex = i - 1;
    }

    instructionGroup = new InstructionGroup(firstIndex, lastIndex, instructions);
    return true;
}
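The check for whether a call's operand is a base or chained constructor isn't shown above. A minimal sketch of one way it might be written with Mono.Cecil follows. `ConstructorCallCheck` and `IsBaseOrChainedConstructorCall` are hypothetical names rather than anything from the project, and matching types by the full name of their element type is an assumption that may not cover every case (for example, same-named types in different assemblies).

```csharp
using Mono.Cecil;
using Mono.Cecil.Cil;

public static class ConstructorCallCheck
{
    // True when `instruction` is a `call` to an instance constructor declared
    // on the constructor's own type (chained) or on its direct base type (base).
    public static bool IsBaseOrChainedConstructorCall(Instruction instruction, MethodDefinition constructor)
    {
        // `newobj` also targets .ctor methods, but it creates a new object
        // rather than initializing `this`, so only a plain `call` qualifies.
        if (instruction.OpCode.Code != Code.Call) { return false; }

        var method = instruction.Operand as MethodReference;
        if (method == null || method.Name != ".ctor") { return false; }

        // Assumption: comparing element-type full names is enough to match types.
        var calledType = method.DeclaringType.GetElementType().FullName;
        var thisType = constructor.DeclaringType;

        if (calledType == thisType.FullName) { return true; }

        return thisType.BaseType != null &&
               calledType == thisType.BaseType.GetElementType().FullName;
    }
}
```

With something like this, the last instruction of each group can be tested directly instead of inspecting the operand inline.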

If you're interested, you can see the full code at https://github.com/rileywhite/Bix.Mixers.Fody/blob/0.1.7/src/Bix.Mixers/Fody/ILCloning/ConstructorMultiplexer.cs.

Even though this seems to work, I will not select this as the final answer because it's still just another heuristic. I'd love to get a real answer from someone with more experience.

rileywhite