Performance: type derived from generic

Question

I've encountered with one performance problem that I can't quite understand. I know how to fix it but I don't understand Why that happens. It's just for fun!
Let's talk code. I simplified the code as much as I could to reproduce the issue.
Suppose we have a generic class. It has an empty list inside and does something with T in constructor. It has Run method that calls an IEnumerable<T> method on the list, e.g. Any().

public class BaseClass<T>
{
    private List<T> _list = new List<T>();

    public BaseClass()
    {
        Enumerable.Empty<T>();
        // or Enumerable.Repeat(new T(), 10);
        // or even new T();
        // or foreach (var item in _list) {}
    }

    public void Run()
    {
        for (var i = 0; i < 8000000; i++)
        {
            if (_list.Any())
            // or if (_list.Count() > 0)
            // or if (_list.FirstOrDefault() != null)
            // or if (_list.SingleOrDefault() != null)
            // or other IEnumerable<T> method
            {
                return;
            }
        }
    }
}

Then we have a derived class which is empty:

public class DerivedClass : BaseClass<object>
{
}

Let's measure the performance of running ClassBase<T>.Run method from both classes. Accessing from derived type is 4 times slower that from base class. And I can't understand why that happens. Compiled in Release mode, result is the same with warm up. It happens on .NET 4.5 only.

public class Program
{
    public static void Main()
    {
        Measure(new DerivedClass());
        Measure(new BaseClass<object>());
    }

    private static void Measure(BaseClass<object> baseClass)
    {
        var sw = Stopwatch.StartNew();
        baseClass.Run();
        sw.Stop();
        Console.WriteLine(sw.ElapsedMilliseconds);
    }
}

Full listing on gist

First - are you running this in Debug or Release build? Second - you should do a single run on both instances first before running your performance test, to remove the variability of the jitter. — Gjeltema, Nov 27 '14 at 17:37
Guys, have you tried to run the code? It's in Release mode, result is the same with warm up. Even if you interchange the lines. — Alexandr Nikitin, Nov 27 '14 at 17:43
Interesting! The time seems to be spent inside the call to `Any()` and goes away if you make the list `List`... — DavidG, Nov 27 '14 at 17:50
@DavidG yes, that's true. But why does it differ for `T` and derived type? Assembler time? Or noooooooёёё! — Alexandr Nikitin, Nov 27 '14 at 18:00
Oddly, it's also quicker if you simply remove the code from the constructor. — DavidG, Nov 27 '14 at 18:01
Both classes perform the same if you remove `Enumerable.Empty` from the constructor. I don't know why, but ... — MarcinJuraszek, Nov 27 '14 at 18:22
@MarcinJuraszek Maybe because `Enumerable.Empty` lazily allocates an array? but that has to behave different when you swap the order.. — Sriram Sakthivel, Nov 27 '14 at 18:25
Constructor is called outside of `Measure` method anyway. It shouldn't matter at all, should it? — MarcinJuraszek, Nov 27 '14 at 18:26
@MarcinJuraszek Nope, but I was testing several combinations locally, including the constructor. — Sriram Sakthivel, Nov 27 '14 at 18:28
It's a useless test anyway. Sure, if you remove Enumerable.Empty or _list.Any() they run at the same speed. But calling Enumerable.Empty in a constructor without using it is useless anyway. If I change the constructor to _list = Enumerable.Empty() they also run at the same speed, and this would actualy be a meaningfull use of Enumerable.Empty. Even then, running 8000000 iterations in the Run method, it takes 0.0000003125 seconds per iteration. Conclusion: nice find, I can't explain it, and you will never notice it in a real application. — David Libido, Nov 27 '14 at 21:34
@DavidLibido And you're wrong here. We noticed that in very real application. The real code is way more complex and implements Aho Corasick algorithm. and I'm not asking about business value of the test code. I'm asking why it happens?! — Alexandr Nikitin, Nov 27 '14 at 22:13
You're calling Enumerable.Empty in a real application in a constructor without actually using it? Because everyting is fine when I test it and I actually assign it to the _list in the constructor. — David Libido, Nov 27 '14 at 22:17
@DavidLibido Ah yeah, forgot to remove that line, now it flies. Seriously! I updated the example to keep it more clear. I have foreach through IEnumerable in ctr. — Alexandr Nikitin, Nov 27 '14 at 22:45
Remove the sentence, that it's for fun. It is not funny, it is a performance/design issue we need to understand and avoid in our case. — Viktor Jevdokimov, Nov 28 '14 at 06:18
While class looks pretty thread safe, running test on multiple threads with singleton instance of the class, derived one degrades per thread, while direct one keeps the same amount of ops per thread (# of threads <= number of cores). — Viktor Jevdokimov, Nov 28 '14 at 06:30
Even looking at the generated IL and comparing the fast version (empty constructor) to the slow version (with `Enumerable.Any()` in constructor) there is zero difference other than the `call` and a `pop`, though both are only called once. Makes no sense! — DavidG, Nov 28 '14 at 09:58
If indeed (as some of the answers below suggest) you are running into a bug in the JIT, then Microsoft Connect is the place to open the bug. You will not get an answer on StackOverflow telling you why there is a bug or what exactly the shape of the bug is. If you believe there is a code generation issue, open a bug on Connect. — Sasha Goldshtein, Nov 28 '14 at 14:17
Submitted a bug to Microsoft Connect https://connect.microsoft.com/VisualStudio/feedback/details/1041830/performance-type-derived-from-generic — Alexandr Nikitin, Nov 28 '14 at 17:25
Turns out this has nothing to do with the constructor or with Enumerable methods: http://pastebin.com/yNChbc2p — usr, Nov 30 '14 at 14:16
One call to `Measure` is being inlined, the other one not. Disabling inlining makes the problem go away. — usr, Nov 30 '14 at 14:29
@usr Yes, it inlines call to `Measure(DerivedClass)` but doesn't for `Measure(BaseClass)`. Both call `Run` which is JITed just once and has one asm code for both calls. And yes! setting `[MethodImpl(MethodImplOptions.AggressiveInlining)]` attribute to `Run()` helps. But why?? Could you expand you comment please. — Alexandr Nikitin, Nov 30 '14 at 22:13
@Alexandr Nikitin: Inlining just hides the bug. It behaves like if il-code is copied into `Measure()` method and then entire `Measure()` jits. It breaks something that would lead to perfomance degradation so now all looks good. Bad part is, you still have a chance to get a performance hit in a real code. — Sinix, Dec 01 '14 at 07:02
@Usr. Agreed. Here's simplified sample: http://pastebin.com/sUdG6KDK (code is not mine, grats to "vorona" user from rsdn.ru). Can be fixed by uncommenting both `Test1()` and `Test2()` methods -or- by uncommenthing the `Foo.Bar3();` call at line 16. — Sinix, Dec 01 '14 at 07:11
I raised an issue on github https://github.com/dotnet/coreclr/issues/55 — Alexandr Nikitin, Feb 04 '15 at 08:17
Great explanation from Chris McKinsey on https://github.com/dotnet/coreclr/issues/55#issuecomment-89026823 — Alexandr Nikitin, Apr 04 '15 at 10:23

score 31 · Accepted Answer · edited Jun 20 '20 at 09:12

Update:
There's an answer from the CLR team on Microsoft Connect

It is related to dictionary lookups in shared generics code. The heuristic in runtime and JIT do not work well for this particular test. We will take a look what can be done about it.

In the meantime, you can workaround it by adding two dummy methods to the BaseClass (do not even need to be called). It will cause the heuristic to work as one would expect.

Original:
That's JIT fail.

Can be fixed by this crazy thing:

    public class BaseClass<T>
    {
        private List<T> _list = new List<T>();

        public BaseClass()
        {
            Enumerable.Empty<T>();
            // or Enumerable.Repeat(new T(), 10);
            // or even new T();
            // or foreach (var item in _list) {}
        }

        public void Run()
        {
            for (var i = 0; i < 8000000; i++)
            {
                if (_list.Any())
                {
                    return;
                }
            }
        }

        public void Run2()
        {
            for (var i = 0; i < 8000000; i++)
            {
                if (_list.Any())
                {
                    return;
                }
            }
        }

        public void Run3()
        {
            for (var i = 0; i < 8000000; i++)
            {
                if (_list.Any())
                {
                    return;
                }
            }
        }
    }

Note that Run2()/Run3() are not called from anywhere. But if you comment out Run2 or Run3 methods - you'll get performance penalty as before.

There's something related to stack alignment or to the size of method table, I guess.

P.S. You can replace

 Enumerable.Empty<T>();
 // with
 var x = new Func<IEnumerable<T>>(Enumerable.Empty<T>);

still the same bug.

While this doesn't answer the question, it does lead down the path suggesting a bug somewhere. Interestingly, it also doesn't matter if `Run2` and `Run3` contain any code. — DavidG, Nov 28 '14 at 11:42
One more time, I'm not asking `how` to fix it. I know how to do that. I'm asking `why` it happens. — Alexandr Nikitin, Nov 28 '14 at 12:29

score 0 · Answer 2 · answered Nov 28 '14 at 09:30

After some experimentation, I found that Enumerable.Empty<T> is always slow when T is a class type; if it is a value type it is faster, but dependent on the struct size. I tested object, string, int, PointF, RectangleF, DateTime, Guid.

Looking at how it is implemented, I tried different alternatives, and found some that works fast.

Enumerable.Empty<T> relies on the internal class EmptyEnumerable<TElement>'s Instance static property.

That property does little things:

Checks if a private static volatile field is null.
Assigns an empty array to the field once (only if empty).
Returns the value of the field.

Then, what Enumerable.Empty<T> is really doing is only return an empty array of T.

Trying different approaches, I found that the slowness is caused by both the property and the volatile modifier.

Adopting a static field initialized to T[0] instead of Enumerable.Empty<T> like

public static readonly T[] EmptyArray = new T[0];

the problem disappeared. Note that the readonly modifier is not determinant. Having the same static field declared with volatile or accessed through a property causes the problem.

Regards, Daniele.

What about `new T();` instead of `Enumerable.Empty` in `BaseClass` ctr? Also slow! — Alexandr Nikitin, Nov 28 '14 at 09:38
No, it is fast: having direct empty array instantiation within BaseClass constructor it is (almost) as fast as using the EmptyArray field. — Rubidium 37, Nov 28 '14 at 09:42
I meant this version: https://gist.github.com/alexandrnikitin/b50276137d6529650165 — Alexandr Nikitin, Nov 28 '14 at 10:32

score -2 · Answer 3 · answered Nov 28 '14 at 09:17

-2

It seems there is an CLR optimizator issue. Turn off the "Optimize Code" on the Build tab and try to run your test again.

answered Nov 28 '14 at 09:17

Eugene Bushtyrev

1
1

2

And the result is the same! – Alexandr Nikitin Nov 28 '14 at 09:39

Performance: type derived from generic

3 Answers3

That property does little things: