13

I've noticed a performance hit of iterating over a primitive collection (T[]) that has been cast to a generic interface collection (IList or IEnumberable).

For example:

    private static int Sum(int[] array)
    {
        int sum = 0;

        foreach (int i in array)
            sum += i;

        return sum;
    }

The above code executes significantly faster than the code below, where the parameter is changed to type IList (or IEnumerable):

    private static int Sum(IList<int> array)
    {
        int sum = 0;

        foreach (int i in array)
            sum += i;

        return sum;
    }

The performance hit still occurs if the object passed is a primitive array, and if I try changing the loop to a for loop instead of a foreach loop.

I can get around the performance hit by coding it like such:

    private static int Sum(IList<int> array)
    {
        int sum = 0;

        if( array is int[] )
            foreach (int i in (int[])array)
                sum += i;
        else
            foreach (int i in array)
                sum += i;

        return sum;
    }

Is there a more elegant way of solving this issue? Thank you for your time.

Edit: My benchmark code:

    static void Main(string[] args)
    {
        int[] values = Enumerable.Range(0, 10000000).ToArray<int>();
        Stopwatch sw = new Stopwatch();

        sw.Start();
        Sum(values);
        //Sum((IList<int>)values);
        sw.Stop();

        Console.WriteLine("Elasped: {0} ms", sw.ElapsedMilliseconds);
        Console.Read();
    }
Generic Comrade
  • 327
  • 2
  • 7
  • 3
    How did you measure this? If you run the code repeatedly in a loop, the VM may dynamically optimize it. – Kerrek SB Nov 26 '11 at 17:57
  • I didn't run it in a virtual machine. I'll add my simple benchmark code to the question. – Generic Comrade Nov 26 '11 at 18:05
  • I would also consider the terniary for-loop and interate using the indexer. Interesting ... – IAbstract Nov 26 '11 at 18:05
  • 1
    How much is performance hit? Usually in practice, such perfomance dfference don't mean anything against real processing part. I'm sure in your real project, you don't do simple sum+=i, and any more advanced processing will take much longer then IList iteration, even it really takes some more time then int[] iteration. – Vladimir Perevalov Nov 26 '11 at 18:06
  • @IAbstract I mentioned that using a normal for loop didn't have any improvement on speed in my original question. – Generic Comrade Nov 26 '11 at 18:12
  • 3
    I checked the speed myself. It looks like, there is a perfomance hit, and it is dramatic 100%!!!! :) For 100000000 array of int, on my Core 2 Duo E8200 it takes 1.2 seconds, and 2.4 seconds for IList. I would really, just forget about that difference. PS I used methods supplied byt author under debugging. – Vladimir Perevalov Nov 26 '11 at 18:17
  • Have you tried compiling with `ngen`? Also, don't just run the sum once; run it like a million times or so (perhaps on a smaller array). – Kerrek SB Nov 26 '11 at 18:17
  • Interface calls are virtual calls. – R. Martinho Fernandes Nov 26 '11 at 18:24
  • @VladimirPerevalov For me it ended up going from 1x to 3x the execution time. My project deals with large arrays and I'm very hesitant to overlook this and convert my parameters to IList just so one piece of code; that is rarely used; doesn't have to call ToArray() – Generic Comrade Nov 26 '11 at 18:32
  • Performance difference between those two versions comes from different JITter decisions. You can test/see them only in release build and outside of debugger. – MagnatLU Nov 26 '11 at 18:56

3 Answers3

19

Your best bet is to create overload for Sum with int[] as argument if this method is performance-critical. CLR's JIT can detect foreach-style iteration over array and can skip range checking and address each element directly. Each iteration of loop takes 3-5 instructions on x86, with only one memory lookup.

When using IList, JIT does not have knowledge about underlying collection's structure and ends up using IEnumerator<int>. Each iteration of loop uses two interface invocation - one for Current, one for MoveNext (2-3 memory lookups and a call for each of those). This most likely ends up with ~20 quite expensive instructions and there is not much you can do about it.

Edit: If you are curious about actual machine code emitted by JIT from Release build without debugger attached, here it is.

Array version

            int s = 0;
00000000  push        ebp  
00000001  mov         ebp,esp 
00000003  push        edi  
00000004  push        esi  
00000005  xor         esi,esi 
            foreach (int i in arg)
00000007  xor         edx,edx 
00000009  mov         edi,dword ptr [ecx+4] 
0000000c  test        edi,edi 
0000000e  jle         0000001B 
00000010  mov         eax,dword ptr [ecx+edx*4+8] 
                s += i;
00000014  add         esi,eax 
00000016  inc         edx  
            foreach (int i in arg)
00000017  cmp         edi,edx 
00000019  jg          00000010 

IEnumerable version

            int s = 0;
00000000  push        ebp  
00000001  mov         ebp,esp 
00000003  push        edi  
00000004  push        esi  
00000005  push        ebx  
00000006  sub         esp,1Ch 
00000009  mov         esi,ecx 
0000000b  lea         edi,[ebp-28h] 
0000000e  mov         ecx,6 
00000013  xor         eax,eax 
00000015  rep stos    dword ptr es:[edi] 
00000017  mov         ecx,esi 
00000019  xor         eax,eax 
0000001b  mov         dword ptr [ebp-18h],eax 
0000001e  xor         edx,edx 
00000020  mov         dword ptr [ebp-24h],edx 
            foreach (int i in arg)
00000023  call        dword ptr ds:[009E0010h] 
00000029  mov         dword ptr [ebp-28h],eax 
0000002c  mov         ecx,dword ptr [ebp-28h] 
0000002f  call        dword ptr ds:[009E0090h] 
00000035  test        eax,eax 
00000037  je          00000052 
00000039  mov         ecx,dword ptr [ebp-28h] 
0000003c  call        dword ptr ds:[009E0110h] 
                s += i;
00000042  add         dword ptr [ebp-24h],eax 
            foreach (int i in arg)
00000045  mov         ecx,dword ptr [ebp-28h] 
00000048  call        dword ptr ds:[009E0090h] 
0000004e  test        eax,eax 
00000050  jne         00000039 
00000052  mov         dword ptr [ebp-1Ch],0 
00000059  mov         dword ptr [ebp-18h],0FCh 
00000060  push        0F403BCh 
00000065  jmp         00000067 
00000067  cmp         dword ptr [ebp-28h],0 
0000006b  je          00000076 
0000006d  mov         ecx,dword ptr [ebp-28h] 
00000070  call        dword ptr ds:[009E0190h] 
MagnatLU
  • 5,967
  • 1
  • 22
  • 17
  • That was my original plan, but the problem is that the people calling Sum() may not know if the underlying collection is a int[] or List. Either way I have to check within Sum(IList) to see if it's an int[], otherwise I end up with an if check before every Sum call. – Generic Comrade Nov 26 '11 at 18:17
  • To be clear: Calling Sum((IList)array) runs slower than Sum(array) with the overloads, so then what's the purpose of having the overload if the correct one may not be executed? – Generic Comrade Nov 26 '11 at 18:27
  • If you want to avoid type checking no matter the cost, you can use `ToArray()` on your method argument and iterate over `int[]` every time. For `int[]` argument you'd end up with creating copy of array, with other implementations there would be both `IEnumerator` iteration and array creation. Either way - performance loss and memory pressure are not worth it. Better use `is` and hide it in `#region` if you are ashamed of it :-) – MagnatLU Nov 26 '11 at 18:28
  • Point of keeping `Sum(IList)` would be to provide fallback for people using your code with other collections. Of course if it's private helper method noone would use, keep only the convenient implementation. – MagnatLU Nov 26 '11 at 18:31
  • I'm going to mark this as the answer, but I think I'm either going to end up with redundant code, or a method call within a loop; not quite that elegant. – Generic Comrade Nov 26 '11 at 18:49
3

Welcome to optimization. Things aren't always obvious here!

Basically, as you've found, when the compiler detects that certain types of safety constraints are proven to hold, it can issue enormously more efficient code when doing full optimization. Here (as MagnatLU shows) we see that knowing you've got an array allows all sorts of assumptions to be made about the size being fixed, and it allows memory to be accessed directly (which is also maximally efficient in how it integrates with the CPU's memory prefetch code, for bonus speed). When the compiler doesn't have the proof that it can generate super-fast code, it plays it safe. (This is the right thing to do.)

As a general comment, your workaround code is pretty simple when it comes to coding for optimization (when making the code super-readable and maintainable isn't always the first consideration). I don't really see how you could better it without making your class's API more complex (not a win!). Moreover, just adding a comment inside the body to say why you've done this would solve the maintenance issue; this is in fact one of the best uses for (non-doc) comments in the code in the first place. Given that the use case is large arrays (i.e., that it's reasonable to optimize at all in the first place) I'd say you have a great solution right there.

Donal Fellows
  • 133,037
  • 18
  • 149
  • 215
0

As an alternative to @MagnatLU's answer above, you can use for instead of foreach and cache the list's Count. There is still overhead when compared to int[] but not quite as much: the IList<int> overload duration decreased by ~50% using your test code on my machine.

private static int Sum(IList<int> array)
{
    int sum = 0;

    int count = array.Count;
    for (int i = 0; i < count; i++)
        sum += array[i];

    return sum;
}
kcnygaard
  • 794
  • 7
  • 18