21

In .NET, does using "foreach" to iterate over an IEnumerable of value types create a copy of each item? If so, should I prefer "for" over "foreach"?

I wrote some code to test this:

using System;
using System.Collections.Generic;
using System.Diagnostics;

struct ValueTypeWithOneField
{
    private Int64 field1;
}

struct ValueTypeWithFiveField
{
    private Int64 field1;
    private Int64 field2;
    private Int64 field3;
    private Int64 field4;
    private Int64 field5;
}

public class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("one field");
        Test<ValueTypeWithOneField>();

        Console.WriteLine("-----------");

        Console.WriteLine("Five field");
        Test<ValueTypeWithFiveField>();

        Console.ReadLine();
    }

    static void Test<T>()
    {
        var test = new List<T>();
        for (int i = 0; i < 5000000; i++)
        {
            test.Add(default(T));
        }

        Stopwatch sw = new Stopwatch();

        for (int i = 0; i < 5; i++)
        {
            sw.Start();

            foreach (var item in test)
            {

            }

            sw.Stop();
            Console.WriteLine("foreach " + sw.ElapsedMilliseconds);
            sw.Restart();

            for (int j = 0; j < test.Count; j++)
            {
                T temp = test[j];
            }

            sw.Stop();
            Console.WriteLine("for " + sw.ElapsedMilliseconds);
            sw.Reset();
        }
    }
}

And this is the result that I got after I ran the code:

    one field
    foreach 68
    for 72
    foreach 68
    for 72
    foreach 67
    for 72
    foreach 64
    for 73
    foreach 68
    for 72
    -----------
    Five field
    foreach 272
    for 193
    foreach 273
    for 191
    foreach 272
    for 190
    foreach 271
    for 190
    foreach 275
    for 188

As the results show, "foreach" is slightly faster than "for" with the one-field struct, but noticeably slower with the five-field struct.

So should I prefer "for" over "foreach" when iterating through a generic collection of value types?

Note: thanks for the reminder. I edited the code and results, but foreach still runs slower than for with the larger struct.

Cui Pengfei 崔鹏飞
  • 3
    I don't think your test is right. The foreach will assign a value to **item**, but the for loop doesn't assign anything. What happens if you actually do an assignment in the for loop: var k = test[j]; – rsbarro Apr 14 '11 at 13:16
  • I think, to make this test more accurate, you should do something simple with the actual element, like `var l = item + 1;` / `var l = test[i] + 1`. Furthermore, you have to call the GetEnumerator function in the foreach loop, while the for loop only counts without touching the list. – Tokk Apr 14 '11 at 13:18
  • 1
    @Tokk: `item++` would not be allowed. You can't modify the iterator variable. The test needs equivalent actions, which means the `for` block needs to retrieve the value. – Adam Robinson Apr 14 '11 at 13:19
  • thanks, i just edited the code and result. – Cui Pengfei 崔鹏飞 Apr 14 '11 at 13:23
  • Your code is still meaningless. How about actually doing something with the elements instead of writing code that does nothing. – CodesInChaos Apr 14 '11 at 13:25
  • 4
    @CuiPengFei: That's a more accurate result, but what's the point here? You should use whichever loop construct makes more sense to you. Even if one has a performance edge, your results show a difference of ~50ms after enumerating a *five-million member collection*. You're talking about trivial amounts of time. – Adam Robinson Apr 14 '11 at 13:26
  • The stopwatch isn't being reset properly either ... I think I've been drowned by answers though. You need to reset it after the 'for' test, otherwise the time will be added to the next 'foreach' test. – Jeff Parker Apr 14 '11 at 13:28
  • @Jeff: *Excellent* catch. That explains why his subsequent `foreach` tests are so inflated. – Adam Robinson Apr 14 '11 at 13:29
  • There's definitely something funky going on here ... using 2 int64s instead of 5, the foreach is faster. Using 4 32bit ints, foreach is faster, but as soon as you add anything above this, foreach becomes dramatically slower. One bool field added to either of the two fieldsets mentioned above increases the 'foreach' runtime by 30ms (on my machine), while not significantly affecting the 'for' runtime. – Jeff Parker Apr 14 '11 at 13:49

8 Answers

26

Your question is way, way too complex. Break it down.

Does using “foreach” to iterate a sequence of value types create a copy of the sequence?

No.

Does using "foreach" to iterate a sequence of value types create a copy of each value?

Yes.

Does using "for" to do an equivalent iteration of an indexed sequence of value types create a copy of each value?

Usually, yes. There are things you can do to avoid the copying if you know special things about the collection, like for instance that it is an array. But in the general case of indexed collections, indexing the sequence returns a copy of the value in the sequence, not a reference to a storage location containing the value.
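A minimal sketch of that difference, assuming a hypothetical mutable struct `Point` that is not part of the question: indexing a `List<T>` hands back a copy of the element, whereas an array element is a storage location that can be modified in place.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable struct, used only for illustration.
struct Point
{
    public int X;
}

public static class IndexerCopyDemo
{
    public static void Main()
    {
        var list = new List<Point> { new Point { X = 1 } };
        var array = new[] { new Point { X = 1 } };

        // list[0] returns a copy, so changing the copy leaves the list untouched.
        Point copy = list[0];
        copy.X = 42;
        Console.WriteLine(list[0].X);  // prints 1

        // array[0] denotes the element's storage location, so this mutates in place.
        array[0].X = 42;
        Console.WriteLine(array[0].X); // prints 42
    }
}
```

Writing `list[0].X = 42;` directly would not even compile, because the indexer's return value is a temporary copy rather than a variable.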

Does doing anything to a value type make a copy of the value?

Just about. Value types are copied by value. That's why they're called value types. The only things that you do to value types that do not make a copy are calls to methods on the value type, and passing a value type variable using "out" or "ref". Value types are copied constantly; that's why value types are often slower than reference types.

Does using "foreach" or "for" to iterate a sequence of reference type copy the reference?

Yes. The value of an expression of reference type is a reference. That reference is copied whenever it is used.

So what's the difference between value types and reference types as far as their copying behaviour is concerned?

Value types are copied by value. Reference types copy the reference but not the thing being referred to. A 16-byte value type copies 16 bytes every time you use it. A reference type whose instance holds 16 bytes copies only the 4-byte (or 8-byte) reference every time you use it.
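As a rough illustration, assuming two hypothetical types `ValuePair` and `RefPair` that each hold 16 bytes of data: assigning the struct duplicates its fields, while assigning the class instance only duplicates the reference.

```csharp
using System;

// Hypothetical types for illustration; both hold two 8-byte fields.
struct ValuePair { public long A; public long B; }
class RefPair { public long A; public long B; }

public static class CopyDemo
{
    public static void Main()
    {
        var v1 = new ValuePair { A = 1, B = 2 };
        ValuePair v2 = v1;        // copies both fields (16 bytes)
        v2.A = 100;               // changes only the copy
        Console.WriteLine(v1.A);  // prints 1

        var r1 = new RefPair { A = 1, B = 2 };
        RefPair r2 = r1;          // copies only the reference
        r2.A = 100;               // changes the single shared instance
        Console.WriteLine(r1.A);  // prints 100
    }
}
```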

Is the foreach loop slower than the for loop?

Often it is. The foreach loop is often doing more work, in that it is creating an enumerator and calling methods on the enumerator, instead of just incrementing an integer. Integer increments are extremely fast. Also don't forget that the enumerator in a foreach loop has to be disposed, and that can take time as well.

Should I use the for loop instead of the foreach loop because the for loop is sometimes a few microseconds faster?

No. That's dumb. You should make smart engineering decisions based on customer-focussed empirical data. The extra burden of a foreach loop is tiny. The customer will probably never notice. What you should do is:

  • Set performance goals based on customer input
  • Measure to see if you've met your goals
  • If you have not, find the slowest thing using a profiler
  • Fix it
  • Repeat until you've met your goals

Odds are extremely good that if you have a performance problem, changing a foreach loop to a for loop will make no difference whatsoever to your problem. Write the code the way it looks clear and understandable first.
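For the "measure" step, here is a minimal sketch of a fairer micro-benchmark than the one in the question. The helper name `TimeMs`, the warm-up run, and the `sink` checksum are illustrative choices, not something taken from the question. Both loops do the same work on each element, and the result is printed so the loop bodies cannot be discarded as dead code.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class LoopTiming
{
    // Runs the action once to warm up the JIT, then times a second run.
    static long TimeMs(Action action)
    {
        action();
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        var data = new List<long>();
        for (int i = 0; i < 5000000; i++)
        {
            data.Add(i);
        }

        long sink = 0; // printed at the end so the loop bodies have an observable effect

        long foreachMs = TimeMs(() => { foreach (long x in data) sink += x; });
        long forMs = TimeMs(() => { for (int j = 0; j < data.Count; j++) sink += data[j]; });

        Console.WriteLine("foreach " + foreachMs + " ms");
        Console.WriteLine("for " + forMs + " ms");
        Console.WriteLine("checksum " + sink);
    }
}
```

Numbers from a sketch like this only say something about the loop constructs themselves. Whether the loop matters for a real application is still a question for a profiler.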

Eric Lippert
  • 4
    +1 And i am starting to like the way you are [answering](http://stackoverflow.com/questions/4817369/why-does-does-it-really-listt-implement-all-these-interfaces-not-just-ilist/4818566#4818566) These days :) – Shekhar_Pro Apr 14 '11 at 15:27
  • Skeet has got nothing on Lippert. :) – Esteban Araya Apr 21 '11 at 05:38
13

Your test is not accurate; in the foreach version, you're actually spinning up the enumerator and retrieving each value from the list (even though you aren't using it). In the for version, you aren't doing anything with the list at all, other than looking at its Count property. You're essentially testing the performance of an enumerator traversing a collection compared to incrementing an integer variable an equivalent number of times.

To create parity, you'd need to declare a temporary variable and assign it in each iteration of the for loop.

That being said, the answer to your question is yes. A copy of the value will be created with every assignment or return statement.

Performance

This pseudocode breakdown should explain why foreach is somewhat slower than using for in this particular instance:

foreach:

var en = test.GetEnumerator(); // creates a List<T>.Enumerator
try
{
    T item;

    while (en.MoveNext()) // MoveNext increments the current index and returns
                          // true if the new index is valid, or false if it's
                          // beyond the end of the list. If it returns true,
                          // it retrieves the value at that index and holds it
                          // in an instance variable
    {
        item = en.Current; // Current returns the value held in that instance
                           // variable
    }
}
finally
{
    en.Dispose(); // the enumerator is disposed even if the loop body throws
}

for:

int index = -1;
T item;

while(++index < test.Count)
{
    item = test[index];
}

As you can see, there's simply less code in the for implementation, and foreach has a layer of abstraction (the enumerator) on top of the for. I wrote the for using a while loop to show the two versions in a similar representation.

With all that said...

You're talking about a trivial difference in execution time. Use the loop that makes the code clearer and smaller, and in this circumstance that looks like foreach.

Adam Robinson
  • Are you sure about the copying part? I'm not familiar with .NET personally, but I can't think of any other object-oriented language in which `foreach` creates a copy of each object instance. Typical behavior is to just assign a reference to the next object, not create a whole new instance in memory. – aroth Apr 14 '11 at 13:26
  • @aroth: That would be the distinction between value types (which is what the OP is talking about) and reference types. All languages create a copy of the *value* of the variable upon assignment, with the distinction being that the value of a reference type is a reference to an instance, and the value of a value type is the structure itself. – Adam Robinson Apr 14 '11 at 13:27
  • Fair enough. To me "copy" means to create a new independent object instance in memory with the same state as the source instance. So of course assignment of a reference (or anything else) necessarily creates a copy of the reference, in the sense that you have two variables with the same value, but there's still only one instance of the actual object in memory. But yes, with `structs` being assigned as in the example code, the entire instance will be copied on assignment as you say. – aroth Apr 14 '11 at 13:41
3

You're not resetting the "stopwatch" after the "for" test, so the time taken in the 'for' test is being added to the subsequent 'foreach' test. Also, as correctly specified, you should do an assignment inside the 'for' to mimic the exact behaviour of the foreach.

sw.Start();

foreach (var item in test)
{

}

sw.Stop();
Console.WriteLine("foreach " + sw.ElapsedMilliseconds);
sw.Restart();

for (int j = 0; j < test.Count; j++)
{
    T temp = test[j];
}

sw.Stop();
Console.WriteLine("for " + sw.ElapsedMilliseconds);
sw.Reset(); // -- This bit is missing!
Jeff Parker
2

In your for loop, I don't see you actually accessing items from test. If you add var x = test[j]; to the for loop, you'll see that the performance is (virtually) the same.

Every access to a value-type element creates a copy, whether through foreach or through the list's indexer in a for loop.

František Žiačik
1

Here's a discussion on the topic: "Why should I use foreach instead of for (int i=0; i<length; i++) in loops?"

Community
Jason
1

Your test is not fair. Consider how the foreach loop operates. You have the following code:

foreach (var item in test)
{

}

This creates a variable item, and on each iteration fetches the next object from the collection, and assigns it to item. This fetch and assign shouldn't create a copy, but it does take time to access the underlying collection and assign the correct value to the variable.

Then you have this code:

for (int j = 0; j < test.Count; j++)
{

}

This does not access the underlying collection at all. It does not read and assign a variable on each iteration. It simply increments an integer test.Count times, so of course it is faster. And if the compiler is smart, it will see that no operation happens in the loop and just optimize the whole thing away.

A fair comparison would replace that second bit of code with something like:

T item;
for (int j = 0; j < test.Count; j++)
{
    item = test[j]; // read the element through the list's indexer
}

That is more comparable to what your foreach loop is doing.

As for which to use, it's really a matter of personal preference and coding style. I generally feel that foreach is more clear than for(...) from a readability standpoint.

aroth
  • 1
    @CuiPengFei - One might expect the `for` loop to still be a bit faster, as the `foreach` loop is likely using some sort of enumeration interface/protocol (i.e. a function call) to fetch the next object instance. The revised `for` loop, on the other hand, is just doing a direct access of the next object. As function calls are relatively expensive operations, this would explain why the `for` loop is still faster. You might be able to even it out further by doing something like `static T getItem(int index) { return test[index];}` and calling that from your `for` loop. – aroth Apr 14 '11 at 13:47
  • I just tried that. Now the for loop is running slower. thanks. that solved my doubt. – Cui Pengfei 崔鹏飞 Apr 14 '11 at 13:54
  • and i compared foreach and while like this: var enumerator = test.GetEnumerator(); while (enumerator.MoveNext()) { T temp = enumerator.Current; } they basically take the same amount of time. – Cui Pengfei 崔鹏飞 Apr 14 '11 at 14:00
  • 1
    You may wish to correct your answer, as the fetch and assign *will* create a copy (as discussed above). In addition, there's no distinction between a property access (via the indexer `this[int index]` on the list) and accessing the `Current` property of the enumerator generated by the `foreach` call, other than an additional level of indirection. While this can (and probably does) have *some* performance impact, you are not comparing a function call vs. "direct access". – Adam Robinson Apr 14 '11 at 14:07
  • 1
    @Cui: That's because, aside from not accounting for an `IDisposable` enumerator and running within a `try { } finally { }`, the code using `GetEnumerator()` is the same as using `foreach`. – Adam Robinson Apr 14 '11 at 14:08
  • 1
    Additionally, it would be inaccurate to use a static function to retrieve the list item, as you're now introducing *another* variable (static functions) to the equation. – Adam Robinson Apr 14 '11 at 14:09
  • @Adam: yes, you are right. the static method runs way slower. – Cui Pengfei 崔鹏飞 Apr 14 '11 at 15:46
1

I think foreach provides a more abstract way of looping, but it is technically slower than the for loop. A good article on the differences between the for loop and foreach can be found here

Vamsi
1

I found only one case where it matters: developing for Windows Phone 7. There are two reasons why one should change

foreach(var item in collection)
{
}

To

int length = array.Length;
for(int i = 0; i < length; ++i)
{
}

in an XNA game if the collections are big or the loop runs often (e.g. in the Update method):

  • It is a bit faster
  • It produces less garbage

Garbage is critical because the Compact Framework GC fires after every 1 MB of allocation, which can cause annoying freezes.
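One place such garbage can come from, at least on desktop .NET (that the Compact Framework behaves the same way is an assumption here): `List<T>.GetEnumerator()` returns a struct, so `foreach` over a variable typed as `List<T>` needs no heap allocation for the enumerator. Iterating the same list through the `IEnumerable<T>` interface returns that struct as an interface reference, which generally means boxing it. A sketch of the three access patterns:

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorDemo
{
    public static void Main()
    {
        var list = new List<int> { 1, 2, 3 };

        // List<T>.Enumerator is a struct, so foreach over a List<T> variable
        // does not need a heap-allocated enumerator.
        Console.WriteLine(typeof(List<int>.Enumerator).IsValueType); // prints True

        // Through the interface, GetEnumerator() returns IEnumerator<int>,
        // an interface reference to a (typically boxed) copy of that struct.
        IEnumerable<int> seq = list;
        int viaInterface = 0;
        using (IEnumerator<int> boxed = seq.GetEnumerator())
        {
            while (boxed.MoveNext())
            {
                viaInterface += boxed.Current;
            }
        }

        // Index-based loop with a cached count: no enumerator at all.
        int viaIndex = 0;
        int count = list.Count;
        for (int i = 0; i < count; ++i)
        {
            viaIndex += list[i];
        }

        Console.WriteLine(viaInterface); // prints 6
        Console.WriteLine(viaIndex);     // prints 6
    }
}
```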

Lukasz Madon