2

Consider the following block of code:

using (FileStream fs = new FileStream(@"C:\bad_records.txt", 
                                      FileMode.Create, 
                                      FileAccess.Write))
{
    var badEnumerable = _cache.Where(kvp => !kvp.Value.Item1);

    fs.WriteLine(string.Format("BAD  RECORDS ({0})", badEnumerable.Count()));
    fs.WriteLine("==========");

    foreach (var item in badEnumerable)
    {
        fs.WriteLine(string.Format("{0}: {1}", item.Key, item.Value.Item2));
    }
}

where _cache is defined like this:

static Dictionary<string, Tuple<bool, string, string>> _cache;

am I iterating this enumerable twice? Once with Count() and once with the foreach?

Mike Perrenoud
  • 66,820
  • 29
  • 157
  • 232
  • 1
    Yes, Where has deferred execution. – Vladimir Apr 23 '13 at 14:53
  • For a **definite** answer, we need to know what kind of thing `_cache` is... – AakashM Apr 23 '13 at 14:56
  • 1
    @AakashM No we don't. We *know* it's being iterated twice. We don't know if it's a terribly bad thing or not; it might be perfectly fine to iterate it twice, or it could be very horrible, but we can be 100% sure he is iterating it twice. – Servy Apr 23 '13 at 14:58
  • I'm certain your own dialectical prowess could have figured this - seems like a question for question's sake. – Grant Thomas Apr 23 '13 at 15:00
  • @GrantThomas, it was to verify that I was correct in my suspicions. I could have probably reflected it and read the code from Microsoft, but it seemed like a pretty good question for the community to have as well. – Mike Perrenoud Apr 23 '13 at 15:02
  • @MichaelPerrenoud Reflection is most certainly not the easiest way to answer this question. It's much easier to just create an enumerable that generates side effects when iterated, thus allowing you to observe the number of times it's iterated. This is bad practice in production code, but great for debugging. – Servy Apr 23 '13 at 15:03
  • @MichaelPerrenoud In that case it is generally nice to provide your research effort and your initial intuitions or suspicions, that would have made this whole thing feel a lot nicer; I feel a little dirty just being here, seeing this being asked gives me some feeling of fraudulence. – Grant Thomas Apr 23 '13 at 15:04
  • 1
    @servy from the sample provided `_cache` could be **anything**, including, say, a class which comes with extension methods superseding those in `Enumerable`, for example to provide a `Count()` method that doesn't actually enumerate... – AakashM Apr 23 '13 at 15:07
  • @AakashM It doesn't matter if it provides a `Count` that doesn't enumerate the sequence, as `Where` is used on it first. I suppose technically it could be a type that provides it's own `Where` extension that returns something other than an `IEnumerable`, but then the entire question is *highly* misleading, considering the variable is called `badEnumerable`, etc. Anyone capable of writing such a type is not only really mean to consumers of that type, but is sufficiently knowledgeable to answer this question for themselves. – Servy Apr 23 '13 at 15:10
  • @Servy `Where` *always* returns a custom type, that just happens to implement `IEnumerable`; there's nothing to say that `IEnumerable.Count()` *must* enumerate the items, it merely has to return a correct value. The fact that the default types that are returned by, e.g. `List<>` happen to do so is an implementation detail. – Michael Edenfield Apr 23 '13 at 15:12
  • @Servy ok so less esoterically, what if `_cache` is an `IQueryable` ?! While C# isn't as full of boundless possibilities for source code meaning as C++, it does help to actually be asked a self-contained question; hence my first comment. – AakashM Apr 23 '13 at 15:12
  • @MichaelEdenfield Yes, it always returns a custom type, and that custom type won't ever implement `ICollection` in the library version. – Servy Apr 23 '13 at 15:13
  • @AakashM Then the query is guaranteed to be executed twice (well, technically two different queries will be executed, as one only needs to get the count), which is the moral equivalent of an in memory collection being iterated twice. – Servy Apr 23 '13 at 15:13
  • @Servy but, as AakashM pointed out originally, we have no idea if `_cache` is a library object, a LINQ provider custom object, or some random custom `IEnumerable` the OP got from a third-party vendor somewhere. It *looks like* its a `Dictionary` but there are plenty of other things that store key/value pairs besides just `Dictionary<>` – Michael Edenfield Apr 23 '13 at 15:14
  • @MichaelEdenfield As long as the object is typed as an `IEnumerable` (Or `IQueryable`) there is no question at all. Technically it could be some type that has it's own `Where` instance method that returns something entirely different from `IEnumerable` and the entire code snippet uses no LINQ at all, but, as I said, that would be highly misleading in the question and would never happen by accident. – Servy Apr 23 '13 at 15:16
  • @Servy, I've edited the question with the definition of `_cache`. – Mike Perrenoud Apr 23 '13 at 15:20

5 Answers5

8

Yes, you are iterating the enumerable twice.

An easy way to test this is to use a helper method such as this:

private static int count = 0;
public static IEnumerable<T> CountIterations<T>(IEnumerable<T> sequence)
{
    count++;
    //or some other debug type logging
    Console.WriteLine("Iterated {0} times.", count);
    foreach(var item in sequence)
        yield return item;
}
Servy
  • 202,030
  • 26
  • 332
  • 449
  • 1
    Without any explanation, this is not really a valuable answer. – ashes999 Apr 23 '13 at 14:54
  • 1
    It almost seems too easy.. : P – Kestami Apr 23 '13 at 14:54
  • 1
    @ashes999 Well, the explination is really in the question itself, `Count` and `foreach` each iterate the `IEnumerable`. Not a whole lot more to say, although I've added a simple means of observing it more easily. – Servy Apr 23 '13 at 14:56
  • @Servy, this is exactly what I expected was happening. These are pretty small lists (e.g. maximum of 15,000) so it's probably not a real issue to target, but you have in fact fully confirmed my suspicion. Thanks a lot friend! – Mike Perrenoud Apr 23 '13 at 15:00
  • Is this method supposed to replace the call to `.Count()`? If so, then it is only correct because of the type that `badEnumerable` is, and this cannot be used to test a generic case of "does `.Count()` enumerate twice?" If `badEnumerable` implemented `ICollection`, then `.Count()` would not produce two iterations, but your code would. – Jon Senchyna Apr 23 '13 at 15:35
  • @JonSenchyna The code is not a solution to avoid multiple iteration, it's a debugging tool to determine if your code is iterating the sequence multiple times or not. It would require you to use some other means of fixing the code if it is. In this case you'd stick a call to this method between `_cache` and `Where`. After doing so it will print to the console (or wherever) the number of iterations of the sequence. In this particular case, it will print 1 and then 2, indicating the sequence was iterated twice, thus answering the question. – Servy Apr 23 '13 at 15:44
  • Thank you for pointing that out. It was not clear from your answer where you were using your helper function. I did not think it was a solution, but I did think that you were replacing the call to `.Count()` with it to see if it was enumerating there (based on the lack of explanation and the function name). I would recommend putting a brief statement in your answer to clearly state how to use your helper function. Regardless, +1 for the good answer and helpful tool. – Jon Senchyna Apr 23 '13 at 16:13
3

You use your dictionary _cache as IEnumerable of KeyValuePair.
Where method has deferred execution.
So you enumerate it twice: in Count and in foreach.

You can change it as:
var badEnumerable = _cache.Where(kvp => !kvp.Value.Item1).ToArray();
and fs.WriteLine(string.Format("BAD RECORDS ({0})", badEnumerable.Length));.

Vladimir
  • 7,345
  • 4
  • 34
  • 39
2

Yes, the .Count() and the foreach will both cause _cache to be enumerated twice and verified against the predicate in the Where cause.

As to whether this is a problem or not depends on a number of things:

  1. Are all the values in _cache in memory already or is it querying an underlying source such as a database.
  2. How many items are in the queried collection and what is the cost of the comparison.
  3. Can the source be enumerated safely multiple times.

For example, if values in _cache are already in memory and the predicate is a simple boolean property comparison then enumerating the cache twice is probably more efficient and will add no extra memory overhead whereas adding a .ToList() will still result in 2 enumerations (one of _cache and one of the list) however the predicate check will only happen once (in the .ToList() call) and the foreach will have less objects to enumerate but you will have added the extra memory overhead of the additional list.

If the cache comes from a database then the memory overhead of adding a .ToList() after the .Where() will almost certainly be better than performing 2 individual queries against the database.

Trevor Pilley
  • 16,156
  • 5
  • 44
  • 60
1

The short answer is yes.

Depending on the underlying type of badEnumerable, it might be enumerated twice. This is due to what is called "deferred execution". Deferred execution means that your LINQ query is not actually executed "until the query variable is iterated over in a foreach or For Each loop" (MSDN). Your foreach statement is obviously an iteration over the variable, and Enumerable.Count() also performs an iteration (in this case).

In some cases though, this will not cause two iterations. This occurs when badEnumerable is actually a subclass of ICollection. In that case, calling .Count() actually references the underlying .Count property and does not enumerate it.

Since badEnumerable is a Dictionary<TKey, TValue>, and also because the call to Enumerable.Where() returns a generic IEnumerable (one that is not an ICollection), your specific case does not run into this situation and will iterate twice.

Community
  • 1
  • 1
Jon Senchyna
  • 7,867
  • 2
  • 26
  • 46
0

Yes, to avoid iterating twice, use a List You may print the count after the initial iteration. Another solution is to save the text written in the loop and print it after you print the count.

Edit Corrected:

using (FileStream fs = new FileStream(@"C:\bad_records.txt", FileMode.Create, FileAccess.Write))
{
    var badEnumerable = _cache.Where(kvp => !kvp.Value.Item1);

    int count = 0;
    foreach (var item in badEnumerable)
    {
        count++;
        Console.WriteLine(string.Format("{0}: {1}", item.Key, item.Value.Item2));
    }

    Console.WriteLine("==========");
    Console.WriteLine(string.Format("BAD  RECORDS ({0})", count));
}
Ahmed KRAIEM
  • 10,267
  • 4
  • 30
  • 33