26

I would have thought that executing the following code for an empty collection that implements IEnumerable<T> would throw an exception:

var enumerator = collection.GetEnumerator();
enumerator.MoveNext();
var type = enumerator.Current.GetType(); // Surely should throw?

Because the collection is empty, then accessing IEnumerator.Current is invalid, and I would have expected an exception. However, no exception is thrown for List<T>.

This is allowed by the documentation for IEnumerator<T>.Current, which states that Current is undefined under any of the following conditions:

  • The enumerator is positioned before the first element in the collection, immediately after the enumerator is created. MoveNext must be called to advance the enumerator to the first element of the collection before reading the value of Current.
  • The last call to MoveNext returned false, which indicates the end of the collection.
  • The enumerator is invalidated due to changes made in the collection, such as adding, modifying, or deleting elements.

(I'm assuming that "fails to throw an exception" can be categorised as "undefined behaviour"...)

However, if you do the same thing but use an IEnumerable instead, you DO get an exception. This behaviour is specified by the documentation for IEnumerator.Current, which states:

  • Current should throw an InvalidOperationException if the last call to MoveNext returned false, which indicates the end of the collection.

My question is: Why this difference? Is there a good technical reason that I'm unaware of?

It means identical-seeming code can behave very differently depending on whether it's using IEnumerable<T> or IEnumerable, as the following program demonstrates (note how the code inside showElementType1() and showElementType1() is identical):

using System;
using System.Collections;
using System.Collections.Generic;

namespace ConsoleApplication2
{
    class Program
    {
        public static void Main()
        {
            var list = new List<int>();

            showElementType1(list); // Does not throw an exception.
            showElementType2(list); // Throws an exception.
        }

        private static void showElementType1(IEnumerable<int> collection)
        {
            var enumerator = collection.GetEnumerator();
            enumerator.MoveNext();
            var type = enumerator.Current.GetType(); // No exception thrown here.
            Console.WriteLine(type);
        }

        private static void showElementType2(IEnumerable collection)
        {
            var enumerator = collection.GetEnumerator();
            enumerator.MoveNext();
            var type = enumerator.Current.GetType(); // InvalidOperationException thrown here.
            Console.WriteLine(type);
        }
    }
}
Matthew Watson
  • 104,400
  • 10
  • 158
  • 276
  • 3
    but in first case - T - _int_ that can't be _null_, so no exception, but in second - Curretn is object, so default is _null_ – Grundy Jun 15 '15 at 08:46
  • I've been writing this as an answer, but it doesn't really feel good enough, so... Performance - generic collections have been designed with fixing the huge performance issues of boxing value types in a list (among other reasons, of course). Checking for `hasValue` on each `Current` access doesn't sound like a lot, but consider that LINQ also came at that time - using enumerators *heavily*, even for things that used to be simple array iterations. You're supposed to check `MoveNext` anyway, so it doesn't hurt much to allow `Current` to be arbitrary when `MoveNext` returns `false`. – Luaan Jun 15 '15 at 08:50
  • 2
    in source for `List`: [current = default(T);](http://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs,1196) – Grundy Jun 15 '15 at 08:50
  • @Grundy It happens whether it's a reference type or not, so that's nothing to do with it. You can change my sample code to `List` and the same thing happens. – Matthew Watson Jun 15 '15 at 08:51
  • @Grundy While that's true, the `IEnumerable` case doesn't throw `NullReferenceException` - it throws `InvalidOperationException`. It explicitly checks whether the last `MoveNext` (if any) returned `false` or not. – Luaan Jun 15 '15 at 08:51
  • @Luaan, interesting that for generic: [`just return current`](http://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs,1233) but for non generic: [added range checking](http://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs,1239) – Grundy Jun 15 '15 at 08:56
  • @Grundy Yup, I've checked that as well. But the determining factor is the `IEnumerator` and `IEnumerator` contracts, respectively. The `IEnumerator` is less strict, so `List` can *afford* to omit the check for the typed enumerator. Since typed enumerators were introduced along with LINQ (which meant a huge increase in using enumerators), this is entirely reasonable. The old `IEnumerator` contract is still there for backwards compatibility, and works the same as before. – Luaan Jun 15 '15 at 09:01
  • 2
    Yet another side-effect of the chronically leaky abstraction of value types. It's not like the framework couldn't check this, the code to do so is just far too expensive to be added to every use of Current. – Hans Passant Jun 15 '15 at 09:06
  • @Luuan Indeed - it looks like they deliberately relaxed the constraints for `IEnumerable` because they wanted to increase performance. – Matthew Watson Jun 15 '15 at 09:09

2 Answers2

24

The problem with IEnumerable<T> is that Current is of type T. Instead of throwing an exception, default(T) is returned (it is set from MoveNextRare).

When using IEnumerable you don't have the type, and you can't return a default value.

The actual problem is you don't check the return value of MoveNext. If it returns false, you shouldn't call Current. The exception is okay. I think they found it more convenient to return default(T) in the IEnumerable<T> case.

Exception handling brings overhead, returning default(T) doesn't (that much). Maybe they just thought there was nothing useful to return from the Current property in the case of IEnumerable (they don't know the type). That problem is 'solved' in IEnumerable<T> when using default(T).

According to this bug report (thanks Jesse for commenting):

For performance reasons the Current property of generated Enumerators is kept extremely simple - it simply returns the value of the generated 'current' backing field.

This could point in the direction of the overhead of exception handling. Or the required extra step to validate the value of current.

They effectively just wave the responsibility to foreach, since that is the main user of the enumerator:

The vast majority of interactions with enumerators are in the form of foreach loops which already guard against accessing current in either of these states so it would be wasteful to burn extra CPU cycles for every iteration to check for these states that almost no one will ever encounter.

Community
  • 1
  • 1
Patrick Hofman
  • 153,850
  • 22
  • 249
  • 325
  • 2
    I don't think it being more convenient to return `default(T)` is the answer. Surely it's just as easy (and more informative to the caller) to throw an exception? My question is really, why would they choose to cover-up an error? – Matthew Watson Jun 15 '15 at 08:52
  • Well, exception handling brings overhead, returning `default(T)` doesn't. Maybe they just thought there was nothing useful to return from the `Current` property. That problem is 'solved' when using `default(T)`. – Patrick Hofman Jun 15 '15 at 08:53
  • 2
    See [this bug report](https://connect.microsoft.com/VisualStudio/feedback/details/806924/iterator-blocks-are-not-implemented-according-to-documentation): `For performance reasons the Current property of generated Enumerators is kept extremely simple - it simply returns the value of the generated 'current' backing field.`. – Jesse Good Jun 15 '15 at 08:54
  • 1
    But why generic `Current` do not throw exception? – Sergey Berezovskiy Jun 15 '15 at 08:54
  • 1
    @JesseGood that is the actual answer to this question :) – Sergey Berezovskiy Jun 15 '15 at 08:55
  • @JesseGood: Good one. So performance could point to exception handling overhead? – Patrick Hofman Jun 15 '15 at 08:56
  • 2
    But not throwing an exception means that invalid code will incorrectly execute! – Matthew Watson Jun 15 '15 at 08:56
  • @PatrickHofman Unlikely. Instead, it's the extra check needed - the `IEnumerator.Current` code has to do extra bounds checks, on each `Current` evaluation. The `IEnumerator` simply returns a field value. – Luaan Jun 15 '15 at 08:57
  • 2
    @MatthewWatson: But if it is documented you should check the return value, who is wrong? – Patrick Hofman Jun 15 '15 at 08:58
  • Actually strange solution by MS. I think assigning default value is unnecessary overhead. Throwing exception in Current will be overhead only if someone didn't check result of `MoveNext`. Well.. we will have to check some condition each time we access current. – Sergey Berezovskiy Jun 15 '15 at 08:59
  • While the interface of `IEnumerable` doesn't have a type, any given implementation does. It would be just as possible to have `Current` return the default value of that. – Jon Hanna Jun 15 '15 at 09:00
  • 1
    @PatrickHofman I think that extra checking is good if you can do it. Otherwise why do we validate method arguments and throw ArgumentOutOfRangeExceptions, and so on? – Matthew Watson Jun 15 '15 at 09:00
  • I think we're overlooking one thing. The use of `GetEnumerator` and `MoveNext` explicitly should be a rather *rare thing* to do. This is way `foreach` was provided, so we don't have to deal with these kind of nasty of by one (although documented) behavior. – Yuval Itzchakov Jun 15 '15 at 09:00
  • @YuvalItzchakov: Indeed. That is what I have just added from the linked post. – Patrick Hofman Jun 15 '15 at 09:01
  • @SergeyBerezovskiy Actually, you can see that the enumerator implementation for list doesn't check anything for `IEnumerable.Current` - it simply returns the value of a local field. The result is *undefined* when `MoveNext` returns `false` - in practice, this means that if there's no elements in the list, the result of `Current` is `default(T)`, but otherwise, it's the *last* element. It's a pretty simple and effective optimization - `MoveNext` does the exact same check anyway, so why do it twice? – Luaan Jun 15 '15 at 09:04
  • I really don't think they avoided throwing an exception because of extra overhead of an exception. They could have done that for the non-generic version too, but they throw an exception for that. I can see @JesseGood 's answer, and that seems to be the actual reason - they want `.Current` to be as fast as possible. – Matthew Watson Jun 15 '15 at 09:04
  • @MatthewWatson: See the last part of my answer with another part of the linked post that explains the 'performance' they were tuning. – Patrick Hofman Jun 15 '15 at 09:05
  • Yes, I agree - the answer would appear to be "For performance reasons". I guess for something as fundamental as iterating over a collection, performance >> guaranteed correctness. – Matthew Watson Jun 15 '15 at 09:07
4

To better match how people tend to implement it in practice. As is the change of wording from "Current also throws an exception …" in previous versions of the documentation to "Current should throw …" in the current version.

Depending on how the implementation is working, throwing an exception might be quite a bit of work, and yet because of how Current is used in conjunction with MoveNext() then that exceptional state is hardly ever going to come up. This is all the more so when we consider that the vast majority of uses are compiler-generated and don't actually have scope for a bug where Current is called before MoveNext() or after it has returned false to ever happen. With normal use we can expect the case to never come up.

So if you are writing an implementation of IEnumerable or IEnumerable<T> where catching the error condition is tricky, you might well decide not to do so. And if you do make that decision, it probably isn't going to cause you any problems. Yes, you broke the rules, but it probably didn't matter.

And since it won't cause any problems except for someone who uses the interface in a buggy way, documenting it as undefined behaviour moves the burden from the implementer to the caller to not do something the caller shouldn't be doing in the first place.

But all that said, since IEnumerable.Current is still documented as "should throw InvalidOperationException for backwards compatibility and since doing so would match the "undefined" behaviour of IEnumerable<T>.Current, the probably the best way to perfectly fulfil the documented behaviour of the interface is to have IEnumerable<T>.Current throw an InvalidOperationException in such cases, and have IEnumerable.Current just call into that.

In a way this is the opposite to the fact that IEnumerable<T> also inherits from IDisposable. The compiler-generated uses of IEnumerable will check if the implementation also implements IDisposable and call Dispose() if it does, but aside from the slight performance overhead of that test it meant that both implementers and hand-coded callers would sometimes forget about that, and not implement or call Dispose() when they should. Forcing all implementations to have at least an empty Dispose() made life easier for people in the opposite way to just having Current have undefined behaviour when it isn't valid.

If there were no backwards compatibility issues then we would probably have Current documented as undefined in such cases for both interfaces, and both interfaces inheriting from IDisposable. We probably also wouldn't have Reset() which is nothing but a nuisance.

Jon Hanna
  • 110,372
  • 10
  • 146
  • 251