In C#, what is the fastest way to create and fill a List<T> from an IEnumerable<T>, in terms of the time required to write the code? What about in terms of the time required to execute it?

My first thought was this:

List<int> list = new List<int>();

foreach(int number in iterator)
    list.Add(number);

Is there a faster way?

marc_s
NetherGranite
  • `enumerable.ToList()`? – ChiefTwoPencils Jan 21 '19 at 21:36
  • https://ericlippert.com/2012/12/17/performance-rant/ – mjwills Jan 21 '19 at 21:36
  • There are optimisations available, depending on the type of `IEnumerable`. It would be awesome if you could provide a [mcve]. – mjwills Jan 21 '19 at 21:37
  • "Fastest to execute" might be a competing concern to "fastest to write", readability, maintainability, and so forth. The latter concerns can be considered more important than execution time; unless it's not. – ChiefTwoPencils Jan 21 '19 at 21:38
  • Considering the BigO you cannot avoid O(n) complexity... – Johnny Jan 21 '19 at 21:42
  • `List<int> list = iterator.ToList();` will invoke `new List<int>(iterator)`, which will analyze the type of collection being provided, optimize for some basic scenarios, and then fall back to the code you have if all else fails. Why do you need to find a faster way than that? The next step in optimization, if you can't make the code faster, is to run it fewer times, so do you actually need to construct that list? – Lasse V. Karlsen Jan 21 '19 at 21:42
  • @LasseVågsætherKarlsen What do you mean by "why do you need to find a faster way than that"? I didn't say that anywhere. If you turn that into an answer, I'll gladly mark it as accepted; you make it clear that the fastest way is intelligently determined by that method, and it's certainly very short code to write. – NetherGranite Jan 21 '19 at 21:44
  • I'm pretty sure this is a duplicate, I just can't find it, so I'll refrain from posting an answer. If it turns out not to be a duplicate, someone else can just take my comment and turn it into an answer for you if needed. – Lasse V. Karlsen Jan 21 '19 at 21:48

1 Answer

With List<T> you essentially have two approaches, which I discuss below. For the sake of clarity, let's assume that allocating the List<T> takes constant time (C) and that adding an element to the List<T> also takes constant time.


Create empty List<T> and populate it

List<int> list = new List<int>(); // C
foreach(int i in iterator)
{
    list.Add(i); //n*C
}

As you can see, this approach takes n*C + C time, so if you drop the constants the complexity is O(n).


Create List<T> based on the other IEnumerable<T>

List<int> list = new List<int>(iterator);

However, what happens inside the constructor depends on the actual type of iterator:

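  1. if the iterator is an ICollection<T>, the element count is known, so the array is allocated once and filled in bulk:

    var array = new T[collection.Count]; // C
    collection.CopyTo(array, 0);         // O(n) according to MSDN

  2. if the iterator is only an IEnumerable<T>, it is the same as creating an empty list and adding the items one by one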

So, in either case every element still has to be copied once, and you cannot avoid O(n) complexity.
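The two cases above can be sketched roughly as follows. This is a simplified illustration of the strategy, not the actual framework source; `ListFillSketch` and `ToListSketch` are hypothetical names introduced only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ListFillSketch
{
    // Hypothetical helper mimicking the strategy described above.
    public static List<T> ToListSketch<T>(IEnumerable<T> source)
    {
        if (source is ICollection<T> collection)
        {
            // Count is known up front: size the list once, then bulk-copy.
            var sized = new List<T>(collection.Count);
            sized.AddRange(collection);
            return sized;
        }

        // Count is unknown: start empty and let the list grow as items arrive.
        var grown = new List<T>();
        foreach (T item in source)
            grown.Add(item);
        return grown;
    }
}
```

In everyday code there is no reason to write this by hand; `iterator.ToList()` or `new List<int>(iterator)` is shorter and does the type check for you.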

BUT...

There is one caveat with List<T> growth and capacity which might impact performance. A List<T> created with the parameterless constructor starts with a capacity of 0, and the first Add allocates room for 4 elements. Whenever you add an element beyond the current capacity, a new underlying array of twice the current size is allocated and the existing elements are copied into it; this repeats every time the capacity is reached. You can imagine how much unnecessary copying you might have. To prevent this, the best option is to initialize the List<T> with a capacity in advance, or to pass an ICollection<T> to the List<T>(IEnumerable<T>) constructor so that it can size the array from Count.
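To see the growth for yourself, a small sketch like the following records every distinct Capacity value while items are added, and compares that with a list that was given its capacity up front. `CapacityDemo` and its method names are made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class CapacityDemo
{
    // Returns each distinct Capacity value observed while adding `count` items
    // to a list created with the parameterless constructor.
    public static List<int> ObservedCapacities(int count)
    {
        var list = new List<int>();
        var seen = new List<int> { list.Capacity };
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            if (list.Capacity != seen[seen.Count - 1])
                seen.Add(list.Capacity); // a resize just happened
        }
        return seen;
    }

    // Same loop, but the list is told the final size in advance,
    // so the underlying array never has to be reallocated.
    public static int CapacityWhenPresized(int count)
    {
        var list = new List<int>(count);
        for (int i = 0; i < count; i++)
            list.Add(i);
        return list.Capacity;
    }
}
```

On the .NET implementations I am aware of, `ObservedCapacities(20)` yields 0, 4, 8, 16, 32, but the exact growth policy is an implementation detail, so treat the sequence as indicative rather than guaranteed.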

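// benchmark example (rough; a single Stopwatch run is only indicative)
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

var enumerable = Enumerable.Repeat(1, 1000000); // lazily generated sequence
var collection = enumerable.ToList();           // List<int>, which implements ICollection<int>

// copy from the lazy sequence
Stopwatch st = Stopwatch.StartNew();
List<int> copy1 = new List<int>(enumerable);
Console.WriteLine(st.ElapsedMilliseconds);

// copy from the collection, whose Count is known up front
st = Stopwatch.StartNew();
List<int> copy2 = new List<int>(collection);
Console.WriteLine(st.ElapsedMilliseconds);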
Johnny
  • Wow, incredibly thorough and helpful answer. Thanks a ton! – NetherGranite Jan 21 '19 at 22:40
  • When it comes to resizing the array: this time is real, but the amortized time for resizing is accepted as *O(1)* because the last insert requiring a resize performs at most *N/2* moves (as in your example where the capacity is doubled). Therefore, the total number of moves over the entire sequence is: *N/2 + N/4 + ... + 2 + 1 < N*. – ChiefTwoPencils Jan 21 '19 at 22:59
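The amortized argument in the last comment can be checked with a small model. The sketch below does not touch List<T> at all; it simply assumes a first allocation of 4 slots and doubling afterwards (an assumption about the growth policy, and `GrowthCost` is a hypothetical name), and counts how many element copies the resizes cause.

```csharp
using System;

public static class GrowthCost
{
    // Counts element copies caused by resizing while appending n items,
    // assuming an initial allocation of 4 slots and doubling thereafter.
    public static long CopiesForAppends(long n)
    {
        long capacity = 0, count = 0, copies = 0;
        for (long i = 0; i < n; i++)
        {
            if (count == capacity)
            {
                copies += count; // existing items move to the new array
                capacity = capacity == 0 ? 4 : capacity * 2;
            }
            count++;
        }
        return copies;
    }
}
```

Under this model, 1,000,000 appends cause 1,048,572 copies in total, roughly one extra copy per element, which is why the per-Add cost is treated as amortized O(1) even though individual resizes are O(n).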