Random access on .NET lists is slow, but what if I always reference the first element?

Question

I know that in general, .NET Lists are not good for random access. I've always been told that an array would be best for that. I have a program that needs to access the first element of a .NET list continually (more than a billion times), and I am wondering whether this will slow anything down, or whether it won't matter because it's the first element in the list. I'm also adding and removing a lot of items from the list as I go along, but the List is never empty.

I'm using F#, but I think this applies to any .NET language (I am using .NET Lists, not F# Lists). My list is about 100 elements long.

`List` is backed by an array, and gives you constant-time random access — jdphenix, Apr 24 '15 at 17:33
`List` uses an array internally: the [source](http://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs,cf7f4095e4de7646) on reference source.microsoft.com shows the field `private T[] _items;`. — Wai Ha Lee, Apr 24 '15 at 17:35
@jdphenix: I originally thought arrays and lists were about the same, but my experimentation has proven otherwise. Array and list access are both O(1), but that doesn't guarantee similar performance. My tests show more than a 50x speed difference. — recursive, Apr 24 '15 at 17:58
Arrays and `List` have fast random access. `LinkedList` is a doubly-linked list and thus has slow random access (it doesn't even provide an indexer). F#'s lists are singly-linked lists and thus have slow random access. — CodesInChaos, Apr 24 '15 at 18:01
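To make the comments above concrete, here is a minimal sketch of an array-backed list. `MiniList<T>` is a hypothetical, simplified class modeled on the pattern described above (a private `T[] _items` plus a count of used slots); it is not the actual `List<T>` source. Indexing costs one range check against the count plus one array load, so it is O(1) no matter which position you read. It also hints at why a list can be slower than a raw array: the array load carries the runtime's own bounds check on top of the list's range check.

```csharp
using System;

// Hypothetical, simplified array-backed list; NOT the real List<T> source.
public sealed class MiniList<T>
{
    private T[] _items = new T[4]; // backing array, analogous to List<T>'s _items
    private int _size;             // number of slots actually in use

    public int Count => _size;

    // O(1): a range check against _size, then a plain array load.
    // The array access keeps the runtime's own bounds check unless
    // the JIT can prove it redundant.
    public T this[int index]
    {
        get
        {
            // The uint casts also reject negative indexes with a single comparison.
            if ((uint)index >= (uint)_size)
                throw new ArgumentOutOfRangeException(nameof(index));
            return _items[index];
        }
    }

    // Amortized O(1): when the backing array is full, copy it into one twice as large.
    public void Add(T item)
    {
        if (_size == _items.Length)
            Array.Resize(ref _items, _items.Length * 2);
        _items[_size++] = item;
    }
}

public static class MiniListDemo
{
    public static void Main()
    {
        var xs = new MiniList<int>();
        for (int i = 1; i <= 100; i++) xs.Add(i);
        Console.WriteLine($"{xs[0]} {xs.Count}"); // 1 100
    }
}
```

The doubling in `Add` is the usual growth strategy for this kind of structure; the point of the sketch is only that reading index 0 costs the same as reading any other index.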

score 6 · Answer 1 · answered Apr 24 '15 at 18:48

In F#, the .NET list (System.Collections.Generic.List) is aptly aliased as ResizeArray, which leaves little doubt as to what to expect. It's an array that can resize itself, and not really a list in the CS-classroom sense of the term. Any performance differences between it and a simple array most likely come from the fact that the compiler can be more aggressive about optimizing array usage.

Back to your question. If you only access the first element of a list, it doesn't matter what you choose. Both a ResizeArray and a list (using F# lingo) have O(1) access to the first element (head).

A list would be the preferable choice if your other operations also work on the head element, i.e. you only add elements at the head. If you want to append elements to the end of the list, or mutate elements that are already in it, you'd get better mileage out of a ResizeArray.
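On the ResizeArray side, the asymmetry described above looks roughly like this with `List<T>` (the type F# calls `ResizeArray`). Reading the head or appending at the end is cheap; inserting or removing at index 0 shifts every remaining element in the backing array, which is O(n). This is a small illustration, not a benchmark, and the class name `HeadOperations` is just for the example. With roughly 100 elements, as in the question, each shift touches only about 100 slots.

```csharp
using System;
using System.Collections.Generic;

public static class HeadOperations
{
    public static List<int> Run()
    {
        var list = new List<int> { 10, 20, 30 };

        // Reading the head: O(1), a range check plus an array load.
        int head = list[0];

        // Appending at the end: amortized O(1); the backing array only
        // grows (by copying) when it runs out of capacity.
        list.Add(40);

        // Inserting or removing at index 0: O(n), because every remaining
        // element is shifted one slot within the backing array.
        list.Insert(0, 5);   // [5, 10, 20, 30, 40]
        list.RemoveAt(0);    // [10, 20, 30, 40]

        Console.WriteLine($"head was {head}, head is {list[0]}, count {list.Count}");
        return list;
    }

    public static void Main() => Run();
}
```

An F# list avoids the shifting for head insertions and removals because consing onto or dropping the head of a singly-linked list is O(1).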

That said, a ResizeArray in idiomatic F# code is a rare sight. The usual approach favors (and doesn't suffer from using) immutable data structures, so seeing one would usually be a minor red flag for me.

recursive · Accepted Answer · score 3 · 2015-04-24T17:53:44.527

There is not much difference between the performance of random access for an array and a list. Here's a test on my machine.

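using System;
using System.Diagnostics;
using System.Linq;

var list = Enumerable.Range(1, 100).ToList();
var array = Enumerable.Range(1, 100).ToArray();

// Note: total is never read after the loops, so the JIT in a Release
// build is free to drop the xor work (see the comments below).
int total = 0;

var sw = Stopwatch.StartNew();
for (int i = 0; i < 1000000000; i++) {
    total ^= list[0];
}
Console.WriteLine("Time for list: {0}", sw.Elapsed);

sw.Restart();
for (int i = 0; i < 1000000000; i++) {
    total ^= array[0];
}
Console.WriteLine("Time for array: {0}", sw.Elapsed);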

This produces the following output:

Time for list: 00:00:05.2002620
Time for array: 00:00:03.0159816

If you know the list has a fixed size, it makes sense to use an array; otherwise ~~there's not much cost to the list.~~ (see update)

Update!

I found some pretty significant new information. After executing the script in release mode, the story changes quite a bit.

Time for list: 00:00:02.3048339
Time for array: 00:00:00.0805705

In this case, the performance of the array totally dominates the list. I'm pretty surprised, but the numbers don't lie.

Go with the array.

edited Apr 24 '15 at 17:53

answered Apr 24 '15 at 17:37

recursive

Somebody will *definitely* ask (in this case, that somebody is me) - was this in `DEBUG` mode? – Wai Ha Lee Apr 24 '15 at 17:44
When you look at it as a percentage, array is about 40% faster, which is significant. – Ron Beyer Apr 24 '15 at 17:44
@WaiHaLee: Good point. In release mode the story is *significantly* different. – recursive Apr 24 '15 at 17:54
Your benchmark is rather dubious: it always reads the same element. Arrays are faster than `List`, but in meaningful tests the difference will be much smaller. – CodesInChaos Apr 24 '15 at 18:03
How does it look if you get *random* elements of the array/list? As @CodesInChaos points out, you're reading only the first element - some optimisation might be happening... – Wai Ha Lee Apr 24 '15 at 18:06
@CodesInChaos: The question specifies that the first element is always accessed, so the benchmark measures that. I don't see how that makes it dubious. If there's an optimization, then that would apply to the OP's use case as well. – recursive Apr 24 '15 at 18:13
OP can keep the benefits of a `List` and avoid executing `List`'s indexer on every iteration by storing a reference before the loop, i.e. `var first = list.First()`, and then running the loop on `first`. If the first element is expected to change, an event could be raised and handled to update `first` as needed. There are a few different ways to approach this. – jdphenix Apr 24 '15 at 18:17
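A minimal sketch of the caching idea in the comment above, assuming every mutation goes through one wrapper instead of an event. `HeadCachedList<T>` and its members are hypothetical names for illustration. The wrapper refreshes a cached `Head` whenever index 0 changes, so a hot loop can read `Head` instead of calling the `List<T>` indexer, even while items are added and removed. For value types such as `int`, the cache holds a copy rather than a reference. Whether this actually beats the indexer is something to measure, not assume.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: keeps the first element in a property so hot loops
// can read it without indexing. All mutations go through the wrapper,
// which refreshes the cache whenever index 0 changes.
public sealed class HeadCachedList<T>
{
    private readonly List<T> _items;

    public HeadCachedList(IEnumerable<T> items)
    {
        _items = new List<T>(items);
        if (_items.Count == 0)
            throw new ArgumentException("The list must never be empty.", nameof(items));
        Head = _items[0];
    }

    // Cached copy of _items[0]; always valid because the list is never empty.
    public T Head { get; private set; }

    public int Count => _items.Count;

    public void Add(T item) => _items.Add(item); // appending never changes the head

    public void Insert(int index, T item)
    {
        _items.Insert(index, item); // throws before the cache is touched if index is invalid
        if (index == 0) Head = item;
    }

    public void RemoveAt(int index)
    {
        if (_items.Count == 1)
            throw new InvalidOperationException("The list must never be empty.");
        _items.RemoveAt(index);
        if (index == 0) Head = _items[0];
    }
}

public static class HeadCachedListDemo
{
    public static void Main()
    {
        var xs = new HeadCachedList<int>(new[] { 3, 1, 4 });
        xs.Insert(0, 7);   // head becomes 7
        xs.RemoveAt(0);    // head falls back to 3
        xs.Add(9);         // head unchanged
        Console.WriteLine($"{xs.Head} {xs.Count}"); // 3 4
    }
}
```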
All that it shows is that using a `List` will cost 2 seconds in overhead for every billion reads. This shouts irrelevant to me. – scrwtp Apr 24 '15 at 19:37
@scrwtp: Irrelevant? The question is asking about the performance of accessing the first element in a list a billion or more times. It's exactly what's being asked. – recursive Apr 24 '15 at 19:48
@recursive: What is being asked is whether using a `List` will "slow anything down". You give numbers for it, but I don't agree with your conclusion. Going for an array based on those numbers is a wildly premature optimization. Unless OP wants to access that element a billion times _per second_, a 2 seconds overhead for using a `List` is not a cost worth pondering upon. – scrwtp Apr 24 '15 at 20:10
@scrwtp: That's reasonable. I would agree, but it's not unreasonable to disagree depending on your use case. I included enough information to reproduce the test. People should make their own conclusions based on their use cases. – recursive Apr 24 '15 at 20:12
Whether it is relevant depends on the cost of the other operations he performs (see OP). If those are costlier, the cost of accessing the first element matters less to overall performance, and he should spend his time optimizing the other operations (80-20 rule). – BitTickler Apr 24 '15 at 21:56
The release code is optimizing away bounds checking on the array since a constant 0 is being used. A less contrived example may not benefit from that optimization. For example, you could replace [0] with [x], where x is initialized once to a random number between 0 and 99. Also, I don't know if it was your intention, but the xor is being optimized out of both tests in release mode because you aren't referencing total later. If you want to negate that effect, use it in the WriteLines or something. – jaket Apr 24 '15 at 22:45
@jaket: I didn't realize that about the xor. But the constant 0 optimization is appropriate, because the OP asks specifically about accessing the first element. – recursive Apr 24 '15 at 22:49
But it depends on how you access the first element. If the index is a constant 0, then sure. But under most other circumstances, such as the index coming in as a parameter, it's going to get bounds checked even if the caller always passes 0. My point was that the differences between the two are that a) the list has call overhead plus a bounds check, and b) the array has no call overhead and may or may not be bounds checked. It's rather difficult to predict when bounds checking will be optimized away, so I prefer not to count on it. – jaket Apr 24 '15 at 22:58
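A sketch of the variant suggested in these last comments. The index is passed in as a method parameter and chosen at run time instead of being the constant 0, and the XOR totals are printed at the end so the loops' results stay observable. `XorList`, `XorArray`, and the billion-iteration count are illustrative choices, not part of the original answer. No timings are given here, since they depend on the machine, the runtime version, and the build configuration; run it as a Release build.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class RuntimeIndexBenchmark
{
    // The index is a parameter, so the JIT cannot treat it as the constant 0.
    public static int XorList(List<int> list, int index, int iterations)
    {
        int total = 0;
        for (int i = 0; i < iterations; i++)
            total ^= list[index];
        return total;
    }

    public static int XorArray(int[] array, int index, int iterations)
    {
        int total = 0;
        for (int i = 0; i < iterations; i++)
            total ^= array[index];
        return total;
    }

    public static void Main()
    {
        var list = Enumerable.Range(1, 100).ToList();
        var array = Enumerable.Range(1, 100).ToArray();
        int index = new Random().Next(0, 100); // picked once, at run time
        const int Iterations = 1_000_000_000;

        var sw = Stopwatch.StartNew();
        int listTotal = XorList(list, index, Iterations);
        Console.WriteLine("Time for list:  {0}", sw.Elapsed);

        sw.Restart();
        int arrayTotal = XorArray(array, index, Iterations);
        Console.WriteLine("Time for array: {0}", sw.Elapsed);

        // Using the results keeps the loops' work observable to the optimizer.
        Console.WriteLine("Totals: {0} {1}", listTotal, arrayTotal);
    }
}
```

Splitting each loop into its own method also makes the "index arrives as a parameter" case explicit, which is the situation where, per the comment above, bounds-check elimination is harder to count on.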