2

At the moment I'm working on a project that contains a fair amount of legacy code, including non-generic collections such as .NET's ArrayList, Hashtable, etc.

I know that using these collections for value types is a terrible idea performance-wise, as mentioned in the "Performance considerations" section of the List<T> documentation (and as I confirmed for myself with a quick and naive LINQPad query, attached at the end).

At first glance there doesn't seem to be any problem doing a sort of search/replace operation to replace these old collections. But since it will affect a large portion of the codebase I'm worried that there will be side-effects where List<T> doesn't behave as "expected", given ArrayList's specific behaviour which the applications already rely on.

Has anyone done this type of conversion on a large scale before? If so, were there subtle problems not mentioned in the .NET documentation?


void Main()
{
    var size = 1000000;
    var array = new int[size];
    var list = new List<int>();
    var arrayList = new ArrayList();

    Console.WriteLine("Testing " + size + " insertions...");
    Console.WriteLine();
    var stopwatch = Stopwatch.StartNew();

    for (var i = 0; i < size; i++)
    {
        array[i] = i;
    }
    stopwatch.Stop();
    Console.WriteLine("int[]: " + stopwatch.Elapsed.TotalMilliseconds + "ms");
    stopwatch.Restart();

    for (var i = 0; i < size; i++)
    {
        list.Add(i);
    }
    stopwatch.Stop();
    Console.WriteLine("List<int>: " + stopwatch.Elapsed.TotalMilliseconds + "ms");
    stopwatch.Restart();

    for (var i = 0; i < size; i++)
    {
        arrayList.Add(i);
    }
    stopwatch.Stop();
    Console.WriteLine("ArrayList: " + stopwatch.Elapsed.TotalMilliseconds + "ms");
}

Output on my machine:

Testing 1000000 insertions...

int[]: 3,1063ms
List<int>: 7,2291ms
ArrayList: 111,5214ms

Multiple runs almost always show ArrayList an order of magnitude slower than either int[] or List<int>.

easuter
  • That's because `ArrayList` boxes the value types. – SLaks Feb 24 '15 at 14:56
  • @SLaks, I know that. EDIT: my question isn't "why are ArrayLists slow"; I already know they are and want to leave them behind. I just don't want to break the existing codebase which is quite large. – easuter Feb 24 '15 at 14:59

2 Answers

1

Early on, one of my jobs was to swap out ArrayLists for their generic counterparts. My advice for not breaking a large code base: don't do a search/replace.

Only "upgrade" when:

  1. You can see the entire scope of the ArrayList and everything that "touches" it.
  2. There will be an actual performance increase.

ArrayList's overhead looks very different at lengths well below 1,000,000. In theory, yes, an ArrayList is terrible. But in practice, if 95% of your ArrayLists hold fewer than 100, or even fewer than 1,000, elements, your application will see no significant performance gain from converting them, and you will have risked destabilizing your code base by swapping out these incidental ArrayLists.

Armed with the knowledge that ArrayLists are terribly slow as n approaches 1,000,000, I suggest hunting for the 5% of ArrayLists where n approaches that "slowness" limit and swapping those out. And swap them out ONLY if condition 1 is also satisfied. In my experience, it is simply not worth the milliseconds or even seconds of performance gain if, 6 months from now, your application starts experiencing bizarre crashes because you didn't realize something touched that ArrayList and required it to be an ArrayList.
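Whether many small ArrayLists cost as much as one huge one is something you can measure rather than guess. Below is a minimal sketch in the same naive style as the benchmark in the question; `ArrayListShapes`, `Fill` and `TimeMs` are names made up for this illustration, and the timings will vary by machine and runtime.

```csharp
using System;
using System.Collections;
using System.Diagnostics;

public static class ArrayListShapes
{
    // Fills `listCount` ArrayLists with `perList` ints each and returns
    // the total number of elements added.
    public static long Fill(int listCount, int perList)
    {
        long total = 0;
        for (var l = 0; l < listCount; l++)
        {
            var list = new ArrayList();
            for (var i = 0; i < perList; i++)
            {
                list.Add(i); // int -> object: one boxing allocation per call
            }
            total += list.Count;
        }
        return total;
    }

    // Times a single Fill call in milliseconds.
    public static double TimeMs(int listCount, int perList)
    {
        var stopwatch = Stopwatch.StartNew();
        Fill(listCount, perList);
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        Fill(100, 100); // warm-up so JIT compilation is not measured
        Console.WriteLine("1 x 1,000,000: " + TimeMs(1, 1000000) + "ms");
        Console.WriteLine("10,000 x 100:  " + TimeMs(10000, 100) + "ms");
    }
}
```

Both shapes box exactly 1,000,000 ints, so the boxing work itself is the same. What may differ is how often the backing arrays grow and how long the boxes stay alive, and in this sketch the small lists are discarded immediately, which a real application may not do. Treat the result as a rough hint and profile the actual code paths before deciding which lists to convert.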

Sully
  • Thanks for the reply! I see your point about the size of the `ArrayList` being a factor. In this case I'm worried about a different scenario: the project is ramping up for load-test deployments much larger than what has been attempted previously; will a 1,000,000-element `ArrayList` be just as bad as 10,000 x 100-element `ArrayList`s?... – easuter Feb 24 '15 at 18:51
  • Either way I need to gather more data! – easuter Feb 24 '15 at 18:52
1

Do note that ArrayList is only significantly slower for value types. If you have an ArrayList of strings, for example, the difference will not be very big - in performance. List<string> is more type-safe, of course.
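To make the value type vs. reference type distinction concrete, here is a small sketch (the class and method names are invented for this illustration). A string goes into an ArrayList as the same reference, while every int added gets its own heap-allocated box.

```csharp
using System;
using System.Collections;

public static class BoxingDemo
{
    // Reference types are stored as-is: the list holds the very same reference.
    public static bool StringIsStoredByReference()
    {
        var list = new ArrayList();
        string s = "hello";
        list.Add(s);
        return object.ReferenceEquals(list[0], s);
    }

    // Value types are boxed: each Add(int) allocates a separate object.
    public static bool EachIntGetsItsOwnBox()
    {
        var list = new ArrayList();
        int n = 42;
        list.Add(n);
        list.Add(n);
        return !object.ReferenceEquals(list[0], list[1]);
    }

    public static void Main()
    {
        Console.WriteLine(StringIsStoredByReference()); // True
        Console.WriteLine(EachIntGetsItsOwnBox());      // True
    }
}
```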

There are some differences you'll find at compile time. For example:

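var untyped = new ArrayList();
untyped.Add(3);          // Compiles: ArrayList accepts any object (the int is boxed)
untyped[0].ToString();   // Works fine

var typed = new List<string>();
typed.Add(3);            // Compile-time error: cannot convert from 'int' to 'string'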

However, those should be easy to find and fix during compilation. List<T> isn't going to produce runtime conditions that ArrayList didn't. There are tons of issues going the other way around, but List<T> is simply stricter than ArrayList, and that strictness shows up at compile time.
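As a sketch of what "stricter at compile time" buys you (all names here are invented for the example): with ArrayList, a wrongly typed element compiles and only fails when a cast runs; with List<string>, the same mistake never gets past the compiler and the casts disappear.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CastDemo
{
    // ArrayList: the bad element compiles and only surfaces when the cast executes.
    public static bool ArrayListCastFailsAtRuntime()
    {
        var names = new ArrayList();
        names.Add("alice");
        names.Add(3); // compiles: ArrayList accepts any object
        var length = 0;
        try
        {
            foreach (var item in names)
            {
                length += ((string)item).Length; // throws on the boxed int
            }
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    // List<string>: no casts needed, and Add(3) would be rejected by the compiler.
    public static int TypedListNeedsNoCasts()
    {
        var names = new List<string>();
        names.Add("alice");
        // names.Add(3); // would not compile
        var length = 0;
        foreach (var name in names)
        {
            length += name.Length;
        }
        return length;
    }

    public static void Main()
    {
        Console.WriteLine(ArrayListCastFailsAtRuntime()); // True
        Console.WriteLine(TypedListNeedsNoCasts());       // 5
    }
}
```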

Most of your trouble will be with places where your ArrayList contains types that don't have a common ancestor. Replacing ArrayList with List<object> isn't really going to help you much. In some cases, this might be a legitimate use of ArrayList. A subset of this is methods that accept an ArrayList and do something generic with it. Depending on the usage, you will either have to separate the code, or make the methods generic as well.
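For methods that accept an ArrayList, one way this can go is sketched below; `Joiner` and its methods are hypothetical names, not code from the project in question. Widening the parameter to the non-generic IEnumerable lets old and new callers coexist during the migration, since List<T> implements it too, and the generic version is the end state.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class Joiner
{
    // Legacy shape: only callers holding an ArrayList can use it.
    public static string JoinLegacy(ArrayList items)
    {
        return JoinAny(items);
    }

    // Intermediate step: ArrayList and List<T> both implement IEnumerable,
    // so converted and unconverted callers can share this method.
    public static string JoinAny(IEnumerable items)
    {
        var parts = new List<string>();
        foreach (var item in items)
        {
            parts.Add(item.ToString());
        }
        return string.Join(",", parts);
    }

    // Generic end state: element type is known, no casts for the caller.
    public static string Join<T>(IEnumerable<T> items)
    {
        var parts = new List<string>();
        foreach (var item in items)
        {
            parts.Add(item.ToString());
        }
        return string.Join(",", parts);
    }

    public static void Main()
    {
        Console.WriteLine(JoinLegacy(new ArrayList { 1, "a" }));    // 1,a
        Console.WriteLine(JoinAny(new List<int> { 1, 2, 3 }));      // 1,2,3
        Console.WriteLine(Join(new List<string> { "a", "b" }));     // a,b
    }
}
```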

If you have types that derive from ArrayList, changing it to List<T> might give you some dubious method overloads - many of these will only produce a warning, so if you're in this scenario, pay attention to warnings, not just errors.
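A sketch of why those warnings matter, using made-up class names: ArrayList.Add is virtual, so an override runs for every caller, whereas List<T>.Add is not virtual and can only be hidden. Without the `new` keyword the compiler reports the hiding as warning CS0108 and compiles anyway. Collection<T>, with its virtual InsertItem hook, is shown as one possible alternative base class, not as the only fix.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Legacy: the override is called no matter how the list is referenced.
public class LoggingArrayList : ArrayList
{
    public int Adds;
    public override int Add(object value) { Adds++; return base.Add(value); }
}

// Naive port: this Add only hides List<int>.Add.
public class LoggingList : List<int>
{
    public int Adds;
    public new void Add(int value) { Adds++; base.Add(value); }
}

// Possible alternative: Collection<T>.Add routes through the virtual InsertItem.
public class LoggingCollection : Collection<int>
{
    public int Adds;
    protected override void InsertItem(int index, int item)
    {
        Adds++;
        base.InsertItem(index, item);
    }
}

public static class DerivedDemo
{
    public static int AddViaArrayList()
    {
        var list = new LoggingArrayList();
        ArrayList asBase = list;
        asBase.Add(1);
        return list.Adds;
    }

    public static int AddViaList()
    {
        var list = new LoggingList();
        List<int> asBase = list;
        asBase.Add(1); // calls List<int>.Add, bypassing the hidden method
        return list.Adds;
    }

    public static int AddViaCollection()
    {
        var list = new LoggingCollection();
        ICollection<int> asInterface = list;
        asInterface.Add(1);
        return list.Adds;
    }

    public static void Main()
    {
        Console.WriteLine(AddViaArrayList());  // 1
        Console.WriteLine(AddViaList());       // 0
        Console.WriteLine(AddViaCollection()); // 1
    }
}
```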

How are you planning to do a global search & replace? Are you manually going to go through every appearance of ArrayList and convert it to the given specific List<T>? That should be pretty safe.

Luaan
  • Thanks for the reply! I was thinking exactly of doing a manual search/replace. The most critical sections of the code (server-side) seem to have ~150 `ArrayList` allocations. – easuter Feb 24 '15 at 18:46