How is it possible that "RemoveAll" in LINQ is much faster than iteration?

Question

The following code:

List<Interval> intervals = new List<Interval>();
List<int> points = new List<int>();

//Initialization of the two lists
// [...]

foreach (var point in points)
{
    intervals.RemoveAll (x => x.Intersects (point));
}

is at least 100x faster than this when the lists are of size ~10000:

List<Interval> intervals = new List<Interval>();
List<int> points = new List<int>();

//Initialization of the two lists
// [...]

foreach (var point in points)
{
    for (int i = 0; i < intervals.Count;)
    {
        if (intervals[i].Intersects(point))
        {
            intervals.Remove(intervals[i]);
        }
        else
        {
            i++;
        }
    }
}

How is it possible? What is performed under the hood with "RemoveAll"? According to MSDN, "RemoveAll" performs a linear search and is therefore in O(n). So I would expect similar performance for both.

When replacing "Remove" by "RemoveAt", the iteration is much faster, comparable to "RemoveAll". But both "Remove" and "RemoveAt" have O(n) complexity, so why is the performance difference between them so big? Could it only be due to the fact that "Remove (item)" compares the list elements with "item" and "RemoveAt" doesn't perform any comparison?

`RemoveAll` does not use LINQ, it is a standard method on `List`. This is noted by the fact that `RemoveAll` modifies the collection *in place* - LINQ does not modify collections. — Dan, Sep 02 '15 at 11:10
@Brainless, you can speed up 2nd code sample, if use `intervals.RemoveAt(i);` instead of `intervals.Remove (intervals[i]);`, I think. — ASh, Sep 02 '15 at 11:13
FWIW, a better approach might be to actually use LINQ and use `.Except` on the List and operate on the returned enumerable. This ensures you don't modify the original list and won't perform any processing until you actually enumerate the results. — Dan, Sep 02 '15 at 11:14
I replaced "Remove" by "RemoveAt" and the 2 speeds are now comparable. However "RemoveAll" is still sensibly faster. — Brainless, Sep 02 '15 at 11:19
Both `RemoveAll` and `Remove` are `O(n)`, so it's easy to believe the one which has an additional `for` loop will perform `n` times slower. — vgru, Sep 02 '15 at 11:19
Here's the LINQ way: `intervals.Where(i => !points.Any(p => i.Intersects(p))).ToList();`. Not more efficient but possibly more readable. — Tim Schmelter, Sep 02 '15 at 11:26
@Brainless RemoveAt doesn't perform any comparison, it simply removes the item at the specified position. Remove on the other hand has to *search* for the item that is equal to its argument. — Panagiotis Kanavos, Sep 02 '15 at 11:29
@TimSchmelter +1 It's always good to have generics manipulations in LINQ, but I'm not sure if it's more readable... We could argue over readability here — Brainless, Sep 02 '15 at 11:32
@Brainless: imo the best approach(in terms of readability and performance) is a combination of `RemoveAll` and LINQ: `intervals.RemoveAll(i => points.Any(p => i.Intersects(p)));` — Tim Schmelter, Sep 02 '15 at 11:49

Dennis_E · Accepted Answer · 2015-09-02T11:25:56.100

34

If you remove an item from a List<T>, all the items after it will be moved back one spot. So if you remove n items, a lot of items will be moved n times.
RemoveAll will only do the moving once, which you can see in the source for List<T>: source

Another thing is that Remove(T item) will search the entire List for the item, so that's another n operations.

Something that has nothing to do with your question, but I'd like to point out anyway:
If you use a for-loop to delete items from a List, it's a lot easier to start at the end:

for (int i = intervals.Count - 1; i >= 0; i--)
{
    if (intervals[i].Intersects(point))
    {
        intervals.RemoveAt(i);
    }
}

This way, you don't need that ugly else-clause

edited Sep 02 '15 at 11:25

answered Sep 02 '15 at 11:19

Dennis_E

8,751
23
29

Why decreasing iteration? – Brainless Sep 02 '15 at 11:27
1

@Brainless because if starting from the end, you don't have to "compensate" your `i` when removing, and get rid of the `else` clause, which makes for more readable code. By the way, your original loop is wrong... it'll move 2 slots forward when not removing (not one): the `for` statement will increase one, and your `else` will increase another – Jcl Sep 02 '15 at 11:29
1

@Brainless. If you remove let's say list[3], all the items at index 4, 5, etc will be moved back one spot. But that's territory were you've already been. It won't mess with items 0,1 and 2. – Dennis_E Sep 02 '15 at 11:30
1

@Brainless Because if you remove items, you don’t end up skipping the next element (since the index should stay the same instead). As you can see, when iterating backwards, you don’t need to do the `i++/i--` separately within the loop body. – poke Sep 02 '15 at 11:30
1

@Brainless because moving from bottom to top does not require skipping increase in counter for one spot if an item is removed.. – Kryptonian Sep 02 '15 at 11:30
@Jcl No, OP’s loop was correct but an edit incorrectly added a `i++` to the for construct. – poke Sep 02 '15 at 11:31
@poke I see that, sorry, missed the revision history :-) – Jcl Sep 02 '15 at 11:32
If you check the revision history, people have continuously added and removed and added and removed etc... the "++i" in the for loop... – Brainless Sep 02 '15 at 11:34
1

It also has the potential to be significantly faster. Consider the extreme case where all elements are removed from the list: if you start from the back, there's no "shifting" of the remaining elements, so it's O(n). If you start from the front, it's O(n²) because each removal requires shifting the remainder of the list – Mark Sowul Sep 02 '15 at 14:54
1

@MarkSowul Indeed. If you go forward, you're shifting items you're going to delete later anyway. It's just unnecessary work. – Dennis_E Sep 02 '15 at 15:12

Henrik · Answer 2 · 2015-09-02T12:05:20.100

12

RemoveAll can be done in O(n) by checking the condition for n elements and moving at most n elements.

Your loop is O(n^2), as each Remove needs to check up to n elements. And even if you change it to RemoveAt, it still needs to move up to n elements.

This might be the fastest solution: intervals.RemoveAll(x => points.Any(x.Intersects));

edited Sep 02 '15 at 12:05

answered Sep 02 '15 at 11:19

Henrik

23,186
6
42
92

score 3 · Answer 3 · answered Sep 02 '15 at 11:20

List is an array, and removing one element from an array requires moving all elements after the one you're removing to the previous index, so a[i] is moved to a[i-1].

Doing this repeatedly requires multiple moves, even if more elements meet the removal criteria. RemoveAll may optimize this by moving the elements by more than 1 index at a time as it traverses the list and finds more elements that match the removal criteria.

George Polevoy · Answer 4 · 2015-09-03T10:21:38.440

0

The difference is that Remove itself is an O(n), so you get O(n^2).

Replace for with a new collection and an assignment.

items = items.Where(i => ...).ToList();

This method has the same algorithmic time complexity as RemoveAll, but uses extra O(n) memory.

edited Sep 03 '15 at 10:21

answered Sep 02 '15 at 11:33

George Polevoy

7,450
3
36
61

1

I doubt RemoveAll would do that; that uses O(n) extra memory. – Mark Sowul Sep 02 '15 at 14:56
http://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs,82567b42bbfc416e – Mark Sowul Sep 02 '15 at 14:57
Updated the answer to be more precise on that part. – George Polevoy Sep 03 '15 at 10:24

How is it possible that "RemoveAll" in LINQ is much faster than iteration?

4 Answers4