
Let's say I have a list of N elements (list1).

I then have a list of M <= N elements (list2) containing the indices of all the elements that I want to remove from the first list.

I want to delete the elements of list1 at the indices stored in list2, in the most efficient way possible.

My first approach was a for loop:

// list2 must be sorted in descending order, otherwise each removal
// would shift the positions of the indices still to be removed
for (int i = 0; i < list2.Count; i++)
    list1.RemoveAt(list2[i]);

This works pretty well (I need list2 to be sorted in descending order, of course), but it has O(M*N) complexity in the worst case (M iterations, with RemoveAt being O(N) each).

Another solution was to create a temporary list, populate it with the elements that should not be deleted, and then use the LINQ Intersect method:

List<T> tempList = new List<T>();
for (int i = 0; i < list1.Count; i++)
    if (!list2.Contains(i))      // keep only the elements whose index is not in list2
        tempList.Add(list1[i]);
list1.Intersect(tempList);       // note: Intersect returns a new sequence; this result is discarded

While at first I was excited by the O(N+M) complexity of Intersect, I eventually realized that I first need N iterations to populate tempList, so any advantage is lost (let us assume that in this second case list2 is a HashSet: I don't care about ordering, just about the O(1) Contains).

After some digging, I still wasn't able to find a RemoveAll-like method that removes from list1 all the elements at the indices stored in list2.

Is there any way to make this as performant as possible?

PS: for all those who are thinking "premature optimization is the root of all evil", please consider that my code is actually working fine, but as I am working on a strictly time-dependent problem, saving a few ns per iteration (and I am going to run around 150k iterations) can lead to a significant improvement.

EDIT: as @InBetween correctly pointed out, the Intersect in the second solution is actually useless; removing it brings the complexity down.
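
For reference, the reduced second solution might look like this (a sketch, assuming list2 is a HashSet<int> so that Contains is O(1), making the whole pass O(N)):

List<T> tempList = new List<T>(list1.Count); // pre-size to avoid growth copies
for (int i = 0; i < list1.Count; i++)
    if (!list2.Contains(i))      // O(1) lookup in the HashSet
        tempList.Add(list1[i]);
list1 = tempList;                // the filtered list simply replaces the original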

  • Optimization is the root of fun, IMHO. Premature or not! So can we assume `list2` is already sorted? – SimpleVar Dec 09 '16 at 19:05
  • We can assume a lot of things about `list2`. We can assume it is sorted, or that it is a `HashSet`. I have control over `list2` and how it is created, so I can do a lot of interesting things with it. But I agree with you, optimization is cool! – Fylax Dec 09 '16 at 19:10
  • Why are you intersecting? Keep the newly created list. – InBetween Dec 09 '16 at 19:11
  • @InBetween Because at first it was an Except, then an Intersect, and then I lost track of what I was doing, but you are right (still, this doesn't seem like the best way to do it to me) – Fylax Dec 09 '16 at 19:13
  • The first approach with a `LinkedList` instead of a `List` is `O(n)` since the remove operation becomes O(1) – SimpleVar Dec 09 '16 at 19:14
  • @SimpleVar I am aware of this, but I have (somewhere in the code) a lot of random accesses (more than deletions) that actually make `List` a better choice over `LinkedList` – Fylax Dec 09 '16 at 19:18
  • If `list2` is ordered, you can make it pretty fast by modifying your second solution – InBetween Dec 09 '16 at 19:23

4 Answers


If list2 is ordered, then simply use an optimized version of your second solution:

var exceptIndex = 0;
var newList = new List<T>(list1.Count);

for (var i = 0; i < list1.Count; i++)
{
    // list2 must be sorted ascending; exceptIndex tracks the next index to skip
    if (exceptIndex >= list2.Count || i != list2[exceptIndex]) newList.Add(list1[i]);
    else exceptIndex++;
}

return newList;
InBetween
  • Thanks. In the end I came up with a way that merges your solutions into a new one: having list2 as a boolean array which says which indices should be kept and which not. This strays quite far from my initial question, though. – Fylax Dec 09 '16 at 19:53
List1 = Enumerable.Range(0, List1.Count).Except(List2).Select(i => List1[i]).ToList();

or

bool[] shouldWeSayGoodbye = new bool[List1.Count];
for (var i = 0; i < List2.Count; i++) {
    shouldWeSayGoodbye[List2[i]] = true;   // mark every index that has to go
}

List<T> List3 = new List<T>(List1.Count);
for (var i = 0; i < List1.Count; i++) {
    if (!shouldWeSayGoodbye[i]) {
        List3.Add(List1[i]);               // keep only the unmarked elements
    }
}

Some testing shows the loop beats the LINQ version by at least 4x.
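
For what it's worth, a rough way to reproduce that kind of measurement (a sketch using Stopwatch; the exact ratio will vary with N, M and hardware):

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Benchmark
{
    static void Main()
    {
        // toy data: remove every third index from a 100000-element list
        var list1 = Enumerable.Range(0, 100000).ToList();
        var list2 = Enumerable.Range(0, 100000).Where(i => i % 3 == 0).ToList();

        var sw = Stopwatch.StartNew();
        var viaLinq = Enumerable.Range(0, list1.Count).Except(list2)
                                .Select(i => list1[i]).ToList();
        sw.Stop();
        Console.WriteLine("LINQ: " + sw.Elapsed);

        sw.Restart();
        var goodbye = new bool[list1.Count];
        for (var i = 0; i < list2.Count; i++) goodbye[list2[i]] = true;
        var viaLoop = new List<int>(list1.Count);
        for (var i = 0; i < list1.Count; i++)
            if (!goodbye[i]) viaLoop.Add(list1[i]);
        sw.Stop();
        Console.WriteLine("Loop: " + sw.Elapsed);
    }
}

A serious benchmark would add warm-up runs and repetition, but this reproduces the rough ordering.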

Alexander Taran
  • Very cool answer, quite obscure. Do you mind explaining why it is performant? – Fylax Dec 09 '16 at 19:12
  • Because deleting an element forces the list to move every element after it, while producing a new list is way faster. http://stackoverflow.com/questions/14082042/how-to-remove-from-listt-efficiently-c – Alexander Taran Dec 09 '16 at 19:17
  • @Fylax This is basically the LINQ version of your second solution, removing the unnecessary `Except`. – InBetween Dec 09 '16 at 19:19
  • @InBetween I figured out what this is (and you already pointed out that there was a superfluous `Intersect` that made no sense, and I agreed with you), but it is still not clear to me why it is more performant. As I explained in my question, saving ns is a huge improvement, and in my experience LINQ tends to be slower than a for loop – Fylax Dec 09 '16 at 19:22
  • Well then keep the basic concept - you are to find the indexes of the items to stay and produce a new list of them. Removing items from the list will be slower. – Alexander Taran Dec 09 '16 at 19:24
  • After your edit there is no reason to write a new answer. My idea was just to change the list2 structure so that it is what you called `shouldWeSayGoodbye` from the start. It seems to be the most efficient and clean way to perform this task – Fylax Dec 09 '16 at 20:05

List<T>.Add is actually O(1) if the capacity is not exceeded. Here's what I came up with; it should run in O(N):

List<T> resultList = new List<T>(list1.Count); // high capacity!
int curIdx = list1.Count - 1; // start at the end of list1

// assumes list2 is sorted in descending order
list2.Add(-1); // add a final -1 sentinel to make the following code nicer

foreach (int targetIdx in list2)
{
    while (curIdx > targetIdx)
    {
        resultList.Add(list1[curIdx]); // both operations are O(1)
        curIdx--;
    }
    curIdx--; // skip the removed index itself
}

// resultList is reversed

Clarification:

list1: [10, 11, 12, 13, 14]
list2: [1, 3]
wanted result: [10, 12, 14]

We want list2 to be sorted desc, and add a final -1 to it:

list2: [3, 1, -1]

And the final result would actually be the wanted result in reverse. Reversing the list afterwards can be done in O(N), so it does not change the overall complexity; however, you can optimize the code further so the final list comes out in the correct order (homework!)
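
In case you'd rather skip the homework, one way the forward-order variant could look (a sketch under the same assumptions, except that list2 is sorted in ascending order and no sentinel is needed):

// Forward variant: list2 sorted ascending, result comes out in original order
List<T> resultList = new List<T>(list1.Count);
int curIdx = 0;

foreach (int targetIdx in list2)
{
    while (curIdx < targetIdx)
        resultList.Add(list1[curIdx++]); // copy the run of kept elements
    curIdx++;                            // skip the removed index
}
while (curIdx < list1.Count)
    resultList.Add(list1[curIdx++]);     // copy the tail after the last removal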

SimpleVar
  • Even though you see a while inside a foreach, we really do only loop over N elements in total – SimpleVar Dec 09 '16 at 19:26
  • Using a backward for loop would make my final list come out in the correct order, I guess. But the final order is not actually a big deal; as I said before, I do a lot of random access (note the random), so it would just increase the entropy. Plus, I am aware of the capacity constraint; I tried all possible tricks all over my code to increase performance :) Anyway, thanks to you all, I found a quite nice and efficient solution (which massively changes list2, though) – Fylax Dec 09 '16 at 19:46
  • @Fylax, care to share your solution ? – Alexander Taran Dec 09 '16 at 19:58
  • Sure, it was in my plan. – Fylax Dec 09 '16 at 19:59

Your best bet might be to stick with your for loop; it is going to be the fastest, assuming list2 is sorted in reverse order (as in your example).

You could try using RemoveAll perhaps...

// note: IndexOf is O(N) per element (and assumes list1 has no duplicate values)
list1.RemoveAll(l1item => list2.Contains(list1.IndexOf(l1item)));

The downside is that, while this looks cleaner and more straightforward, the underpinnings of it are actually fairly complex:

public int RemoveAll(Predicate<T> match)
{
    if (match == null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.match);
    }
    int num = 0;
    while (num < this._size && !match(this._items[num]))
    {
        num++;
    }
    if (num >= this._size)
    {
        return 0;
    }
    int i = num + 1;
    while (i < this._size)
    {
        while (i < this._size && match(this._items[i]))
        {
            i++;
        }
        if (i < this._size)
        {
            this._items[num++] = this._items[i++];
        }
    }
    Array.Clear(this._items, num, this._size - num);
    int result = this._size - num;
    this._size = num;
    this._version++;
    return result;
}
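
Incidentally, the source above calls the predicate exactly once per element, in index order. A cheaper predicate could therefore track the index with a counter instead of calling IndexOf (a sketch, assuming list2 can be copied into a HashSet<int>; the stateful lambda relies on that in-order traversal):

var toRemove = new HashSet<int>(list2); // O(1) membership tests
int current = 0;
list1.RemoveAll(_ => toRemove.Contains(current++));

This keeps RemoveAll's single compacting pass while dropping the per-element IndexOf.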

On the other hand, looking at RemoveAt which you are currently using:

public void RemoveAt(int index)
{
    if (index >= this._size)
    {
        ThrowHelper.ThrowArgumentOutOfRangeException();
    }
    this._size--;
    if (index < this._size)
    {
        Array.Copy(this._items, index + 1, this._items, index, this._size - index);
    }
    this._items[this._size] = default(T);
    this._version++;
}

This performs an Array.Copy each time an item is removed, while RemoveAll works through the items in a single pass and shifts the kept items up over the "removed" ones.

You could run some simple benchmark tests on both of those and see which is better.

Using the following, though, might give you the best results:

var newList = new List<T>();
for (int i = 0; i < list1.Count; i++)
{
   bool exists = false;
   for (int j = 0; j < list2.Count; j++)   // linear scan of list2 for index i
      if (i == list2[j])
      {
         exists = true;
         break;
      }
   if (!exists) newList.Add(list1[i]);     // keep elements not marked for removal
}

This should not have any dependency on list order, though the inner scan makes it O(N*M).
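
If building a HashSet<int> from list2 is acceptable, the inner scan collapses to a single O(1) lookup (a sketch of that tweak):

var removeSet = new HashSet<int>(list2); // one-time O(M) build
var newList = new List<T>(list1.Count);
for (int i = 0; i < list1.Count; i++)
    if (!removeSet.Contains(i))          // O(1) instead of scanning list2
        newList.Add(list1[i]);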

gmiley