elegant way to remove all elements of a vector that are contained in another vector?

Question

While looking over some code I found loopy and algorithmically slow implementation of std::set_difference :

 for(int i = 0; i < a.size(); i++)
 {
  iter = std::find(b.begin(),b.end(),a[i]);
  if(iter != b.end())
  {
     b.erase(iter);
  }
 }

It can be easily replaced with sort(vectors are not sorted) + set_difference, but that requires allocation of new memory(see my recent Q Can output of set difference be stored in first input? why it cant be done "inplace").
So my solution would be something like:

sort(a.begin(), a.end());
for(size_t i = 0; i < b.size(); i++)
{
 if (binary_search(a.begin(), a.end(), b[i]))
 {
     swap(b[i], b[b.size()-1]); //remove current element by swapping with last
     b.pop_back();     // and removing new last by shrinking
 }
}

can it be done more elegantly?
elegant is subjective so within scope of this Q is defined as clearer code(ideally something from STL algorithms but I think it cant be done) but with no memory allocation and no increase in alg complexity.

Define "elegance." Less code? Less execution time? Less memory? — John Dibling, Jan 17 '14 at 20:27
@NoSenseEtAl In the second example, is `b` one of the inputs to `set_difference`? If so, you're modifying the range that has been passed to you, which the caller may not be thrilled about. Of course, you could make a local copy, but now that solution is starting to look less attractive. — Praetorian, Jan 17 '14 at 20:35
@pretorian it is inclass member in real code(and order is not imporant), so not really a big problem, but yes in generic case this is evil, but still it is aster to allocate M elems and do N log M operations than to allocate 0 and do N*M operations — NoSenseEtAl, Jan 17 '14 at 20:39
@NoSenseEtAl - maybe it will be faster to just make a list of all the "hits" references, then copy the data to the target vector from that list sequentially and only do a single resize of the source vector instead of for each hit and restore the vector using the hits list to skip the removed values. I think the allocation and reallocation is the bottleneck here, so it might be a good idea to minimize it or even avoid it, after all you are shrinking so as long as you can restore the correct order you don't need to reallocate unless maybe trim in the end. — dtech, Jan 17 '14 at 21:08
@ddriver resizes are shrinks, so they dont cause reallocations — NoSenseEtAl, Jan 18 '14 at 01:05

n. m. could be an AI · Accepted Answer · 2014-01-17T21:53:42.040

10

This one does it in O(N+M), assuming both arrays are sorted.

  auto ib = std::begin(two);
  auto iter = std::remove_if (
       std::begin(one), std::end(one),
       [&ib](int x) -> bool {
                       while  (ib != std::end(two) && *ib < x) ++ib;
                       return (ib != std::end(two) && *ib == x);
                     });

edited Jan 17 '14 at 21:53

answered Jan 17 '14 at 21:09

n. m. could be an AI

112,515
14
128
243

I think this requires the enhanced return type deduction of C++1y. – dyp Jan 17 '14 at 21:17
@dyp why? Can't see it (and it works in both gcc and clang with -std=c++11). – n. m. could be an AI Jan 17 '14 at 21:29
IIRC both do implement the return type deduction of C++1y as an extension in their C++11 modes. However, [expr.prim.lambda]/4 states that your lambda should have return type `void`. – dyp Jan 17 '14 at 21:33
i must say i usually hate code that mixes looping inside lambda that is funtion for stl algorithm + capture by reference but your code is quite pretty. :) In my case arrays arent sorted, but change is trivial, just sort both of them... again nice code. :) – NoSenseEtAl Jan 18 '14 at 01:09
1

In my opinion it will not compile, need to add [&ib, &two] – K P Jun 29 '20 at 07:24

Ralph Tandetzky · Answer 2 · 2014-01-18T00:55:57.247

9

Sort b so you can binary search it in order to reduce time complexity. Then use the erase-remove idiom in order to throw away all elements from a that are contained in b:

sort( begin(b), end(b) );
a.erase( remove_if( begin(a),end(a),
    [&](auto x){return binary_search(begin(b),end(b),x);}), end(a) );

Of course, you can still sacrifice time complexity for simplicity and reduce your code by removing the sort() and replacing binary_search() by find():

a.erase( remove_if( begin(a),end(a),
    [&](auto x){return find(begin(b),end(b),x)!=end(b);}), end(a) );

This is a matter of taste. In both cases you don't need heap allocations. By the way, I'm using lambda auto parameters which are C++14. Some compilers already implement that feature such as clang. If you don't have such a compiler, but only C++11 then replace auto by the element type of the container.

By the way, this code does not mention any types! You can write a template function so it works for all kind of types. The first variant requires random access iteration of b while the second piece of code does not require that.

edited Jan 18 '14 at 00:55

answered Jan 18 '14 at 00:13

Ralph Tandetzky

22,780
11
73
120

i would prefer &b in that capture list, but that is my personal pref. +1 – NoSenseEtAl Jan 20 '14 at 10:31
1

@NoSenseEtAl For maintenance it most often best practice to write just `[&]` or `[=]` because you don't have to change that part, if other code changes. Explicit captures make a lot of sense if there's a particular reason that using other variables should not be allowed and hence should result in helpful compile time errors. The case that you specify a default capture and add some variables to be captured differently is also an important use case. – Ralph Tandetzky Jan 20 '14 at 22:05
In terms of time complexity, is this the most optimal way? If `a` has many elements and `b` has few elements, isn't it better to sort a copy of `a` and binary search it? (tag the indices of the elements of the copied `a` to each element before sorting). Just my brief thought, so any advice would be helpful! – starriet Jul 07 '23 at 01:50

score 4 · Answer 3 · answered Jan 17 '14 at 20:33

4

One solution that comes to mind is combining remove_if and binary_search. It's effectively the same as your manual looping solution but might be a bit more "elegant" as it uses more STL features.

sort(begin(b), end(b));
auto iter = remove_if(begin(a), end(a), 
                      [](auto x) { 
                          return binary_search(begin(b), end(b), x); 
                      });
// Now [begin(a), iter) defines a new range, and you can erase them however
// you see fit, based on the type of a.

answered Jan 17 '14 at 20:33

bstamour

7,746
1
26
39

erase remove strikes again... i hate it from readability point of view, but still a nice A – NoSenseEtAl Jan 17 '14 at 20:38
I agree, but considering that different containers may have different specializations for removing elements, it's a fair trade-off, imo. – bstamour Jan 17 '14 at 20:39
1

The `remove_if` and `binary_search` are operating over different ranges. – bstamour Jan 17 '14 at 20:40
@Praetorian remove if is touching a, not b, if i understand it correctly – NoSenseEtAl Jan 17 '14 at 20:41
1

You understand correctly. `binary_search` only looks at the elements: it is a non-modifying algorithm, and `remove_if` never touches b at all. – bstamour Jan 17 '14 at 20:42

Adrian McCarthy · Answer 4 · 2014-01-17T21:45:03.790

The current code is quite clear, in that it should be obvious to any programmer what's going on.

The current performance is O(a.size() * b.size()), which may be pretty bad depending upon the actual sizes.

A more concise and STL-like way to describe it is to use remove_if with a predicate that tells you if a value in in a.

b.erase(std::remove_if(b.begin(), b.end(), [](const auto&x) {
  return std::find(a.begin(), a.end(), x) != a.end();
}), b.end());

(Not tested, so I might have made a syntax error.) I used a lambda, but you can create a functor if you're not using a C++11 compiler.

Note that the original code removes just one instance of a value in b that's also in a. My solution will remove all instances of such a value from b.

Note that the find operation happens again and again, so it's probably better to do that on the smaller vector for better locality of reference.

NoSenseEtAl · Answer 5 · 2014-01-20T15:06:33.987

after thinking for a while I thought of this
(note:by answering my own Q im not claiming this is a superior to offered A):

vector<int64_t> a{3,2,7,5,11,13}, b{2,3,13,5};
set<int64_t> bs(b.begin(), b.end());
for (const auto& num: bs)
    cout << num << " ";
cout  << endl;
for (const auto& num: a)
    bs.erase(num);
vector<int64_t> result(bs.begin(), bs.end());
for (const auto& num: result)
    cout << num << " ";

elegant way to remove all elements of a vector that are contained in another vector?

5 Answers5

Linked