Asymptotic complexity of std::remove_if

Question

I am working on an erase method for a data structure with a hard-coded maximum number of elements, N, that relies on std::array to avoid heap memory. Although the std::array contains N elements, only some number, M, of them are "relevant" elements, where M is less than or equal to N. As an example, if N is 10 and the array looks like this:

std::array<int, N> elements = { 0, 1, 2, -1, 4, -1, 6, -1, -1, 9 };

...and if M is 7, only the first 7 elements are "relevant" while the others are considered junk (the trailing { -1, -1, 9 } are junk). I am using int here for an SO example, but the real program stores objects that implement operator==. Below is a working example that removes all -1 and updates M:

#include <algorithm>
#include <array>
#include <iostream>
#include <iterator>

constexpr unsigned N = 10;
unsigned           M = 7;
std::array<int, N> elements = { 0, 1, 2, -1, 4, -1, 6, -1, -1, 9 };

int main() {
        for (unsigned i = 0; i < M; ++i)
                std::cout << elements[i] << ' ';
        std::cout << '\n';

        auto newEnd = std::remove_if(
                std::begin(elements), std::begin(elements) + M,
                [](const auto& element) {
                        return -1 == element;
                }
        );

        unsigned numDeleted = M - std::distance(std::begin(elements), newEnd);
        M -= numDeleted;
        std::cout << "Num deleted: " << numDeleted << '\n';

        for (unsigned i = 0; i < M; ++i)
                std::cout << elements[i] << ' ';
        std::cout << '\n';

        return 0;
}

The question I have is: what is the asymptotic complexity of the std::remove_if? I would imagine that the std::remove_if and the std::distance together are O(2M), i.e. O(M), with the std::remove_if being the more expensive operation. However, I am not sure whether the std::remove_if is O(N * M) due to shifting elements on each deletion.

Edit: For clarity, I understand that this should apply the predicate M times, but I am wondering whether N shifts are applied each time the predicate is true.

Anyway, `std::begin(elements), &elements[M]` is implementation-specific behavior, since the C++ standard says that `std::begin(elements)` returns `elements.begin()`, which in turn returns a `std::array::iterator`, an implementation-specific type, whereas `&elements[M]` is a pointer type – Danh Aug 11 '16 at 06:18
@Danh, How could it be O(1)? If I had an array of size 100, 90 elements were "relevant", and I considered the 90 to be M, then `std::remove` is surely at least O(M) – asimes Aug 11 '16 at 06:19
@Danh Complexity is certainly not `O(1)` here since it depends on `N` and `M`; making them constant does not change the fact that the complexity of the program depends on these two values. – Holt Aug 11 '16 at 06:19
@Danh, How can I specify `&elements[M]` otherwise? Open to suggestions – asimes Aug 11 '16 at 06:20
@asimes `std::begin(elements) + M` should work and be standard. – Holt Aug 11 '16 at 06:20
@Danh If you have *"Exactly `std::distance(first, last)` applications of the predicate."*, then it is definitely not constant since it depends on `last` and `first`. – Holt Aug 11 '16 at 06:24

milleniumbug · Answer 1 · 2016-08-11T06:41:56.020


According to cppreference:

Complexity: Exactly std::distance(first, last) applications of the predicate.

There are no shift operations on the removed elements, because they are allowed to have unspecified values after the call to std::remove_if; the kept elements are simply move-assigned over them.

edited Aug 11 '16 at 06:41

answered Aug 11 '16 at 06:02



The quote specifies the number of applications of the predicate, whereas the question specifically asked about the number of shift operations. – Ami Tavory Aug 11 '16 at 06:09
Although it iterates between 0 and M, the overall data structure has a length of N. Perhaps I should emphasize it more in my question, but what I am wondering about is the possibility of N shifts (rather than M applications of the predicate) – asimes Aug 11 '16 at 06:09
@asimes `std::remove_if` does not care about the underlying array; the only thing it knows is that `std::distance(first, last) == M`, so it is guaranteed to run in `O(M)`. If you have a vector with `.size() == M` and `.capacity() == N`, `std::remove_if` runs in `O(M)`, not `O(N)`; it is the same idea here. – Holt Aug 11 '16 at 06:26
@Holt It is very easy to build an implementation only aware of the distance being *m*, and running in quadratic time. – Ami Tavory Aug 11 '16 at 06:30
@AmiTavory Yes, but it will still run in quadratic time regarding `M` not `N`, whatever the complexity, the only parameter will be `M`. – Holt Aug 11 '16 at 06:33
This answer is better than mine. It should be the accepted one. – Ami Tavory Aug 11 '16 at 07:19

Ami Tavory · Accepted Answer · 2016-08-11T07:21:23.423


Edit

This answer, in retrospect, addresses a more complicated question than what was asked - how a "push back to end" function could be implemented in linear time. Regarding the specific question asked - pertaining to remove_if - @milleniumbug's answer addresses it better.

I can see why you'd think that the complexity would be Θ(m n), as each of the m removed items might need to be shifted Θ(n) distance.

It is actually possible to do this in time Θ(n) and additional O(1) space (just a few additional iterators).

Consider the following diagram, showing an iteration of a possible implementation of the algorithm.

The red items are a contiguous group of recognized items to be removed to the end, at this point (you just need two points to record this). The green item is the item now being considered (another pointer).

If the green item is to be removed, the red group simply becomes bigger by including it. This is shown in the next diagram, where the red group expands. In the next iteration, the green item will be the one to the right of it.

If not, all the red group needs to be shifted past it. Some thought can convince you that this can be done in linear time in the red group (even though the iterators are guaranteed to be only forward iterators).

Why is the complexity linear? Because you can imagine this as being equivalent to the green element being shifted left relative to the red group. The rationale is similar to that of amortized analysis.

The following diagram shows the second case. In the next iteration, the green element (being considered) will again be to the right of the red group.

edited Aug 11 '16 at 07:21

answered Aug 11 '16 at 06:23


Suppose that there were more than one red group, or to be really nasty, that every other element was red. I am having a hard time seeing how the shifting of all elements to the right is avoided – asimes Aug 11 '16 at 06:31
@asimes The red group shows only the *recognized* to-be-removed elements at this iteration. It is very possible that to the right of the green point, alternating elements are to be removed or not. Each one that is to be removed, conceptually will grow the red part. Each one that is not to be removed conceptually will be shifted to the left of it (since these are forward iterators, the reds will be actually shifted past it, but it's easier to see the complexity this way). – Ami Tavory Aug 11 '16 at 06:34
I think that maybe O(M) space is needed then, but this is a good answer. I can see how this is possible to implement in O(M) time now – asimes Aug 11 '16 at 06:39
The size of the red group is irrelevant because I don't need these elements to have a specific value (IOW I can just overwrite them) - see the `std::remove_if` postconditions. So it is even simpler than your suggested implementation. – milleniumbug Aug 11 '16 at 06:39
@asimes Just look at the "possible implementations" on [en.cppreference.com](http://en.cppreference.com/w/cpp/algorithm/remove) to see an `O(M)` implementation. The standard guarantees `O(M)` applications of the predicate but does not bound the number of shifts, because these shifts are moves, which should not be expensive; in any case, I doubt any implementation would give you anything other than `O(M)` for the whole `std::remove_if`. – Holt Aug 11 '16 at 06:43
@milleniumbug That is an excellent point you made. The guarantee is that elements removed are "dereferenceable, but the elements themselves have unspecified values". – Ami Tavory Aug 11 '16 at 07:14
