
The algorithm std::includes takes two sorted ranges and checks whether set2 is included in set1 (i.e. whether each element of set2 is included in set1).

Why does eel.is/c++draft say that the complexity of this algorithm is at most 2·(N1+N2-1) comparisons?

The same is stated at:
1. cppreference
2. cplusplus

It seems to me that it should be only at most 2·N1 comparisons, with the worst case when max(set2) >= max(set1).
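The question's claim can be probed empirically by giving std::includes a comparator that counts its own calls. This is a minimal sketch assuming int elements; `IncludesCount` and `count_includes_comparisons` are hypothetical names, and the number it reports describes whichever standard library it is built against, not a guarantee (the standard only promises at most 2·(N1+N2-1)).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Result of one instrumented call: what std::includes returned and how many
// times it invoked the comparator.
struct IncludesCount {
    bool result;
    std::size_t comparisons;
};

// Hypothetical helper: runs std::includes with a counting comparator.
// The count depends on the standard library in use; the standard only
// guarantees at most 2*(N1+N2-1) comparisons.
IncludesCount count_includes_comparisons(const std::vector<int>& set1,
                                         const std::vector<int>& set2) {
    // std::includes requires both ranges to be sorted.
    assert(std::is_sorted(set1.begin(), set1.end()));
    assert(std::is_sorted(set2.begin(), set2.end()));

    std::size_t comparisons = 0;
    // Captured by reference, so any copies of the comparator made inside
    // the algorithm all update the same counter.
    auto counting_less = [&comparisons](int a, int b) {
        ++comparisons;
        return a < b;
    };
    const bool result = std::includes(set1.begin(), set1.end(),
                                      set2.begin(), set2.end(), counting_less);
    return {result, comparisons};
}
```

For set1 = {1, 2, 3, 4, 5} and set2 = {5, 6, 7}, the standard permits up to 2·(5+3-1) = 14 comparisons, while the bound proposed above is 2·5 = 10; printing `comparisons` for inputs like these shows which bound a particular library actually stays under.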

MrPisarik
  • There's a sample implementation on that page; how many comparisons does it do? – Barmar May 24 '18 at 17:17
  • It's possible that the standard specified this to give a little leeway to implementations. There might be a non-obvious algorithm which could potentially be faster, but could require those extra comparisons. – Justin May 24 '18 at 17:32
  • @Justin, nice assumption; it would be cool if someone could find such an implementation, especially one shipped with a compiler. – MrPisarik May 24 '18 at 17:48

3 Answers


I agree with your conclusion. The interleaved sets example from Aki Suihkonen's answer is wrong, because the algorithm exits early as soon as it finds 2 < 3.

The sample implementation on cppreference has a loop which increments first1 on every iteration, returns when first1 == last1, performs at most 2 comparisons per iteration, and contains no nested loops. I don't see how this could do more than 2xN1 comparisons.
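The argument above can be checked mechanically. The sketch below re-types the cppreference sample loop with a comparison counter (`includes_sample`, `all_sorted` and `check_sample_bound` are hypothetical names, and the brute-force part assumes int elements), then compares it against std::includes on every pair of small sorted ranges while asserting the 2·N1 bound.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Re-typing of the loop from the cppreference "possible implementation" of
// std::includes, with a counter added around operator<.
template <class It1, class It2>
bool includes_sample(It1 first1, It1 last1, It2 first2, It2 last2,
                     std::size_t& comparisons) {
    auto counted_less = [&comparisons](const auto& a, const auto& b) {
        ++comparisons;
        return a < b;
    };
    // first1 advances on every iteration and each iteration makes at most two
    // comparisons, so no more than 2*N1 comparisons can happen.
    for (; first2 != last2; ++first1) {
        if (first1 == last1 || counted_less(*first2, *first1))
            return false;  // range 1 exhausted, or *first2 is below everything left in it
        if (!counted_less(*first1, *first2))
            ++first2;      // *first1 is equivalent to *first2: one element of range 2 matched
    }
    return true;
}

// Appends every non-decreasing sequence that extends `current` with values in
// [min_val, max_val], up to length max_len.
void all_sorted(std::vector<int>& current, int min_val, int max_val,
                std::size_t max_len, std::vector<std::vector<int>>& out) {
    out.push_back(current);
    if (current.size() == max_len) return;
    for (int v = min_val; v <= max_val; ++v) {
        current.push_back(v);
        all_sorted(current, v, max_val, max_len, out);
        current.pop_back();
    }
}

// Brute-force check over all pairs of small sorted ranges: the sample loop must
// agree with std::includes and never exceed 2*N1 comparisons.
void check_sample_bound(int max_val, std::size_t max_len) {
    std::vector<std::vector<int>> ranges;
    std::vector<int> scratch;
    all_sorted(scratch, 0, max_val, max_len, ranges);
    for (const auto& r1 : ranges) {
        for (const auto& r2 : ranges) {
            std::size_t comparisons = 0;
            const bool got = includes_sample(r1.begin(), r1.end(),
                                             r2.begin(), r2.end(), comparisons);
            assert(got == std::includes(r1.begin(), r1.end(), r2.begin(), r2.end()));
            assert(comparisons <= 2 * r1.size());
        }
    }
}
```

On the interleaved ranges {1, 3, 5, 7} and {2, 4, 6, 8} from the other answer, `includes_sample` stops after 3 comparisons: 2 < 1 and 1 < 2 on the first element, then 2 < 3, which returns false.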

Matt Hellige
  • "I don't see how this could do more than 2xN1 comparisons." Let's say N1 == 1 and N2 == 5 and the sets are {5} and {1,2,3,4,5}; how is it possible to do that in 2 comparisons? – Slava May 24 '18 at 17:29
  • Because in the first iteration, the algorithm compares `1<5`, determines that 1 is not included in {5} and returns false. The sample implementation only performs one comparison in that case. – Matt Hellige May 24 '18 at 17:30
  • What about S1={5} and S2={5,5,5,5,5}? I can't see how that could be done in 2 comparisons. – Justin May 24 '18 at 17:36
  • Ah, I assumed that it checks the first range in the second, not the other way around. My bad. – Slava May 24 '18 at 17:36
  • @Justin, in this case the algorithm will stop after the first comparison, because on the second iteration the iterator into S1 is incremented and the algorithm returns false because `first1 == last1`. In fact, whenever `N1 < N2`, it returns false after at most `N1` iterations. – MrPisarik May 24 '18 at 17:44
  • @Justin, no, as far as I can see, the sample algorithm will simply return false in that case. The first range needs to contain the same number of copies as the second range in order for `std::includes` to return true. – Matt Hellige May 24 '18 at 17:45
  • @MattHellige Then the function is improperly specified in [the standard](http://eel.is/c++draft/alg.set.operations). It says we are evaluating `∀a∈S2, a∈S1`, which can certainly be true even if `|S2| > |S1|`, since S1 and S2 are not sets but ranges with potentially repeated elements. – Justin May 24 '18 at 17:50
  • @Justin, I think it is correctly specified, because it works for both sets and multisets. Sets cannot contain duplicates, and the multiset `S1={1}` doesn't include `S2={1,1,1}`, so it works correctly. And the standard says `std::includes` is an operation on sets. – MrPisarik May 24 '18 at 18:41
  • @Justin I would tend to agree, but I also verified the behavior in gcc, and `std::includes` returns false for {5} and {5, 5, 5, 5, 5}. So either the actual implementation is also incorrect, or the specification of the function should probably be updated. This behavior was surprising to me as well! – Matt Hellige May 24 '18 at 19:08
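The behavior discussed in these comments can be reproduced in a few lines. A minimal sketch assuming int elements: `range_includes` just wraps std::includes, while `every_value_present` is a hypothetical helper for the `∀a∈S2, a∈S1` reading, added only for contrast; it does not match what std::includes returns for repeated values.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Wraps std::includes: each copy of a repeated value in s2 must be matched
// by its own copy in s1, i.e. the ranges behave like multisets.
bool range_includes(const std::vector<int>& s1, const std::vector<int>& s2) {
    assert(std::is_sorted(s1.begin(), s1.end()));
    assert(std::is_sorted(s2.begin(), s2.end()));
    return std::includes(s1.begin(), s1.end(), s2.begin(), s2.end());
}

// Hypothetical contrast: the "for all a in S2, a in S1" reading from the
// comments, which ignores how many copies of each value there are.
bool every_value_present(const std::vector<int>& s1, const std::vector<int>& s2) {
    assert(std::is_sorted(s1.begin(), s1.end()));
    return std::all_of(s2.begin(), s2.end(), [&s1](int a) {
        return std::binary_search(s1.begin(), s1.end(), a);
    });
}
```

With S1={5} and S2={5,5,5,5,5}, `range_includes` is false while `every_value_present` is true, which is exactly the gap between the wording Justin quotes and the behavior Matt Hellige observed in gcc.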

I have created an issue on the GitHub repository of the C++ standard draft. There is a short conversation in it with Richard Smith from the ISO C++ Standards Committee.

At first he rejected the issue, being unsure about the intent of std::includes, but he eventually agreed that the complexity of the function should be revisited once its specification is clarified:

The complexity requirements are consistent with the current description, and should be fixed if/when the description is fixed to actually describe what the algorithm is "supposed" to do. Seems like LWG is already on the case. I'll reply to that lib thread to request that the complexity be revisited when the spec is fixed.

MrPisarik

For interleaved sets, e.g. 1,3,5,7,... and 2,4,6,8,..., one must compare the first item of each set for equality, and when that fails, consume the smaller item from its sorted queue. The other way is to compare first a<b, then b<a, assuming that only the less-than operator is available. Either way this leads to 2(N1+N2+c) complexity.

With the introduction of the three-way comparison operator <=>, this complexity bound could change to (N1+N2-1).

EDIT: Yes, you are right. The algorithm advances the first iterator in each iteration and stops when it reaches the end, so there are at most N1 iterations, independent of how many steps are needed to advance the second iterator. The failure is in the example algorithm, which doesn't handle cases with repetitions such as set1={1,2,3}, set2={3,3,3,X}.

Aki Suihkonen
  • The algorithm has to stop when it compares 3 and 2, because the for-loop suggested at cppreference has the loop invariant `first1 <= first2`, doesn't it? – MrPisarik May 24 '18 at 17:26
  • It does handle this case: multiset1 doesn't contain multiset2, because the first has only one 3 while the second has three. And the standard treats this operation as an operation on sets. – MrPisarik May 24 '18 at 18:43