std::set operations with custom comparator

Question

I have a question. When I'm using a std::set with custom comparator, other operations like erase or count doesn't work properly. For example:

int sz(int const & n) {
  return __builtin_popcount(n);
}

struct comp {
  bool operator()(int const & a, int const & b) const {
    return sz(a) >= sz(b);
  }
};

void solve() {
  set<int, comp> s;

  for (int i = 0; i < 10; ++i)
    s.insert(i);

  for (int x : s)
    cerr << x << " ";

  cerr << "\n";

  for (int i = 0; i < 10; ++i)
    cerr << s.count(i) << " ";
}

Output will be:

7 9 6 5 3 8 4 2 1 0
0 0 0 0 0 0 0 0 0 0

How I might use std::set with custom comparator, that all operations work properly? Thanks in advance.

Your comparator doesn't have a strict weak ordering. `cmp(a, b)` implies `!cmp(b, a)`, but your comparator returns true for both if `sz(a) == sz(b)`. Or, even simpler, `cmp(a, a)` must return false. — chris, Dec 17 '17 at 22:20
I try to insert elements in increasing order, for that i'm using `!<`, e.g `>=` Or comparator for std::set don't need to work like `operator<()` ? — Nikita Voloshin, Dec 17 '17 at 22:23
@NikitaVoloshin Why do you think that? Did you read about requirements for the [compare function](http://en.cppreference.com/w/cpp/concept/Compare)? — Algirdas Preidžius, Dec 17 '17 at 22:26
It does need to work like `operator<` in that the comparison needs to establish a strict weak ordering. If you're unfamiliar with the term, you can see the [documentation](http://en.cppreference.com/w/cpp/concept/Compare) attached to `std::set`'s comparator. Note that `operator>=` returning inverted results from `operator<` does not translate into a reverse ordering. That ignores the special case where elements have an equal number of bits set. The inverted result of `operator<` is not what you want in that case; you still want that to return false. — chris, Dec 17 '17 at 22:29
@Algirdas Preidžius Thank you for your answer, yes, I didn't read about requirements for the comparator, my bad. And as I understand, there is no way to store elements in increasing order in way I want, yes? — Nikita Voloshin, Dec 17 '17 at 22:39
@chris Thank you very much, it's very helpful. But is there a way to do what I want? Store all disticnt numbers in increasing order of bitcount? — Nikita Voloshin, Dec 17 '17 at 22:40
@NikitaVoloshin Where did I say that it is not possible? Any sorting mechanism is possible as long as it adheres to the requirements. — Algirdas Preidžius, Dec 17 '17 at 22:42
@NikitaVoloshin -- Just to let you know, some compilers, one being Visual C++, checks for strict-weak-ordering violations like this in a sneaky way. These compilers call your `<` operator twice. Once with (a,b) and another with (b, a). If you return the same true / false value with the only difference being the arguments switched, the runtime throws an assertion that you've violated the strict-weak-order requirement. In other words, the compiler asserts since you said `a < b` and at the same time `b < a`. That is impossible, thus an assertion. — PaulMcKenzie, Dec 17 '17 at 23:08
@PaulMcKenzie thank you, I didn't know about it. But I'm using gcc, and there isn't any assertion errors, so I think that there is no such a test. — Nikita Voloshin, Dec 17 '17 at 23:12
@NikitaVoloshin -- I believe that some version of `gcc` is able to have a similar test for this condition, maybe by a compiler flag. Maybe it is `clang`, don't know, but I vaguely remember that at least one of those compilers had such a built-in test, but needed to be invoked via compiler option. — PaulMcKenzie, Dec 18 '17 at 20:10

max66 · Accepted Answer · 2017-12-17T22:43:53.020

8

Try changing

struct comp {
  bool operator()(int const & a, int const & b) const {
    return sz(a) >= sz(b);
  }
};

with

struct comp {
  bool operator()(int const & a, int const & b) const {
    return sz(a) > sz(b);
  }  // ---------^ 
};

The (first) problem is that a comparator must impose a strict weak ordering.

So, in particular, must be comp(a, a) == false for every a in the std::set.

With your comparator you have comp(a, a) == true for every a.

Anyway: this works only if a != b imply s(a) != s(b); if this isn't the case... well... I suppose you can try with

struct comp {
  bool operator()(int const & a, int const & b) const {
    return (sz(a) > sz(b)) || ((sz(a) == sz(b)) && (a > b));
  }
};

or something similar.

edited Dec 17 '17 at 22:43

answered Dec 17 '17 at 22:19

max66

65,235
10
71
111

Yes, it fixes a problem with count, but I need to insert all elements in range `0..9`, but after this change in set will be only elements with different `sz()`. Is this a truth, that comparator should return `<0` if `a` less then `b`, `0` if equals and `>0` otherwise? – Nikita Voloshin Dec 17 '17 at 22:28
1

@NikitaVoloshin - no; you describing the behaviour of `strcmp()`; a comparator must return a `bool`; so `true` or `false`. – max66 Dec 17 '17 at 22:40

score 4 · Answer 2 · answered Dec 17 '17 at 22:49

More on the theory side of things:

Per the documented requirements for std::set's comparator (and every other "less-than comparator" in the standard library), it needs to establish a strict weak ordering:

For all a, comp(a,a) == false
If comp(a,b) == true then comp(b,a) == false
If comp(a,b) == true and comp(b,c) == true then comp(a,c) == true

To keep it short, I've left out the transitivity of incomparability requirement, which is handled by the equiv expressions in the cppreference documentation, but note that the above three aren't quite enough.

You can think of the comparison as asking "Must a come before b?" The implementation assumes this is the question the comparison is asking and the answer for equal elements is no, one must not appear before the other. Your comparator fails the first two tests:

comp(0,0) returns true
comp(1,2) returns true, but comp(2,1) returns false

This isn't arbitrary. For simplicity, imagine a naive sorted array. You have 3 1 and want to insert 2. Starting at the beginning, you check comp(2,1). It returns true because the two have the same number of bits, so you're done and now you have 2 3 1. Evidently, this is incorrect. That isn't to say std::set is the same as a sorted array, but it needs something to go on when figuring out where to put and find elements. A strict weak ordering keeps this process consistent.

What you're really after for a descending popcount ordering is a strictly greater-than comparison. Therefore, the change is trivial:

return sz(a) > sz(b);

It solves problem with `count` operation, but I want to store all distinct numbers and sort them by popcount in increasing order. Unfortunatelly your solution doesn't allow to store numbers with equal popcount. But thank you, it's very informatively and I try to avoid bugs like this in future. — Nikita Voloshin, Dec 17 '17 at 22:54
@NikitaVoloshin, Fair, `std::set` doesn't support splitting apart the uniqueness tests and the ordering tests. If you want those to be separate, you could look for a non-standard container with both customization points or do something like call `std::unique` (via the erase-remove idiom) to remove duplicates after the fact. It's extra work at runtime, but could suit your needs. — chris, Dec 17 '17 at 23:02
There is answer above, but yes, it solves only my specific problem, but not applicable in common. Thank you for answer. — Nikita Voloshin, Dec 17 '17 at 23:06

score 1 · Answer 3 · answered Dec 17 '17 at 22:33

According the cppreference.com:

A binary predicate that takes two arguments of the same type as the elements and returns a bool. The expression comp(a,b), where comp is an object of this type and a and b are key values, shall return true if a is considered to go before b in the strict weak ordering the function defines.

Your comparator does not do this.

std::set operations with custom comparator

3 Answers3