4

I had a simple requirement where I need to find the occurance of strings in one vector from a master list of strings in another vector. I was able to do it easily at first with:

vector<string> custom_list;
set<string> master_list;
vector<string> target_list;

std::sort(custom_list.begin(), custom_list.end());
std::set_intersection(custom_list.begin(), custom_list.end(), master_list.begin(),
                      master_list.end(), back_inserter(target_list));

This worked just fine. But later it turned out that each string in the master_list was associated with an identifier. I was hoping I could use std::set_intersection in such a way that I can use the intersected elements in target_list as an index to get at their identifiers. In effect I thought I'd change master_list to a map, like so:

map<string, SomeCustomId> master_list;

and be able to do something like:

auto I_want_this_id = master_list[target_list[0]);    

But now I am not sure if I can use set_intersection to compare two completely different containers (custom_list, a vector and master_list, a map) even if I write my own comparison function. Something like:

struct mycomparer {
    bool operator()(string const& lhs, pair<string, SomeCustomId> const& rhs) {
        return lhs == rhs.first;
    }
};

This doesn't quite work (I got a variety of compiler errors) and intuitively too, something about that felt wrong to me.

Is there a better way to accomplish what I am trying to do?

Daniel Heilper
  • 1,182
  • 2
  • 17
  • 34
ForeverLearning
  • 6,352
  • 3
  • 27
  • 33

2 Answers2

4

std::set_intersection expects a comparer that returns true if lhs < rhs, not if lhs == rhs. It also has to be able to compare its two arguments regardless of order (after all, determining if arguments are equivalent is done by (!comp(a, b) && !comp(b, a))).

Thus, you'd want something like

struct mycomparer {
    bool operator()(string const& lhs, pair<string const, SomeCustomId> const& rhs) {
        return lhs < rhs.first;
    }
    bool operator()(pair<string const, SomeCustomId> const& lhs, string const& rhs) {
        return lhs.first < rhs;
    }
};

Demo.

Edit: Updated demo code to include all the necessary headers. (<iterator> and <string> were missing. They were probably included by other headers in GCC, but not in VC++.)

VC++ 2012, when doing a debug build, appears to run some extra tests on the supplied predicate. This causes compilation to fail with errors like error C2664: 'bool mycomparer::operator ()(const std::pair<_Ty1,_Ty2> &,const std::string &)' : cannot convert parameter 1 from 'std::basic_string<_Elem,_Traits,_Alloc>' to 'const std::pair<_Ty1,_Ty2> &'. (It compiles fine for me on release build, once the headers are fixed and I switched to the old initialization style.)

To fix this, supply overloads of operator () taking all four possible combinations of parameters:

struct mycomparer {
    bool operator()(string const& lhs, pair<string const, SomeCustomId> const& rhs) {
        return lhs < rhs.first;
    }
    bool operator()(pair<string const, SomeCustomId> const& lhs, string const& rhs) {
        return lhs.first < rhs;
    }
    bool operator()(string const& lhs, string const& rhs) {
        return lhs < rhs;
    }
    bool operator()(pair<string const, SomeCustomId> const& lhs,
                    pair<string const, SomeCustomId> const& rhs) {
        return lhs.first < rhs.first;
    }
};

Edit 2: If you can use Boost.Range, it is much easier. Simply:

boost::set_intersection(custom_list, 
                        master_list | boost::adaptors::map_keys,
                        back_inserter(target_list));

No custom predicates required, and also very readable. Demo.

T.C.
  • 133,968
  • 17
  • 288
  • 421
  • I shouldn't have forgotten strict weak ordering. Thanks for reminding! I now understand my mistake. – ForeverLearning May 24 '14 at 17:14
  • Update: There seems to be a bug in VC++ 12. I couldn't get that code to compile on Friday before the long weekend. I rechecked today and I seem to have it right. Still doesn't compile. The template errors are long and gnarly. I will post it shortly. – ForeverLearning May 27 '14 at 15:12
  • @Dilip Does the edit fix your problem? I only have VS2012 (VC11), not 2013 (VC12), on this computer, but hopefully things are the same with 2013 – T.C. May 27 '14 at 15:58
  • Yes! The edit _does_ fix the problem! However, I have been informed that set_* is woefully underspecified. It should ideally not allow sequences with different element_types. Think of what would happen if I had to do set_union. – ForeverLearning May 27 '14 at 17:00
  • @Dilip The compiler would complain unless you have a custom-written output iterator that can take both types? :) But I agree, it makes more sense to require them to have same element type. The [Boost.Range](http://www.boost.org/doc/libs/1_55_0/libs/range/doc/html/range/reference/algorithms/set/set_intersection.html) version does this already. – T.C. May 27 '14 at 18:59
  • @Dilip See new edit for a much cleaner way to do this with Boost.Range. – T.C. May 28 '14 at 07:42
1

Algorithms don't really care about containers. They care about iterators, and as long as both container types satisfy the algorithm's iterator requirements, and your element types match the comparator, compatibility shouldn't be an issue.

So, fundamentally, what you're doing is okay.

You need to correct the logic in your comparator, though; operator() is supposed to implement a less-than predicate. And, as T.C. points out, you'll need to implement the reverse comparison explicitly, because the element types are not implicitly convertible to each other.

Lightness Races in Orbit
  • 378,754
  • 76
  • 643
  • 1,055
  • Isn't equivalence kinda inefficient in that it has to call the < operator *twice*? I.e when implementing operator== is it not sufficient to ensure that it operates *as if* (!(a < b) && !(b < a)) instead of actually writing it that way? – ForeverLearning May 24 '14 at 17:17
  • @Dilip: I'm sure there is a good SO question somewhere on why standard algorithms and containers use lesser-than instead of equals-to for determining equivalence. – Lightness Races in Orbit May 24 '14 at 17:50
  • @Dilip: Given two ranges, how do you determine their intersection in linear time if you can only compare elements for equality? – T.C. May 24 '14 at 17:53
  • @T.C. Both input ranges are required to be sorted and, given that, you only need to iterate either of them once. – Lightness Races in Orbit May 24 '14 at 19:26
  • @LightnessRacesinOrbit I don't think you can do that with only `==` and no `<`... – T.C. May 24 '14 at 20:19
  • @T.C. No, but you can compare for equality with two `<`s, as the standard library does. It may be twice as many comparisons, but algorithmic complexity doesn't care about the constant factors. – Lightness Races in Orbit May 24 '14 at 23:31
  • @T.C. Oh, right, I see now that you were quizzing Dilip on his proposal, and actually know all this already. So never mind :) Yep with no `<` or `>` you'd be out of luck. – Lightness Races in Orbit May 24 '14 at 23:32
  • @LightnessRacesinOrbit It looks like fundamentally what I am trying to do is *not* ok :-) I am trying to use different containers in set_intersection while supplying a back_inserter for a vector. It looks like all I have to do is swap the order of the first four arguments to set_intersection (map first and vector second) and that'd land me in trouble. – ForeverLearning May 27 '14 at 20:38
  • @Dilip The standard mandates that the values be copied from the first range supplied. But you are obviously correct that if you swap the ranges you'd get a compile error. – T.C. May 28 '14 at 14:06