Say I have an std::vector<int> and want to know if it contains a 3 or get the iterator to the 3.

I do not want to use an std::set or std::multiset for whatever reason.

I would like to do this in std::execution::par_unseq mode.
The two options I see are std::any_of and std::find, but they do not quite do it for me.

#include <algorithm>
#include <execution>
#include <functional>
#include <iostream>
#include <vector>

int main()
{
  std::vector vec{ 1, 1, 1, 1, 1, 3, 3, 3, 3 };

  bool contains{ std::any_of(
      std::execution::par_unseq,
      vec.begin(), vec.end(),
      std::bind(std::equal_to{}, std::placeholders::_1, 3)) };

  auto found{ std::find(std::execution::par_unseq, vec.begin(), vec.end(), 3) };

  return 0;
}

The std::any_of should do what I want, but the call is extremely messy for what it does. Ranges and std::bind_front will help, but not a whole lot.

The problem with std::find is that it has to find the first occurrence of a 3, which limits its efficiency, as I do not care which 3 it finds.

  • Is there an alternative to std::any_of that searches by value?
  • Is there std::find (and std::search) that finds any match, not the first?

Answers up to are welcome.


EDIT:

For clarification, I do not want a function that checks for contains and gives me an iterator at the same time. I am looking for two distinct functions (One that returns an iterator, one that returns a bool).

  • Why are you using `std::bind` instead of lambdas? – Max Langhof Nov 15 '19 at 09:38
  • @MaxLanghof aint nothing wrong with `std::bind` if it works. – parktomatomi Nov 15 '19 at 09:40
  • [`std::any_of`](https://en.cppreference.com/w/cpp/algorithm/all_any_none_of) doesn't give you an iterator, only a boolean indicator if at least one exists or not. – Some programmer dude Nov 15 '19 at 09:41
  • And if it doesn't matter which `3` you find, what's wrong with finding the first one? – Some programmer dude Nov 15 '19 at 09:41
  • @Someprogrammerdude In theory, if the vector is really long and there is a `3` at the start of some thread's chunk, `find` would have to ensure none of the prior threads find any `3` before it can return (which may take a while), whereas `any_of` could cancel immediately. Whether implementations do that in practice is a different matter. – Max Langhof Nov 15 '19 at 09:44
  • @parktomatomi `std::bind(std::equal_to{}, std::placeholders::_1, 3))` vs `[](auto elem) { return elem == 3; }` is not even close in my eyes. – Max Langhof Nov 15 '19 at 09:45
  • (Also note that neither `gcc` nor `clang` have `<execution>` support yet. MSVC does though, and from a bit of source diving it at least _might_ have the kind of optimization mentioned above...) – Max Langhof Nov 15 '19 at 10:06
  • @MaxLanghof I'm using execution policies in clang 10. It uses intel's tbb under the hood. – nascardriver Nov 15 '19 at 10:11
  • @nascardriver Oh right, the 10.0 preview does have it. I guess I can hide behind "I meant the stable releases" ;) – Max Langhof Nov 15 '19 at 10:13
  • @MaxLanghof GCC 9 has support for `<execution>` and it's been out there for a while. – Pilar Latiesa Nov 15 '19 at 16:24

2 Answers


I don't quite see what you consider bad about any_of, especially if you use a lambda instead of the std::bind chant:

bool contains = std::any_of(std::execution::par_unseq,
                            vec.begin(), vec.end(),
                            [](auto elem) { return elem == 3; });

(Distribute line breaks and whitespace to your liking...)


Alternatively, if this is common, you might also define a functor object yourself that takes the value to compare to in the constructor. Or, more conveniently, something like this:

template<class T>
auto equalTo(T value)
{
  return [value](auto other) { return other == value; };
}

// ...

bool contains = std::any_of(std::execution::par_unseq,
                            vec.begin(), vec.end(),
                            equalTo(3));
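The functor variant mentioned above could look roughly like the following. This is only a sketch: `EqualTo` and `contains` are hypothetical names, not standard library facilities, and the wrapper uses the serial `std::any_of` overload to stay short (an execution policy could be passed as the first argument in the same way as in the calls above).

```cpp
#include <algorithm>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical functor: stores the value to compare against in the constructor.
template <class T>
class EqualTo {
public:
    explicit EqualTo(T value) : value_(std::move(value)) {}

    // Returns true if the element compares equal to the stored value.
    template <class U>
    bool operator()(const U& other) const { return other == value_; }

private:
    T value_;
};

// Hypothetical convenience wrapper: "search by value" built on std::any_of.
template <class Range, class T>
bool contains(const Range& range, const T& value)
{
    return std::any_of(std::begin(range), std::end(range), EqualTo<T>(value));
}
```

With this in place, `contains(vec, 3)` reads like a search by value, although it still builds a predicate internally.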

As for something like std::find_any that takes advantage of early cancellation when executed in parallel: Such a thing does not exist in the standard library yet. The serial versions are obviously optimal by returning upon finding the first occurrence, but no one went and added a different algorithm to improve the parallel case. It would also be uncommon for calls to the standard library to have timing-dependent results (even if race-free), so I wouldn't get my hopes up.

I'll also note that the current MSVC implementation actually does the fast cancellation optimization discussed above (how much it really helps depends on the chunking policy, which I can't figure out at a glance). For clang I can't really tell anything...
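To illustrate what a "find any match" algorithm with early cancellation might look like if written by hand, here is a minimal sketch. `find_any` is a hypothetical helper, not a standard algorithm; it assumes random-access iterators, splits the range into equal chunks across plain `std::thread` workers (the `workers` parameter and its default of 4 are arbitrary), and uses an atomic index as the cancellation flag. It makes no claim about how any standard library implements its parallel algorithms.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: returns an iterator to *some* element equal to value,
// or last if there is none. Which match is returned depends on timing.
template <class RandomIt, class T>
RandomIt find_any(RandomIt first, RandomIt last, const T& value, unsigned workers = 4)
{
    const std::ptrdiff_t n = last - first;
    if (n <= 0) return last;
    if (workers == 0) workers = 1;

    const std::ptrdiff_t none = n;          // sentinel index meaning "no match yet"
    std::atomic<std::ptrdiff_t> hit{none};  // index of the first match published
    const std::ptrdiff_t chunk = (n + workers - 1) / workers;

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::ptrdiff_t lo = std::min<std::ptrdiff_t>(n, w * chunk);
        const std::ptrdiff_t hi = std::min<std::ptrdiff_t>(n, lo + chunk);
        if (lo == hi) break;                // nothing left to hand out
        pool.emplace_back([&, lo, hi] {
            for (std::ptrdiff_t i = lo; i < hi; ++i) {
                // Early cancellation: stop as soon as any worker has a match.
                if (hit.load(std::memory_order_relaxed) != none) return;
                if (first[i] == value) {
                    std::ptrdiff_t expected = none;
                    hit.compare_exchange_strong(expected, i);  // only one winner
                    return;
                }
            }
        });
    }
    for (auto& t : pool) t.join();

    return first + hit.load();              // equals last when nothing matched
}
```

Unlike `std::find`, a worker that hits a match deep in the range does not wait for earlier chunks to finish scanning; the price is exactly the timing-dependent result mentioned above. Spawning threads per call is also expensive, so a real implementation would likely reuse a thread pool.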

Max Langhof
  • I don't like that I have to pass a callable at all when I want to find an element by value. I used `std::bind` and `std::equal_to`, because a lambda for a comparison to a constant seems overkill. Now that I see the lambda written out, it's easier to read than what I did. – nascardriver Nov 15 '19 at 10:16
  • Fast cancellation should work just as well for `all_of` (on a `false`). – Davis Herring Nov 15 '19 at 14:30
  • @DavisHerring Yeah, you're right. Fixed that section, thank you! – Max Langhof Nov 15 '19 at 14:39
  • Instead of writing a function that returns a lambda, one could also define an `inline constexpr` variable initialized with a lambda instead. Why use one method instead of the other? Is it just preference or are there more subtleties involved when making this consideration? – 303 Nov 27 '21 at 17:06
  • @303 The function is useful if there is some parameter(s) (`T value` in this case) that "customizes" the lambda. – Max Langhof Nov 29 '21 at 08:42

Say I have an std::vector and want to know if it contains a 3 and optionally get the iterator to the 3.

I would pack the std::any_of version to a template-function and use a lambda instead of std::bind.

Note that std::any_of is possibly implemented in terms of std::find_if.

#include <iostream>
#include <vector>
#include <algorithm> // std::find_if
#include <execution> // std::execution
#include <iterator>  // std::distance
#include <utility>   // std::pair

template<typename Iterator, typename Predicate>
auto find_any_of(Iterator first, Iterator last, Predicate pred) -> std::pair<bool, Iterator>
{
   const Iterator iter = std::find_if(std::execution::par_unseq, first, last, pred);
   return { iter != last, iter };
}

int main()
{
   std::vector vec{ 1, 1, 1, 1, 1, 3, 3, 3, 3 };
   // you call the function like
   const auto [found, ele_iter]
      = ::find_any_of(vec.cbegin(), vec.cend(), [](const int ele) { return ele == 3; });

   if (found)
      std::cout << "Element found at index: " << std::distance(vec.cbegin(), ele_iter);
   return 0;
}
JeJo
  • Sure I can wrap everything to make it easier to use. But it doesn't solve my efficiency concern of `std::find_if` having to find the first element whereas I just need any. My original question might have been poorly worded. I don't want to check for contains and get the iterator at the same time. I want both separately (The iterator version would give me a way of checking contains of course). – nascardriver Nov 15 '19 at 10:12
  • @nascardriver Regarding your efficiency concern, you are using `std::execution::par_unseq` anyway. That should get you some (IMHO). Have you tried to profile/benchmark (for your use case) the changes with and without parallel `std::execution`? Other than the provided standard algorithm answer, I don't know other options. – JeJo Nov 15 '19 at 10:22
  • I'm not looking for a general performance boost, I'm looking for an algorithm that isn't restricted in which element it has to find. Imagine a vector with 1 billion entries and the last entry is a 3. If the first thread finds the 3 with its first check, the algorithm still has to go through almost 1 billion elements even though it already found what I was looking for, because `find` has to return the lowest index match. – nascardriver Nov 15 '19 at 10:48