Consider:

std::vector<double> u, v;

#pragma omp parallel for
for (std::size_t i = 0u; i < u.size(); ++i)
  u[i] += v[i];

To express similar code with the C++17 parallel algorithms, the only solution I have found so far is the two-input-range overload of std::transform:

std::transform(std::execution::par_unseq,
               std::begin(u), std::end(u), std::begin(v), std::begin(u),
               std::plus());

I don't like this at all: it bypasses the += operator of my types, and in my real use case it leads to much more verbose code (about 4 times longer) than the original OpenMP version. I cannot simply use std::plus there, because I first have to apply an operation to the elements of the RHS range.

Is there another algorithm that I have overlooked?

Note also that if I use ranges::zip, the code won't run in parallel with GCC 9: when the iterator_category is not at least forward_iterator_tag, the PSTL back-end falls back to the sequential algorithm (https://godbolt.org/z/XGtPwc).
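A quick way to see whether a given iterator type would clear that bar is to inspect its iterator_category at compile time. The sketch below assumes the back-end dispatches on std::iterator_traits<It>::iterator_category, as described above. The helper name is_at_least_forward_v is made up for illustration and is not part of any library.

```cpp
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical helper (not a standard or PSTL name): true when the
// iterator_category of It is forward_iterator_tag or derived from it.
template <class It>
constexpr bool is_at_least_forward_v = std::is_base_of_v<
    std::forward_iterator_tag,
    typename std::iterator_traits<It>::iterator_category>;

// std::vector iterators are random access, so they qualify.
static_assert(is_at_least_forward_v<std::vector<double>::iterator>);

// An input-only iterator such as std::istream_iterator does not.
static_assert(!is_at_least_forward_v<std::istream_iterator<int>>);
```

Applying the same check to the iterator type returned by a zip view should show which side of the line it falls on.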

metalfox
  • @George Yes. In my real code I need to use a lambda. That's part of the extra verbosity I don't like. – metalfox May 14 '19 at 10:40
  • @George Compare `for (std::size_t i = 0u; i < u.size(); ++i) u[i] += 2 * v[i];` with `std::transform(std::execution::par_unseq, std::begin(u), std::end(u), std::begin(v), std::begin(u), [](auto a, auto b) { return a + 2 * b; })` – metalfox May 14 '19 at 10:46
  • @metalfox: You are unlikely to find an algorithm more concise than the one you have now. Any algorithm will require at least the four iterators, the execution policy and a function object as arguments. – P.W May 14 '19 at 11:01
  • Using functions from the algorithm header often results in more verbose code. The benefit is in the correctness of the implementation. – Khouri Giordano May 14 '19 at 11:02
  • @George They are general so you can use their correctness in any situation. – Khouri Giordano May 14 '19 at 11:10
  • @P.W I see. Does that also mean that it is not possible to use the `+=` operator? – metalfox May 14 '19 at 11:19
  • AFAIK nothing in `<algorithm>` uses compound assignments, it's only ever copy (or move) assignments – Caleth May 14 '19 at 12:08
  • Also AFAIK `std::for_each` is the only thing in `<algorithm>` that allows modification through its input range. – Caleth May 14 '19 at 12:23
  • @George it's UB to modify the parameters passed to the functor in `std::transform` – Caleth May 14 '19 at 12:25
  • @Caleth Yes. I considered the `for_each` option. The problem is that I would need either proxy or stashing iterators, which don’t work in the GCC parallel algorithms (unless I specialize their `iterator_traits` myself). That’s unfortunate because there are non-trivial types that would benefit from using `+=` instead of the chained sum and then assignment. – metalfox May 15 '19 at 07:41
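
One way to keep `+=` with `std::for_each` without proxy or stashing iterators is to iterate over a range of indices and capture the vectors by reference. This is only a sketch under stated assumptions: the function name add_assign_scaled and the index vector are illustrative and not from the thread, the index array costs extra memory, and whether the call actually runs in parallel depends on the standard library back-end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <execution>
#include <numeric>
#include <vector>

// Hypothetical helper: u[i] += scale * v[i] for every i, using operator+=.
inline void add_assign_scaled(std::vector<double>& u,
                              const std::vector<double>& v, double scale)
{
  assert(u.size() == v.size());

  // Indices 0..n-1 stored in a vector, so the iterators are random access.
  std::vector<std::size_t> idx(u.size());
  std::iota(idx.begin(), idx.end(), std::size_t{0});

  // Each index is visited once, so no two invocations touch the same element.
  std::for_each(std::execution::par_unseq, idx.begin(), idx.end(),
                [&u, &v, scale](std::size_t i) { u[i] += scale * v[i]; });
}
```

Calling `add_assign_scaled(u, v, 2.0);` corresponds to the loop `u[i] += 2 * v[i]` from the comments above. With GCC, the parallel algorithms may need to be linked against TBB.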

1 Answer

Have you tried tbb::zip_iterator (https://www.threadingbuildingblocks.org/docs/help/reference/iterators/zip_iterator.html)? Its iterator_category is random_access_iterator.

The code would then look like this:

auto zip_begin = tbb::make_zip_iterator(std::begin(u), std::begin(v));
std::for_each(std::execution::par_unseq, zip_begin, zip_begin + u.size(),
              [](auto &&x) { std::get<0u>(x) += std::get<1u>(x); });
capatober