
In structured parallel programming, algorithms are often defined recursively in a divide-and-conquer fashion.

The proposal P2300, currently targeted for C++26, aims to provide a modern, solid foundation for asynchronous and parallel programming in C++, built around the concepts of senders, schedulers, and sender adaptors.

The P2300 documentation, libunifex, and the available examples focus mostly on the asynchronous use case and much less on the parallel one.

I managed to implement a recursive parallel quicksort based on libunifex:

any_sender_of<> quicksort(scheduler auto sch, std::random_access_iterator auto begin, std::random_access_iterator auto end)
{
    size_t N = std::ranges::distance(begin, end);

    constexpr size_t parallelCutOff = 128;
    if ( N <= parallelCutOff ) {
        std::cout << "Fulfilling batch from thread " << std::this_thread::get_id() << "." << std::endl;
        std::sort(begin, end);
        return just();
    }

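    auto pivotValue = *begin; // naive pivot choice; don't do this in production code!
    // Copy the pivot value first: std::partition moves elements, so dereferencing
    // an iterator to the pivot inside the predicate would read a changing value.
    auto lower = std::partition(begin, end, [&](const auto &e) { return e < pivotValue; });
    // Group elements equal to the pivot so they can be excluded from both recursive calls.
    auto upper = std::partition(lower, end, [&](const auto &e) { return !(pivotValue < e); });
    auto pipe_begin = schedule(sch);
    return when_all(
     let_value(pipe_begin, [=]() { return quicksort(sch, begin, lower); }),
     let_value(pipe_begin, [=]() { return quicksort(sch, upper, end); })
    ) | then([](auto&&...){});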
}


int main()
{
    std::vector<int> v = //...;
    scheduler auto sch = thread_pool.get_scheduler();
    sync_wait(quicksort(sch, v.begin(), v.end()));

    std::ranges::copy(v, std::ostream_iterator<int>(std::cout, ", "));
}

This implementation, however,

  • needs any_sender_of<> as its return type, which is not part of P2300 and, as far as I understand, performs type erasure and therefore requires heap allocation(?), which kills performance
  • takes a scheduler rather than a sender as its argument and therefore does not compose with other senders

What is the envisioned, idiomatic way to implement recursive algorithms using P2300?
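
To illustrate the first point without any sender library, here is a minimal sketch of why a lazily composed recursion seems to force a type-erased return type. The names erased_task, both, and count_leaves are hypothetical stand-ins: std::function plays the role of any_sender_of<>, and both plays the role of an adaptor like when_all whose result type depends on its inputs.

```cpp
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical stand-in for a lazy unit of work; not a P2300 sender.
// std::function erases the concrete callable type, which is what lets the
// recursive function below have a single, nameable return type.
using erased_task = std::function<void()>;

// Combine two lazy tasks into one that runs both when invoked.
// The result type depends on A and B, like a sender adaptor's result does.
template <class A, class B>
auto both(A a, B b)
{
    return [a = std::move(a), b = std::move(b)] { a(); b(); };
}

// Lazily counts the leaves of a binary recursion tree of the given depth.
// Declaring the return type as `auto` would not compile here: the function
// is used before its return type is deduced, and the two return statements
// yield different types.
erased_task count_leaves(int depth, std::size_t& leaves)
{
    if (depth == 0) {
        return [&leaves] { ++leaves; }; // base case: one unit of work
    }
    return both(count_leaves(depth - 1, leaves), count_leaves(depth - 1, leaves));
}
```

Nothing runs until the returned task is invoked; count_leaves(4, n)() then increments n sixteen times. The price is that every level of the recursion is wrapped in a std::function, which may allocate when the callable does not fit its internal buffer, and that mirrors the concern about any_sender_of<>.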

akreuzkamp
  • Wouldn't a coroutine be more maintainable and explicit? – Richard Hodges Sep 15 '22 at 08:31
  • Well, I am trying to understand how the sender/receiver abstraction works, so coroutines won't help me there. :) Afaik, coroutines can either be implemented eagerly or lazily. With eager execution, you can get rid of the when_all and stuff, but at the cost of needing memory allocation, (more) synchronization, etc. Or you have lazy coroutines (as libunifex, the reference implementation of P2300, implements them), then you get much better performance, but you do need the when_all (which means in the end the coroutine saves you nothing but the sync_wait call). – akreuzkamp Oct 05 '22 at 14:35
  • Porting senders/receivers to the parallel algorithms is currently an open discussion. The `scheduler` argument in particular is a highly debated issue (whether it's going to be added or not). Here is an HPX sample implementation of `rotate` (since you tagged it, though it is not recursive) https://github.com/STEllAR-GROUP/hpx/blob/e877f5d77ac8939a93084af6f70251c172247b3c/libs/core/algorithms/tests/unit/algorithms/rotate_sender.cpp#L98-L99 and here is a pika (an HPX fork) `for_each` https://github.com/pika-org/std-async-algorithms/blob/main/include/stdalgos/detail/for_each.hpp – gonidelis Jan 31 '23 at 01:06
  • The existence of a `scheduler` argument, though, is orthogonal to the existence of a `sender` piped input argument. I'm not sure about the consensus on s/r with recursive algorithms, but here is a recursive code example from the latest paper revision: https://github.com/kirkshoop/libunifex/blob/filecopy/examples/file_copy.cpp – gonidelis Jan 31 '23 at 01:10
  • @akreuzkamp This code wouldn't compile for me with the recent libunifex. Any_sender_of would fail to bind to a receiver. – gonidelis Apr 23 '23 at 15:33
