Does anyone know of a C++ thread pool implementation that allows both parallel execution (like a typical thread pool) and back-to-back serial execution order? I have spent several days trying to make this work by modifying the following thread pool, but I cannot seem to get it right. I have looked into the techniques used by Intel TBB, and also into the concepts from Microsoft's PPL (its Asynchronous Agents Library looks promising); both have task-oriented techniques to achieve the above. Unfortunately, however, these solutions will not work on my PowerPC Linux embedded target.

EDIT I put together a live Coliru demo with source that produces the thread graph, and it also shows a good example of a scheduler_loop where, theoretically, one could wait for threads to complete. The code also shows a UtlThreadPool with 2 threads, which I feed with the concurrent tasks; however, that 'feeding' is not fully correct and will need a little more work to traverse through the nodes.

The data structure that I use to build an execution graph is shown below. It uses a PriorityNode data structure. This structure is essentially a linked list of PriorityNodes: each one contains a vector of PriorityLevel tasks that can run concurrently, plus a pointer to the next PriorityNode, which indicates the threads to be run serially afterwards. Once these have ALL completed, if the mNextNode member is not a nullptr, then it should be scheduled to run in the thread pool (and so forth, until mNextNode is nullptr). Sequencing through this linked list of PriorityNodes is how I would like the thread pool to sequence through its threads. The PriorityNode has an insertion operator that typically produces output as follows (this would mean that 1A1 can run concurrently with 1A2, and when both of these threads have completed, the next PriorityNode would allow 1B1, 1B2, 1B3 and 1B4 to run concurrently, on however many threads the pool has available):

1A1
1A2
+-1B1
+-1B2
+-1B3
+-1B4

The nearest thing I have seen to a solution to this problem (again, note that it is Intel-specific and I am on PowerPC) is Intel TBB; here is the example they use for serial execution order.

// Note: PriorityLevel is defined elsewhere; it only needs a stream
// insertion operator for the printing below.
#include <memory>
#include <ostream>
#include <string>
#include <vector>

/**
 * Branch representing fundamental building block of
 * a priority tree containing szPriority entries.<p>
 *
 * Each priority tree struct contains a vector of concurrent
 * priorities that can be scheduled to run in the thread pool -
 * note that the thread pool must have no entries associated
 * with the current channel running before enqueueing these
 * tasks. The application must wait for the thread pool to
 * complete these tasks before queuing up the dependent tasks
 * described in the mNextNode smart pointer. If mNextNode is
 * unassigned (nullptr), then we have reached the end of the
 * tree.
 */
struct PriorityNode {
    explicit PriorityNode(
        const std::vector<PriorityLevel>& rConcurrent,
        const std::shared_ptr<PriorityNode>& rNext = std::shared_ptr<PriorityNode>(),
        const size_t& rDepth = 0)
        : mConcurrent(rConcurrent)
        , mNextNode(rNext)
        , mDepth(rDepth)
    {}

    /**
    * Stream insert operator<p>
    *
    * @param os     [in,out] output stream
    * @param rhs    [in] PriorityLevel to send to the output
    *               stream.
    *
    * @return a reference to the updated stream
    */
    inline friend std::ostream& operator << (
        std::ostream& os, const PriorityNode& rhs) {
        // indent 2 spaces per depth level
        std::string indent = rhs.mDepth > 0 ?
            (std::string("+") +
            std::string((rhs.mDepth * 2) - 1, '-')) :
            std::string();
        // print out the concurrent threads that 
        // can be scheduled with the thread pool
        for (const auto& next : rhs.mConcurrent) {
            os << indent << next << std::endl;
        }
        // print the dependent priorities that can only
        // be scheduled when the concurrent ones are finished
        if (rhs.mNextNode) {
            os << *rhs.mNextNode << std::endl;
        }
        return os;
    }
    // these are all equivalent thread priorities
    // that can be run simultaneously
    std::vector<PriorityLevel> mConcurrent;

    // these are concurrent threads that must be AFTER all 
    // mConcurrent tasks have completed (exiting the thread pool)
    std::shared_ptr<PriorityNode> mNextNode;

    // recursion depth
    size_t mDepth;
};
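
For what it's worth, here is a rough sketch of the sequencing I am after. It is only an illustration: std::async/std::future stand in for the UtlThreadPool from the demo (so it does not respect the configured pool size), and it assumes PriorityLevel exposes a run() member, which is just a placeholder name for whatever actually performs the work.

#include <future>
#include <memory>
#include <vector>

// Walk the PriorityNode chain: launch every task in mConcurrent,
// wait for the whole batch to finish, then move on to mNextNode.
void run_graph(std::shared_ptr<PriorityNode> node)
{
    while (node) {
        std::vector<std::future<void>> batch;
        batch.reserve(node->mConcurrent.size());
        for (const auto& level : node->mConcurrent) {
            // std::async used purely as a stand-in for handing
            // the task to the UtlThreadPool
            batch.push_back(std::async(std::launch::async,
                                       [&level] { level.run(); }));
        }
        for (auto& task : batch) {
            task.wait();            // block until the whole batch completes
        }
        node = node->mNextNode;     // only now schedule the dependent tasks
    }
}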
  • Although most thread pools are effectively a task queue, I don't think there are any inherent ordering guarantees as to when the tasks run. A thread pool is generally something that manages "independent work items in the background and in parallel". So, if you have order and dependency requirements, a thread pool is not likely what you want. I'd suggest a work queue of some sort or (possibly better) an actor framework. – Peter Ritchie Sep 26 '14 at 16:54
  • @PeterRitchie I edited the question to show a live demo. The scheduler loop there shows a potential placeholder where one could 'theoretically' check the PriorityNodes to see if all PriorityNode.mConcurrent tasks have completed. Unfortunately I don't know how to do that. I've been looking into std::future and std::shared_future to poll for thread completion status, but it's a bit too advanced for my comfort level. – johnco3 Sep 26 '14 at 17:33
  • You can nest futures and have something (including a future) wait for other futures with shared_future. But if you want more independent tasks, I'm not sure futures are the way to go. – Peter Ritchie Sep 26 '14 at 17:53
  • @PeterRitchie I was thinking that something along the lines of nesting futures would be workable. I've been looking for sample code that might accomplish something like this, as I am pretty new to futures and especially shared futures. – johnco3 Sep 26 '14 at 17:55
  • I'm not sure of the performance implications of nesting futures. For example, if you nest one future within another, the first future may already be executing on a thread; spawning another future (and possibly another thread) may introduce context switches that hinder performance (or at least are not the most performant code). For example: if you're already in a future and need to perform more work, just do it in the same future. That could simply be another lambda you invoke. – Peter Ritchie Sep 26 '14 at 17:58
  • @PeterRitchie Performance is not a huge concern here - the back-to-back nature of certain threads is far more important. The threads are actually FTP transfers that each typically take a few minutes. The thing is that there is a configuration file that defines the pool size, and I have to stick to that size and try to distribute the tasks across the threads as fairly as possible while respecting the execution order. I wish there was a commercial cross-platform thread pool that would do this for me. :) Shame boost hasn't quite got that yet. – johnco3 Sep 26 '14 at 18:14
  • I don't really think any thread pool would do what you want. Nested futures seem to be the closest thing, other than a task framework that handles dependent tasks (e.g. an actor framework). But that seems too heavyweight for what you've described. – Peter Ritchie Sep 26 '14 at 18:16
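
A minimal sketch of the nested/shared futures approach suggested in the comments above might look like the following; the task bodies just print the labels from the example, and std::async is again only a stand-in for a real thread pool:

#include <future>
#include <iostream>

int main()
{
    // launch the "1A" tasks; shared_future lets several dependants wait on them
    std::shared_future<void> a1 =
        std::async(std::launch::async, [] { std::cout << "1A1\n"; }).share();
    std::shared_future<void> a2 =
        std::async(std::launch::async, [] { std::cout << "1A2\n"; }).share();

    // each dependent "1B" task is itself a future whose body first waits on
    // all of its predecessors, so no external scheduler loop is required
    auto b1 = std::async(std::launch::async, [a1, a2] {
        a1.wait(); a2.wait();
        std::cout << "1B1\n";
    });
    auto b2 = std::async(std::launch::async, [a1, a2] {
        a1.wait(); a2.wait();
        std::cout << "1B2\n";
    });

    b1.wait();
    b2.wait();
    return 0;
}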

2 Answers


Why not just use TBB on PowerPC? It is a highly portable library, designed to be as cross-platform as practical, and I've heard it is being ported to Blue Gene by the TBB open-source community. You can ask them on the Intel TBB forum, for example by reviving this forum thread.

Intel does not distribute PowerPC binaries for TBB, but you can try building it from source simply by running

make tbb

See also these community patches.
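
Once it builds (and you link with -ltbb), the flow graph interface can express the "1A1/1A2 concurrently, then 1B1..1B4" dependency from your question directly. A rough sketch, with node bodies that just print the labels (1B3 and 1B4 would be wired the same way):

#include <tbb/flow_graph.h>
#include <iostream>

int main()
{
    using namespace tbb::flow;
    graph g;

    // a continue_node fires only after it has received a continue_msg
    // from every one of its predecessors
    broadcast_node<continue_msg> start(g);
    continue_node<continue_msg> a1(g, [](const continue_msg&) {
        std::cout << "1A1\n"; return continue_msg(); });
    continue_node<continue_msg> a2(g, [](const continue_msg&) {
        std::cout << "1A2\n"; return continue_msg(); });
    continue_node<continue_msg> b1(g, [](const continue_msg&) {
        std::cout << "1B1\n"; return continue_msg(); });
    continue_node<continue_msg> b2(g, [](const continue_msg&) {
        std::cout << "1B2\n"; return continue_msg(); });

    // 1A1 and 1A2 may run in parallel; each 1B node waits for both of them
    make_edge(start, a1);
    make_edge(start, a2);
    make_edge(a1, b1);  make_edge(a2, b1);
    make_edge(a1, b2);  make_edge(a2, b2);

    start.try_put(continue_msg());  // kick off the graph
    g.wait_for_all();               // returns when every node has finished
    return 0;
}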

  • Thanks for the tip, I incorrectly assumed that Intel would not be making code that works on competitors' architectures. Now that this is an option, I will look into it. – johnco3 Sep 26 '14 at 20:34
  • I was personally involved in the PowerPC porting effort to enable TBB to run on IBM Blue Gene systems, and it has worked for some time. I don't remember exactly which release first supported it because we had a fork for a while, but the TBB 4.2 release is running on Blue Gene/Q today. – Jeff Hammond Dec 23 '14 at 01:34

If anyone is still looking for this, check out the repo here: https://github.com/hirak99/ordered_thread_pool

Basically, with this you can replace code like this:

while (...) {
  std::cout << CostlyFn(input) << std::endl;
}

with this:

OrderedThreadPool<std::string> pool{10, 1};
while (...) {
  pool.Do(
    [&input] { return CostlyFn(input); },
    [](const std::string& out) {
      std::cout << out << std::endl;
    });
}

And it will execute in order.
