
So, I'm trying to create a generic way to both build a container of futures and perform all of the `future.get()` calls in a non-blocking way.

I anticipate that task completion times will typically range from a few hundred milliseconds up to 2 minutes. Some tasks, however, may never complete at all. There will be at least 10,000 tasks in a typical run.

I want the quickest-returning task results back without being held up by other, longer-running tasks in the futures container.

Here's what I have so far just using dummy sleep times to simulate the task completion delays (design thanks in large part to good posts made here, such as this, and this):

#include <future>
#include <vector>
#include <iostream>
#include <random>
#include <chrono>
#include <ratio>
#include <thread>
#include <algorithm>

size_t rand_from_range(const size_t, const size_t);
int rand_sleep_range(const size_t, const size_t);
template<class CT> size_t get_async_all( CT& );

// Given a function and a collection,
//  return a vector of futures.
template<class Function, class CT>
auto async_all( Function f, CT coll )
    -> std::vector<decltype(std::async(f, *std::begin(coll)))>
{
  std::vector<decltype(std::async(f, *std::begin(coll)))> futures;
  futures.reserve(coll.size());
  for (auto& element : coll)
    futures.push_back(std::async(f, element));
  return futures;
}

// Given the beginning and end of a number
//  range, return a random number therein.
size_t rand_from_range( const size_t range_begin, 
                        const size_t range_end )
{
  std::uniform_int_distribution<size_t> 
    distr(range_begin, range_end);
  std::random_device dev;
  return distr(dev);
} 

// Given a shortest and longest duration, put the calling
//  thread to sleep for a random duration therein. 
// (in milliseconds)
int rand_sleep_range( const size_t shortest_time, 
                      const size_t longest_time )
{
  std::chrono::milliseconds 
    sleep_time(rand_from_range(shortest_time, longest_time));
  std::this_thread::sleep_for(sleep_time);
  return (int)sleep_time.count();
} 

// Given a container of futures, perform all
//  get()'s.
template<class CT>
size_t get_async_all( CT& async_coll )
{
  size_t get_ctr(0);
  const size_t future_cnt = async_coll.size();
  std::vector<size_t> completed;
  completed.reserve(future_cnt);

  while (true) {
    for (size_t ndx = 0; ndx < future_cnt; ++ndx) {
      // Check to see if this ndx' future has completed already.
      if (std::none_of(std::begin(completed), std::end(completed), 
            [=](size_t x) {
              return (x == ndx);
            }))
      { // No, this one hasn't completed 
        //  yet, attempt to process it.
        auto& f = async_coll[ndx];
        if (f.wait_for(std::chrono::milliseconds(10)) 
              == std::future_status::ready) 
        {
          f.get(); // The future's work gets done here.
          ++get_ctr;
          completed.push_back(ndx);
          if (completed.size() == future_cnt) 
            break; // for()
        }
      }
    }
    if (completed.size() == future_cnt) 
      break; // while()
  }
  return get_ctr;
}

int main()
{
  // A dummy container of ints.
  std::vector<int> my_vec(100);
  for (auto& elem : my_vec)
    elem = rand_from_range(1, 100);

  // A dummy function lambda.
  auto my_func = [](int x) { 
    int x_ = x;
    int sleep_time = rand_sleep_range(100, 20000); // in ms.
    x *= 2;
    std::cout << " after sleeping " << sleep_time << "ms \t"
              << "f(" << x_ << ") = " << x << std::endl;
  };

  // Create and execute the container of futures.
  auto async_coll = async_all(my_func, my_vec);
  size_t count = get_async_all(async_coll);

  std::cout << std::endl << count << " items completed. \n";
}

So, my questions are:

  • Are there any gotchas to the approach I'm using?
  • Is there a better/more elegant approach for the get_async_all() than the one I'm using? Or anything else I'm doing, for that matter.

Thanks to anyone for taking the time to look over the code, and to give me any constructive criticism or feedback.

  • Well, your code is not much of non-blocking, since it blocks until the last future completes. But I don't think you can do better without something like `then()` (which is not in C++11). – svick Feb 02 '13 at 15:57
  • @svick Oh, I think I see your point. Hmm. I guess the most important thing is that when a task completes, it can be returned to the caller immediately, while the remainder complete in due course. If I created a deque that I passed by ref into get_async_all() as well, and push_back() on it with a completed future from the interior if() statement, would that approach work? Maybe I could have an output threadpool accept it? I plan to implement this algorithm as part of a library, if that makes any difference. Thanks for the input! – Herpin the Derps Feb 02 '13 at 16:06
  • 3
    The correct test if a task has *completed* is `.wait_for(std::chrono::seconds(0))`. – Xeo Feb 12 '13 at 15:24

1 Answer


There's at least one gotcha. You invoke std::async without specifying a launch policy, which means that some or all of the tasks may run deferred. But your test for completion checks only for std::future_status::ready. For a deferred task, wait_for always returns std::future_status::deferred, so your test will never return true.

The simplest solution is to specify a launch policy of std::launch::async, but then you run the risk of oversubscribing your system. An alternative is to modify your test to check for deferred tasks, but then the question is what to do with them: if you call get or wait on them, you block for an arbitrary amount of time.

Regarding your general approach, rather than blocking for 10ms on each task as you poll, you might consider waiting for 0ms, i.e., doing a pure poll to see whether the task has finished. This may reduce the latency between when a task finishes and when you process it, but it may also increase the polling overhead to the point where your overall system runs slower.

A completely different approach might be to abandon polling each task and instead have each task write an "I'm done" flag to a shared data structure (e.g., a std::deque), then poll that data structure periodically to see if there is anything in it. If so, process the completed task(s), remove them from the data structure, then go back to sleep until it's time to poll again. If your tasks do a push_back on the data structure, you can naturally process them in the order in which they complete. A drawback of this design would be that the shared data structure could become a performance bottleneck.

KnowItAllWannabe