A pooled async object for better performance

Question

I read a bit about std::async and I am sadly a bit disappointed by what I've learned and tried so far.

Now I "implemented" a little std::async replacement with a thread pool behind the scenes to experiment with in the game I am working on in my free time. The replacement is neither beautiful nor elegant nor optimized, but it still outperforms std::async when used with the std::launch::async flag.

The code is quite minimal:

namespace util
{
    template <class R, class T> std::future<R> async(const T& task)
    {
        std::shared_ptr<std::promise<R> > result = std::make_shared<std::promise<R> >();

        //threadctrl::run runs tasks on an engine-global thread pool, which gets injected on initialization
        threadctrl::run(std::bind([](std::shared_ptr<std::promise<R> > result, const T& task)->void
        {
            try
            {
                result->set_value(task());
            }
            catch (...)
            {
                //not handled here
            }

        }, result, std::ref(task)));

        return result->get_future();
    }
}

A simple usage example would look like this:

int main()
{
    std::future<int> result = util::async<int>(std::bind(give5));

    //Do some other work

    std::cout << result.get();

    return 0;
}

My question is: Is there a really good reason to prefer the original std::async implementation over this approach in a gaming environment, or is this just fine?

The performance of my minimal implementation is a lot better, and I only need this very slim interface, which would come in very handy since I won't use deferred asyncs anyway.

Under Windows 7 32-bit with an i5-4570 and 4 GB of RAM I get the following results:

0.001 milliseconds per util::async task (scheduling only)

0.006 milliseconds per std::async task (scheduling only)

If you're on Windows, are you aware of the Concurrency Runtime and the Parallel Patterns Library? They provide a superset of the features of the C++ standard library in this area. – Rook, Jun 09 '17 at 16:55
Also, microbenchmark figures like that are likely to generate a lot of skepticism, especially if you haven't supplied the code you used to generate them. – Rook, Jun 09 '17 at 16:55
Well, I benchmarked the scheduling time during gameplay with different mini games. Thank you, I'll look up the library, never heard of it... :) – Mango, Jun 09 '17 at 16:58
But if you are curious, I used the async to calculate physics events in a Space Invaders-like game and a side-scrolling shooter I am coding (not because I expect much of a boost from parallelism, but to play around with it). The basic approach is described here: https://mango2go.wordpress.com/2017/05/06/a-simple-collision-approach/ with the simple difference that each spawned collision event launches an async task to calculate the physics. I added a counter and divided the total time taken by the async launches by the number of launches counted during some gameplay. – Mango, Jun 09 '17 at 19:06

score 1 · Accepted Answer · answered Jun 09 '17 at 21:32

std::async (with the std::launch::async policy) is banned from using a thread pool; it must treat the launched tasks "as if" they were run in separate threads. So thread-local storage must start out fresh for each task, 100 tasks running or blocked must not keep a new one from starting if they don't contend, and so on.

Your pool doesn't have to do these things, so can be more efficient.
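To make the thread-local point concrete, here is a minimal sketch. The counter and function names are made up for the illustration. On an implementation that follows the "as if in a new thread" rule, each std::async call should see a freshly initialized counter, so both calls return 1 and the sum is 2; a pool that reuses one worker thread would let the second call see the first call's leftover value.

```cpp
#include <future>

// Hypothetical per-thread counter, used only for this illustration.
thread_local int calls_on_this_thread = 0;

// Increments the calling thread's own counter and returns the new value.
int bump()
{
    return ++calls_on_this_thread;
}

// Runs bump() twice through std::async, one after the other,
// and returns the sum of the two results.
int sum_of_two_async_calls()
{
    std::future<int> a = std::async(std::launch::async, bump);
    const int first = a.get();   // the first task has fully finished here

    std::future<int> b = std::async(std::launch::async, bump);
    const int second = b.get();

    return first + second;
}
```

A pooled replacement like the one in the question makes no such promise, which is part of why it can be cheaper.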

Note that on Windows, std::async actually violates these requirements. I have read of plans to fix this in 2017, but GM did not have the fix.

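The code you posted is bad, in that it has a dangling reference: std::ref(task) stores a reference to the argument, and in your usage example that argument is a temporary that is destroyed before the pool gets around to running the task. So you should probably not use your own hand-written code for this purpose. Writing threading code is hard. You'll get it wrong a lot; if you had such a big error in a short snippet, think how many errors must be in your thread pool implementation.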
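For reference, a minimal sketch of how the dangling reference could be avoided, assuming C++14: the callable is moved or copied into the queued job instead of being wrapped in std::ref, and an exception from the task is forwarded to the future rather than swallowed (otherwise get() would wait forever). The real threadctrl::run is not shown in the question, so the version below is a hypothetical stand-in that starts one thread per job and joins them at program exit. Like the original, this sketch does not handle R = void.

```cpp
#include <exception>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

namespace threadctrl
{
    // Hypothetical stand-in for the engine-global pool (an assumption, not
    // the asker's code): one thread per job, all joined at program exit.
    struct joiner
    {
        std::mutex mutex;
        std::vector<std::thread> threads;
        ~joiner() { for (auto& t : threads) t.join(); }
    };

    inline void run(std::function<void()> job)
    {
        static joiner j;
        std::lock_guard<std::mutex> lock(j.mutex);
        j.threads.emplace_back(std::move(job));
    }
}

namespace util
{
    template <class R, class T>
    std::future<R> async(T&& task)
    {
        auto result = std::make_shared<std::promise<R>>();
        std::future<R> future = result->get_future();

        // The callable is stored by value inside the job, so nothing refers
        // back to the caller's (possibly temporary) object.
        threadctrl::run([result, task = std::forward<T>(task)]() mutable
        {
            try
            {
                result->set_value(task());
            }
            catch (...)
            {
                // Hand the exception to whoever calls get() on the future.
                result->set_exception(std::current_exception());
            }
        });

        return future;
    }
}

int give5() { return 5; }

// Mirrors the usage example from the question.
int demo_fixed_async()
{
    std::future<int> result = util::async<int>(std::bind(give5));
    return result.get();
}
```

The shared_ptr around the promise is kept here only because std::function requires a copyable callable.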

What I use is something like this thread_pool, which doesn't require a global thread pool. When I have a task that could use lots of parallelism, I give it a thread pool, and run tasks on it.

It does mean that if two different tasks both requiring lots of parallelism are running at once, they contend. But I like not having a singleton around. You can expose a singleton thread_pool instead.

Be aware that tasks in a thread_pool need to avoid blocking. To this end, you might want to provide the ability for your thread_pool tasks to get a continuation, or the ability to access a "high blocking expected" pool with excess threads for such operations.
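The thread_pool referred to above is not reproduced here, so the following is only a rough sketch of the idea under my own assumptions: a pool that is an ordinary object handed to whoever needs parallelism, with a submit function that accepts run-once callables. The names thread_pool, submit and parallel_sum are illustrative. The queue holds std::packaged_task<void()>, which can wrap move-only callables, so no promise needs to be allocated on the heap. C++14 is assumed.

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <future>
#include <mutex>
#include <numeric>
#include <thread>
#include <utility>
#include <vector>

class thread_pool
{
public:
    explicit thread_pool(std::size_t threads)
    {
        for (std::size_t i = 0; i < threads; ++i)
            workers_.emplace_back([this] { work(); });
    }

    // Lets the workers drain the queue, then joins them.
    ~thread_pool()
    {
        { std::lock_guard<std::mutex> lock(mutex_); stopping_ = true; }
        wake_.notify_all();
        for (auto& w : workers_) w.join();
    }

    thread_pool(const thread_pool&) = delete;
    thread_pool& operator=(const thread_pool&) = delete;

    // Queues f to run once on some worker and returns a future for its result.
    template <class F>
    auto submit(F f) -> std::future<decltype(f())>
    {
        using R = decltype(f());
        std::packaged_task<R()> task(std::move(f));
        std::future<R> result = task.get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.emplace_back([t = std::move(task)]() mutable { t(); });
        }
        wake_.notify_one();
        return result;
    }

private:
    void work()
    {
        for (;;)
        {
            std::packaged_task<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;   // only reached when stopping
                job = std::move(jobs_.front());
                jobs_.pop_front();
            }
            job();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::deque<std::packaged_task<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// A task that "is given" a pool: sums data in chunks, one pool job per chunk.
long parallel_sum(thread_pool& pool, const std::vector<int>& data, std::size_t chunks)
{
    if (chunks == 0) chunks = 1;
    const std::size_t step = (data.size() + chunks - 1) / chunks;

    std::vector<std::future<long>> parts;
    for (std::size_t begin = 0; begin < data.size(); begin += step)
    {
        const std::size_t end = std::min(begin + step, data.size());
        parts.push_back(pool.submit([&data, begin, end] {
            return std::accumulate(data.begin() + begin, data.begin() + end, 0L);
        }));
    }

    long total = 0;
    for (auto& p : parts) total += p.get();   // the caller blocks, not a pool thread
    return total;
}
```

Note that parallel_sum waits on the futures from the calling thread. Calling it from inside a pool job could deadlock once every worker is waiting, which is the blocking problem described above.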

answered Jun 09 '17 at 21:32

Yakk - Adam Nevraumont


OK, first of all, thanks. I wasn't aware of that rule for async that prevents it from using thread pools, but I am aware that I should avoid blocking tasks. The global thread pool is engine-global, which means it's stored in the engine and, for the sake of simplicity, can be accessed from anywhere. Creating a separate pool for the asyncs is not an option in my opinion, since I want the threads to be handled at one specific point in the code. I am interested in what in particular is bad about my code, since I would like to improve it (OK, the dangling reference, but is there more?) – Mango Jun 09 '17 at 21:42
@mango that would be code review. But your code needlessly allocates a promise on the heap, probably because your runner expects copyable objects. Bind is a bad idea; use lambdas or function objects that know they are run-once. – Yakk - Adam Nevraumont Jun 09 '17 at 22:12
https://en.cppreference.com/w/cpp/thread/async: "The template function async runs the function f asynchronously (potentially in a separate thread which may be part of a thread pool)..." Not banned to run on thread pool according to this. – 0kcats Nov 20 '18 at 23:42
@0kcats that is descriptive, not normative. – Yakk - Adam Nevraumont Nov 21 '18 at 01:00
@Yakk-AdamNevraumont Thanks, I see that clearer now. – 0kcats Nov 22 '18 at 06:39
