
I have correctly working code that processes 1000 image files using OpenCV. Since the operation is independent for each file, I have parallelized it using std::async.

I am launching the tasks with the following call inside a for loop:

std::vector<std::future<cv::Mat>> processingThread;
for (int i = 0; i < jsonObjects.size(); i++) {
    processingThread.emplace_back(std::async(
        std::launch::async,
        (cv::Mat(CameraToBEV::*)(json&, std::vector<cv::Point2f>, cv::Point2f, cv::Point2f)) &CameraToBEV::process,
        &cbevVec[i], std::ref(jsonObjects[i]), roiBox1, opDim, ipDim));
}

The above code works fine and takes about 100 milliseconds. But to collect the results I am using another for loop, as follows:

std::vector<cv::Mat> opMatArr;
for (auto& future : processingThread) {
    opMatArr.emplace_back(future.get());
}

This also works, but it takes 9 seconds to execute, which rather defeats the purpose of using multithreading, since I am sequentially populating the vector of cv::Mat objects. Is there a way to collect all the cv::Mat objects into the std::vector opMatArr in parallel, i.e. in a few milliseconds?
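A minimal sketch of how the two phases can be timed separately (hypothetical code: a 50 ms sleep stands in for the actual OpenCV processing, so the numbers are illustrative only):

```cpp
#include <chrono>
#include <future>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for one image's processing (~50 ms of work).
int slowTask(int x) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return x * 2;
}

// Launch all tasks, then collect the results, measuring each phase.
// Returns {launch_ms, collect_ms} so the two numbers can be compared.
std::pair<long long, long long> timePhases(int n, std::vector<int>& out) {
    using clock = std::chrono::steady_clock;
    auto ms = [](clock::duration d) {
        return std::chrono::duration_cast<std::chrono::milliseconds>(d).count();
    };

    auto t0 = clock::now();
    std::vector<std::future<int>> futures;
    for (int i = 0; i < n; ++i)
        futures.emplace_back(std::async(std::launch::async, slowTask, i));
    auto t1 = clock::now();  // launching returns almost immediately

    for (auto& f : futures)
        out.push_back(f.get());  // get() blocks until that task has finished
    auto t2 = clock::now();      // nearly all of the wall time shows up here

    return {ms(t1 - t0), ms(t2 - t1)};
}
```

The launch loop returns as soon as the threads are created; the collect loop absorbs the wall time of the work itself, which matches the 100 ms / 9 s split described above.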

The White Cloud
  • I think you have a misunderstanding of `std::async`. Depending on how many threads are already launched/busy, it is possible/likely that the tasks start immediately upon construction. All `get` does is wait for the `future` to complete; you are not forcing the threads to run synchronously. – Cory Kramer Feb 02 '21 at 12:51
  • Whatever takes up these 9 seconds is most likely taking place within the other threads. Your for loop just synchronizes them back to your main thread after they're done. If they take that much time, your main thread of course has to wait until all the other threads are finished. – Detonar Feb 02 '21 at 12:54
  • Are you really starting 1000 threads in your first `for` loop? And then waiting for them in the second `for` loop? Are all those images on the same disk drive? – Andrew Henle Feb 02 '21 at 12:54
  • Or you have a misconception about multithreading. Managing threads has some overhead, and each thread itself still needs to do some work. Because of this overhead, creating too many threads can actually decrease performance, unless your system has 1000 cores to run that many threads in parallel. Then there's the speed of I/O: images need to be loaded from disk, which is another bottleneck. – Lukas-T Feb 02 '21 at 12:55
  • If that's the case, I'm not surprised it takes this long. The scheduling overhead can get pretty high like this. Maybe try using fewer threads and processing multiple images sequentially within each thread instead. – Detonar Feb 02 '21 at 12:55
  • The function pointer cast looks very suspicious. Unless it's there to select an overload, it's either pointless or leads to undefined behaviour. – molbdnilo Feb 02 '21 at 13:06
  • @Andrew I updated my answer to include the 1000 threads vs thread pool issue. If you mind this, you should have posted this as an answer ;). – rubenvb Feb 02 '21 at 16:41

1 Answer


Several things come to mind:

  1. You say this is "defeating the purpose of using multithreading". What is the runtime of this code if you run it sequentially (i.e. remove the multithreading code and process each image in a loop)? I would bet it's a lot more than 9 seconds.

  2. With std::launch::async, the std::async calls start the work immediately on new threads, but you have no guarantee of when any of it will finish. When you call get(), you force your program to wait for that result, and a decent C++ library will yield execution to the thread you're waiting on. So measuring only the startup loop tells you little about how long the actual operation takes: nearly all of the work's wall time simply surfaces in the get() loop (unless the overhead of the threads is larger than the runtime of the combined operations).

  3. You should std::move the result out of the future. It seems like you may be creating a copy of the data, which could be hurting performance for no good reason.

  4. Creating this many threads will inevitably lead to contention for resources, be it disk or memory bandwidth, CPU, or even memory size. In general it is better to set up a thread pool of sorts, with N threads (where N is the number of cores, or hardware threads, available on the system) and some sort of task queue that starts new jobs as threads free up. Note that std::async may indeed already do this under the hood (as far as I remember, an implementation is free to use pooling or not, depending also on the launch policy passed to std::async).
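Point 4 can be sketched roughly as follows. This is a hedged illustration only: it uses plain ints in place of cv::Mat, and processOne is a hypothetical stand-in for CameraToBEV::process. The idea is to launch only about as many tasks as there are hardware threads, each handling a contiguous chunk of the inputs:

```cpp
#include <algorithm>
#include <future>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-image work; in the real code
// this would be CameraToBEV::process returning a cv::Mat.
int processOne(int input) { return input * 2; }

std::vector<int> processAll(const std::vector<int>& inputs) {
    const std::size_t n = inputs.size();
    std::size_t workers = std::max<std::size_t>(1, std::thread::hardware_concurrency());
    workers = std::min(workers, n == 0 ? std::size_t{1} : n);

    std::vector<int> results(n);  // pre-sized: each slot is written by exactly one thread
    std::vector<std::future<void>> futures;
    const std::size_t chunk = (n + workers - 1) / workers;

    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, n);
        futures.emplace_back(std::async(std::launch::async, [&, begin, end] {
            // Each task processes its own contiguous slice sequentially.
            for (std::size_t i = begin; i < end; ++i)
                results[i] = processOne(inputs[i]);
        }));
    }
    for (auto& f : futures)
        f.get();  // only waits; the work already ran in parallel
    return results;
}
```

With ~1000 images and, say, 8 hardware threads, this launches 8 tasks of ~125 images each instead of 1000 threads, which avoids most of the scheduling and contention overhead discussed above.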

rubenvb