
I am currently developing a multi-threaded application in C++ where different threads are expected to process data from a shared data structure. I'm aware that the standard library provides std::future and std::async to easily handle asynchronous operations, and I'm trying to use these in my application.

Here's a simplified sketch of my code:

#include <vector>
#include <future>

std::vector<int> shared_data;

// Some function to be executed asynchronously
void process_data(size_t start, size_t end) {
    for (size_t i = start; i < end; ++i) {
        // Do something with shared_data[i]
    }
}

int main() {
    std::future<void> fut1 = std::async(std::launch::async, process_data, 0, 10);
    std::future<void> fut2 = std::async(std::launch::async, process_data, 10, 20);

    // Other operations...

    return 0;
}

I have the following questions regarding this code:

- Since shared_data is being accessed by multiple threads, do I need to protect it with a std::mutex or other synchronization primitives?
- Is there a way to pass std::future objects to other functions or store them in a data structure, and what are the potential implications of doing so?
- How can I handle exceptions thrown by the process_data function and propagated through the std::future objects?

Any guidance or best practices related to the usage of std::future in multithreaded scenarios would be greatly appreciated.

To make the shared data access thread-safe, I attempted to introduce a std::mutex and lock it with std::lock_guard in the process_data function, like so:

#include <mutex>

std::mutex mtx;

void process_data(size_t start, size_t end) {
    std::lock_guard<std::mutex> lock(mtx);
    for (size_t i = start; i < end; ++i) {
        // Do something with shared_data[i]
    }
}

I also attempted to store the std::future objects in a std::vector for later use, and tried to handle exceptions using a try/catch block around std::future::get().

I was expecting that locking the std::mutex would ensure that only one thread can access the shared data at a time, preventing race conditions. I also expected that I would be able to easily store the std::future objects in a vector and handle exceptions from the asynchronous tasks.

However, I'm unsure if these methods are the most efficient or even correct, given the lack of detailed examples or guidelines on these topics in the documentation and tutorials I've found. I'm particularly interested in understanding the correct way to use std::future and std::async in more complex scenarios, and how to handle exceptions properly in this context.

Ravement
  • related/dupe: https://stackoverflow.com/questions/41068201/if-i-make-a-piece-of-code-in-which-each-thread-modifies-completely-different-par – NathanOliver Jun 02 '23 at 17:22
    It highly depends what "Do something" means. Is it read access or write access? Even more important: are you modifying the vector like `push_back()`? Can you declare `shared_data` as `const`? – Thomas Weller Jun 02 '23 at 17:22
  • @ThomasWeller The "Do something" involves both reading and writing to the elements of shared_data. However, I am not performing any operations that would modify the structure of the vector itself (like push_back()). So, the size of the vector remains constant, but its contents are subject to change. Due to this, I can't declare shared_data as const. I hope this helps, and I appreciate any additional suggestions you might have. – Ravement Jun 02 '23 at 17:28
  • @NathanOliver It's not exactly a dupe. It's related, for sure, but not a dupe. – Ravement Jun 02 '23 at 17:31
  • you're basically serializing the **whole** `process_data`; it'd work as if single-threaded (except the second call may execute first, and "Other operations" may happen in parallel) – apple apple Jun 02 '23 at 17:36
  • @appleapple If the entire `process_data` function is protected by a single lock, it does **serialize** the function and reduces the benefits of multithreading. I understand that this might not be the most efficient way to use threads, especially if the goal is concurrent execution. I will need to explore more refined synchronization strategies or possibly restructure my data to better support concurrent access and modifications. Thanks for your input! – Ravement Jun 02 '23 at 17:41
  • @Ravement fwiw, if the `shared_data` is global as in your question, using `shared_ptr` as @PepijnKramer suggested solves nothing. – apple apple Jun 02 '23 at 17:43
  • @appleapple Even with a std::shared_ptr, concurrent access to a global shared_data object still requires synchronization. A std::shared_ptr primarily manages object lifetime, not thread safety. So, while it can be useful in certain situations, additional measures would be needed to handle concurrent modifications. Thanks for the important clarification! – Ravement Jun 02 '23 at 17:47

1 Answer


If the data is read-only (and not too large), just copy it. Otherwise, put your data behind a shared_ptr and, using a lambda expression, capture the shared_ptr by value (! not by reference!!!). This will extend the lifetime of the data to the lifetime of the thread that uses it longest. So something like this:

std::shared_ptr<SharedData> data;
auto future = std::async(std::launch::async, [data] { process_data(data); });

(Note that process_data here is assumed to take the shared_ptr, unlike the index-range version in the question.)

If the data is read/write, then add a mutex or some other synchronization mechanism to your data class, and use getters/setters that take the lock to read and update the values in the data.

Pepijn Kramer