1

I recently created a pattern searching program in Conway's Game of Life, but It ran too slow to be practical.
So I decided to parallelize it, but I failed; it caused segmentation fault, which is very likely due to data race.
A brief explanation of the code:

/* ... */
#include <list>
#include <mutex>
#include <future>
#include <iostream>
#include <forward_list>
int main() {
    /* ... */
    while (true) {
        /* ... */
        std::forward_list</*pattern type*/> results;
        std::list<std::future<void>> results_future;
        std::mutex results_mutex;
        for (/*All the possible unique pattern in given grid*/)
            results_future.push_back(std::async([&]{
                /*A very hard calculation*/
                if (/*The pattern is what I'm looking for*/) {
                    std::lock_guard<std::mutex> results_guard(results_mutex);
                    results.push_front(std::move(/*The pattern*/));
                }
            }));
        while (!results_future.empty()) {
            results_future.front().wait();
            results_future.pop_front();
        }
        if (!results.empty()) {
            for (auto &res : results)
                std::cout << "Pattern found:" << std::endl << res;
            return 0;
        }
    }
}

I'm pretty sure results is the only object that is declared out of the lambda-expression's function scope and is being modified there, so I locked it with mutex.
But the data race is still present. So what is causing it?

Dannyu NDos
  • 2,458
  • 13
  • 32
  • Where does the seg fault occur when you run it under a debugger? What is the call stack? – G.M. Feb 03 '17 at 12:04
  • `I'm pretty sure results is the only object that is declared out of the lambda-expression's function scope and is being modified there` in that case do not capture everything, capture just `results` and see what happens – SingerOfTheFall Feb 03 '17 at 12:11
  • Occasionally, segmentation fault occurs while `search_grid`, which is declared out of the lambda scope, and is being modified in `/*All the ...*/` in the code, is converted to pattern type in lambda scope. – Dannyu NDos Feb 03 '17 at 12:32
  • Sidenote: this conversion involves move construction of `std::unordered_map`. – Dannyu NDos Feb 03 '17 at 12:34
  • One thing I've learned over the years: when you're "pretty sure" about something, you're probably wrong. – molbdnilo Feb 03 '17 at 12:45
  • That's the moment, when you've showed us code which looks ok hiding parts which in your opinion are not related to problem, and that probably means the problem is in these parts ;). – zoska Feb 03 '17 at 13:03
  • I believe you do need to protect the read access to `results` in the main thread with `results_mutex`, even though you do wait for all futures to complete. The futures completing doesn't guarantee their side-effects being visible to other threads without explicit synchronization. That being said, I don't think this issue is causing your crashes. The problem is likely somewhere else. – Joseph Artsimovich Feb 03 '17 at 13:37

1 Answers1

0

I found that the problem is related to the lambda capture:

for (/*All the possible unique pattern in given grid*/)
    results_future.push_back(std::async([&]{
        /*pattern type*/ patt_orig = search_grid;
        /* ... */
    }));

search_grid, as stated in SingerOfTheFall's comment above, is captured by reference. And it is converted to pattern type within lambda scope. The problem is that search_grid can be modified while it is being converted to pattern type, or vice versa. Data race!

The conversion must be within the lambda capture:

for (/*All the possible unique pattern in given grid*/)
    results_future.push_back(std::async([&, patt_orig = (/*pattern type*/)search_grid]{
        /* ... */
    }));

And everything's okay now.

Dannyu NDos
  • 2,458
  • 13
  • 32