
I hadn't really done much with multithreading until recently (for Vulkan), so I'm not well versed in it.

I have a small test program here: https://github.com/seishuku/ThreadPool_TEST

It should just run the 4 jobs in parallel, then when all are done, print the results.

The problem is that sometimes not all of the threads set their "Done" flag, and even worse, sometimes it doesn't even hit the assert: execution continues in the main thread, which then times out.

With my limited multithread knowledge, I can't for the life of me figure out why. I'm sure it's probably something really bone-headed.

I've tried atomics for the flag, the volatile keyword (thinking the compiler might be optimizing something out), mutexes, and semaphores. All give the same result.

Edit: I forgot to mention that this is primarily on x64 Windows with VS2022, but I get similar results on Linux with GCC.

Seishuku
  • The code should be in the question itself. See [mcve]. – user3386109 Mar 08 '23 at 02:48
  • I can't read the code since the link is broken, but I'm going to take a wild guess and say your main thread never calls `pthread_join()` on the child threads before printing the results and exiting. – Jeremy Friesner Mar 08 '23 at 03:25
  • Ugh... Sorry, I forgot that VS defaults to private when creating a new repo. It's public now. – Seishuku Mar 08 '23 at 03:46
  • So after thinking about this while at work, it occurs to me that the assert in the thread function is failing because as soon as a thread sets ThreadData[].Done=1, the main thread's while loop exits and immediately sets ThreadData[].Done=0... Just a classic race condition. If I toss in a 1ms sleep between the while loop and ThreadData[].Done=0, it *appears* to be fixed, but I'm dubious. Is there a less dumb way to do that than a sleep? – Seishuku Mar 08 '23 at 22:36

1 Answer


OK, so the answer to my problem appears to be that it's a race condition. The main thread resets the Done flags before the worker thread's assert evaluates. On top of that, possibly because of some out-of-order execution, the main thread can already be back at the while loop, waiting, by that point. I'm not 100% sure I understand exactly what is going on there, but spinning the CPU for a bit between the while loop and resetting the flags fixes things in the test case above.

Now, I'm fairly sure I've tried it before (though I may have used it wrong), but I think a pthread barrier is the real solution here. Initialize the barrier with a count of 5 (4 worker threads + the main thread); then, instead of the while loop and Done flags, everyone just calls pthread_barrier_wait?
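
To make the barrier idea concrete, here is a minimal sketch of how it could look. It is not the code from the test repo: `WorkerData`, `worker`, `run_rounds`, and the toy "job" (a multiplication) are all hypothetical names made up for illustration. It also assumes a platform that provides POSIX barriers (they are an optional POSIX feature), such as Linux with glibc. Each round uses two waits on the same barrier: one so the workers start only after the main thread has set up the round, and one so the main thread reads results only after every worker has finished.

```c
#define _POSIX_C_SOURCE 200112L  /* expose pthread_barrier_* under strict -std= modes */
#include <assert.h>
#include <pthread.h>

#define NUM_WORKERS 4

/* Hypothetical per-thread data; stands in for the ThreadData[] array. */
typedef struct {
    int  id;      /* worker index, 0..NUM_WORKERS-1 */
    int  rounds;  /* how many rounds this worker takes part in */
    long result;  /* written by the worker, read by main after the end wait */
} WorkerData;

static pthread_barrier_t barrier;  /* shared by the 4 workers + main thread */
static int current_round;          /* written by main only between rounds */

static void *worker(void *arg)
{
    WorkerData *data = arg;

    for (int r = 0; r < data->rounds; r++) {
        /* Start of round: blocks until main and all workers have arrived. */
        pthread_barrier_wait(&barrier);

        /* Toy job, purely illustrative. */
        data->result = (long)(data->id + 1) * (current_round + 1);

        /* End of round: main cannot proceed until every worker gets here. */
        pthread_barrier_wait(&barrier);
    }
    return NULL;
}

/* Runs `rounds` rounds of 4 parallel jobs and returns the sum of all results. */
long run_rounds(int rounds)
{
    pthread_t  threads[NUM_WORKERS];
    WorkerData data[NUM_WORKERS];
    long       total = 0;
    int        rc;

    /* Count is 5: 4 worker threads + the main thread. */
    rc = pthread_barrier_init(&barrier, NULL, NUM_WORKERS + 1);
    assert(rc == 0);

    for (int i = 0; i < NUM_WORKERS; i++) {
        data[i] = (WorkerData){ .id = i, .rounds = rounds, .result = 0 };
        rc = pthread_create(&threads[i], NULL, worker, &data[i]);
        assert(rc == 0);  /* sketch only: real code should recover, not abort */
    }

    for (int r = 0; r < rounds; r++) {
        current_round = r;               /* safe: workers are parked at the start wait */
        pthread_barrier_wait(&barrier);  /* release the workers */
        pthread_barrier_wait(&barrier);  /* wait until all of them are done */

        for (int i = 0; i < NUM_WORKERS; i++)
            total += data[i].result;     /* safe: workers are parked again or finished */
    }

    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(threads[i], NULL);

    pthread_barrier_destroy(&barrier);
    return total;
}
```

To try it, call `run_rounds(3)` from a `main` and build with `-pthread`. There are no Done flags to reset here, so there is nothing for the main thread to clear too early. The main thread only touches shared data while every worker is blocked at a barrier or has exited.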

Seishuku