Synchronizing child threads to atomic time managed by parent

Question

I am trying to write a simulation where different threads need to perform a given calculation on a thread-specific interval (in the minimal example here that interval is between 1 and 4) based on an atomic simulation time managed by a parent thread.

The idea is to have the parent advance the simulation by a single time step (in this case always 1, for simplicity) and then have all the threads independently check whether they need to do a calculation. Once a thread has checked, it decrements an atomic counter and waits until the next step. I expect that after running this code the number of calculations for each thread would be exactly the length of the simulation (i.e. 10000 steps) divided by the thread-specific interval, rounded down (so for a thread interval of 4 the thread should do exactly 2500 calculations).

#include <thread>
#include <iostream>
#include <atomic>

std::atomic<int> simTime;
std::atomic<int> tocalc;
int end = 10000;

void threadFunction(int n);

int main() {
  int nthreads = 4;
  std::thread threads[nthreads];
  for (int ii = 0; ii < nthreads; ii ++) {
    threads[ii] = std::thread(threadFunction, ii+1);
  }

  simTime = 0;
  tocalc = 0;
  while (simTime < end) {
    tocalc = nthreads - 1;
    simTime += 1;
    // do calculation
    while (tocalc > 0) {
      // wait until all the threads have done their calculation
      // or at least checked to see if they need to
    }
  }

  for (int ii = 0; ii < nthreads; ii ++) {
    threads[ii].join();
  }
}

void threadFunction(int n) {
  int prev = simTime;
  int fix = prev;
  int ncalcs = 0;
  while (simTime < end) {
    if (simTime - prev > 0) {
      prev = simTime;
      if (simTime - fix >= n) {
        // do calculation
        ncalcs ++;
        fix = simTime;
      }
      tocalc --;
    }
  }
  std::cout << std::to_string(n)+" {ncalcs} - "+std::to_string(ncalcs)+"\n";
}

However, the output is not consistent with that expectation. One possible output is

2 {ncalcs} - 4992
1 {ncalcs} - 9983
3 {ncalcs} - 3330
4 {ncalcs} - 2448

While the expected output is

2 {ncalcs} - 5000
1 {ncalcs} - 10000
3 {ncalcs} - 3333
4 {ncalcs} - 2500

I am wondering if anyone has insight as to why this method of forcing the threads to wait for the next step seems to be failing - if it is perhaps a simple issue with my code or if it is a more fundamental problem with the approach. Any insight is appreciated, thanks.

Note

I am using this approach because the overhead for other methods I have tried (e.g. using pipes, joining at each step) is prohibitively expensive, if there is a less expensive way of communicating between the threads I am open to such suggestions.

You are setting `tocalc` wrong in the main thread, you want to wait until `nthreads` have decremented to zero, not `nthreads-1`. — , Nov 27 '18 at 01:13
If I change the line `tocalc = nthreads - 1` to `tocalc = nthreads` then it hangs indefinitely, and a debug statement in the main thread's wait loop shows `tocalc` is stuck at 1. Any thoughts? — William Miller, Nov 27 '18 at 01:36
what is the value of simtime when you read it before you set it to 0? — xaxxon, Nov 27 '18 at 02:05
You are not initializing the globals early enough and there are multiple points where the synchronization isn't ensured. For example you need to wait for all threads to have set their `prev` before increasing `simTime` the first time around, otherwise one of the threads will get stuck. Then in each iteration you need to wait for both `simTime` and `tocalc` to be set before doing a calculation in the thread. And so on. — , Nov 27 '18 at 02:12

score 1 · Accepted Answer · edited Apr 14 '20 at 04:06

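To expand on the comments: resetting tocalc to nthreads - 1 (3, with four children) means the parent's wait loop can exit as soon as three of the four children have decremented it. Each read and write of an atomic is indivisible, but the order in which the parent's read and the children's decrements happen is decided by the thread scheduler and varies from step to step. On some iterations all the child threads decrement tocalc before the parent thread evaluates it, so the sequence could go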

Child 1 decrements tocalc, new value is 2
Child 3 decrements tocalc, new value is 1
Child 4 decrements tocalc, new value is 0
Child 2 decrements tocalc, new value is -1
Parent evaluates if tocalc > 0, returns false - simulation advances

and other times the parent evaluation could be scheduled before the last thread decrements tocalc, i.e.

Child 1 decrements tocalc, new value is 2
Child 3 decrements tocalc, new value is 1
Child 4 decrements tocalc, new value is 0
Parent evaluates if tocalc > 0, returns false - simulation advances
Child 2 decrements tocalc, new value is 2 (the parent has already reset tocalc to 3 for the next step)

in which case child thread number 2 can miss an iteration. Since this doesn't happen on every step, because the scheduling order is semi-random, the total number of misses is not a fixed function of the number of threads but some small fraction of the total iterations. The modified code below resets tocalc to nthreads and initializes both atomics before the threads are started; with those changes it will produce the desired result.

#include <thread>
#include <iostream>
#include <atomic>
#include <string>  // std::to_string

std::atomic<int> simTime;
std::atomic<int> tocalc;
int end = 10000;

void threadFunction(int n);

int main() {
    const int nthreads = 4;  // constant, so the array size below is valid standard C++
    simTime = 0;
    tocalc = 0;
    std::thread threads[nthreads];
    for (int ii = 0; ii < nthreads; ii ++) {
        threads[ii] = std::thread(threadFunction, ii+1);
    }

    while (simTime < end) {
        tocalc = nthreads;
        simTime += 1;
        // do calculation
        while (tocalc > 0) {
            // wait until all the threads have done their calculation
            // or at least checked to see if they need to
        }
    }
    for (int ii = 0; ii < nthreads; ii ++) {
        threads[ii].join();
    }
}

void threadFunction(int n) {
    int prev = 0;
    int fix = prev;
    int ncalcs = 0;
    while (simTime < end) {
        if (simTime - prev > 0) {
            prev = simTime;
            if (simTime - fix >= n) {
                // do calculation
                ncalcs ++;
                fix = simTime;
            }
            tocalc --;
        }
    }
    std::cout << std::to_string(n)+" {ncalcs} - "+std::to_string(ncalcs)+"\n";
}

One possible output (the order in which the threads finish is somewhat random) would be

2 {ncalcs} - 5000
3 {ncalcs} - 3333
1 {ncalcs} - 10000
4 {ncalcs} - 2500
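
The difference between the two reset values can also be checked without any threads, by replaying the children's decrements one at a time. This is only a sketch: the helper name `parentReleasedEarly` is hypothetical, and it models the parent's `tocalc > 0` test as if it ran after every single decrement, which is the worst case rather than what a real scheduler would necessarily do.

```cpp
#include <cassert>

// Hypothetical helper (not part of the code above): replays the children's
// decrements one at a time and reports whether the parent's wait condition
// `tocalc > 0` is already false while at least one child has not decremented.
bool parentReleasedEarly(int initialCount, int nthreads) {
    assert(nthreads > 0);
    int tocalc = initialCount;
    for (int done = 1; done < nthreads; ++done) {
        --tocalc;               // one more child has checked in
        if (!(tocalc > 0)) {    // the parent's loop condition fails here
            return true;        // only `done` of `nthreads` children are finished
        }
    }
    return false;               // only the last child's decrement can release the parent
}
```

With four children, `parentReleasedEarly(3, 4)` is true (the `tocalc = nthreads - 1` case) and `parentReleasedEarly(4, 4)` is false (the `tocalc = nthreads` case).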

score 0 · Answer 2 · answered Dec 01 '18 at 00:02

Using a similar setup, I noticed that not every thread reaches the number you expect; the ones that miss are off by only one, e.g.

2 {ncalcs} - 4999
4 {ncalcs} - 2500
1 {ncalcs} - 9999
3 {ncalcs} - 3333

or the like. Which threads are affected, and how many of them, seems random. I'm not sure what is causing it, but I thought it might be good to issue a warning. You can get around it by checking whether simTime - fix == 0 and, if it isn't, doing another calculation before quitting.
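
One possible explanation for an off-by-one like this, offered as an assumption about the setup rather than a diagnosis: a child whose loop condition is `simTime < end` can read the final value of `simTime` in that condition and leave the loop before it has processed the last step. A minimal sketch that avoids this has each child iterate over the step numbers itself and wait for `simTime` to reach each one. The function name `runSimulation`, its parameters, and the use of `std::this_thread::yield()` in the wait loops are illustrative choices, not part of the code above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Illustrative sketch: returns the number of calculations performed by each
// child, where child i (0-based) uses interval i + 1.
std::vector<int> runSimulation(int nthreads, int steps) {
    assert(nthreads > 0 && steps >= 0);
    std::atomic<int> simTime{0};
    std::atomic<int> tocalc{0};
    std::vector<int> ncalcs(nthreads, 0);
    std::vector<std::thread> threads;

    for (int i = 0; i < nthreads; ++i) {
        threads.emplace_back([&simTime, &tocalc, &ncalcs, steps, i] {
            const int interval = i + 1;
            int fix = 0;    // time of this child's last calculation
            int count = 0;
            // Loop over step numbers, so the final step cannot be skipped.
            for (int step = 1; step <= steps; ++step) {
                while (simTime.load() < step) {
                    std::this_thread::yield();  // wait for the parent to publish this step
                }
                if (step - fix >= interval) {
                    ++count;    // the per-child calculation would go here
                    fix = step;
                }
                tocalc.fetch_sub(1);    // tell the parent this child is done with the step
            }
            ncalcs[i] = count;  // each child writes only its own slot
        });
    }

    for (int step = 1; step <= steps; ++step) {
        tocalc.store(nthreads);  // reset the counter before publishing the step
        simTime.store(step);
        while (tocalc.load() > 0) {
            std::this_thread::yield();  // wait until every child has checked in
        }
    }

    for (auto& t : threads) {
        t.join();
    }
    return ncalcs;
}
```

Under these assumptions, `runSimulation(4, 10000)` returns 10000, 5000, 3333 and 2500 for intervals 1 through 4, because every child handles every step exactly once and no child's exit depends on when it happens to read `simTime`.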
