I'm trying to implement a barrier for threads using condition variables, with one caveat: the first thread to arrive must write to a resource that is shared with the rest of the threads once the barrier fills. The problem is that when I run the program, the broadcast call does not wake most of the threads. I have spent quite a while trying to debug this with no success.

If I remove the call to add_codeword(), the threads get grouped together perfectly fine, so I suspect it has something to do with the fact that add_codeword() contains a wait of about 4 seconds.

void join_meetup(char *value, int len) {

    pthread_mutex_lock(&mut);
    if(++count < grp_size) {
        if(count == 1) {
            add_codeword(value, len);  //<=== This has a wait() all of 4 seconds.
        }

        int curr_group = group;
        while(curr_group == group) {
            pthread_cond_wait(&queue, &mut);
        }

    }else{
        if(meet_ord == MEET_LAST) {
            add_codeword(value, len);
        }

        count = 0; //Reset group counter.
        group++; //Increment the group num

        pthread_cond_broadcast(&queue);

    }

    read_resource(&code_list[(group-1) % CODE_SIZE], value, len);
    pthread_mutex_unlock(&mut);

}

Here is a sample output with the group size set to 3:

Group 1: 1
Group 1: 1
Group 1: 1
FINISHED WRITING CODE! 4
Group 2: 4
FINISHED WRITING CODE! 7
Group 3: 7
Group 3: 7
Group 3: 7
Group 4: 10
FINISHED WRITING CODE! 13
Group 4: 10
Group 4: 10
Group 5: 13
FINISHED WRITING CODE! 16
Group 5: 13
Group 5: 13
Group 5: 13
Group 5: 13

As you can see, group 2 only has 1 thread and group 5 has 5 threads.

Thanks

  • Can you explain more clearly what you mean by "a wait() all of 4 seconds"? Can you show us the code for `add_codeword`, or explain what it does? (Or, better yet, give us enough code to replicate the problem.) – David Schwartz Jun 17 '17 at 02:08

1 Answer

It looks like you have a race condition in your code. In particular, you assume that when a thread wakes up from its condition loop, the variable 'group' will be exactly one higher than when it entered, which is not always true. Imagine what happens when you have very "lazy" threads that take a long while to actually run after they are signaled. When they hit this line:

read_resource(&code_list[(group-1) % CODE_SIZE], value, len);

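'group' may already have been incremented several times by later threads since that particular thread entered, so it ends up reading a slot that belongs to a different group. Try this instead, which captures the group number on entry and uses that for the read: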

void join_meetup(char *value, int len) 
{
  pthread_mutex_lock(&mut);

  int my_group = group;

  if (++count < grp_size) 
  {
    if (count == 1)
        add_codeword(value, len);  //<=== This has a wait() all of 4 seconds.

    while (group == my_group)
        pthread_cond_wait(&queue, &mut);    
  }
  else
  {
    if (meet_ord == MEET_LAST)
        add_codeword(value, len);

    count = 0; //Reset group counter.
    group++;   //Increment the group num

    pthread_cond_broadcast(&queue);    
  }

  read_resource(&code_list[my_group % CODE_SIZE], value, len);
  pthread_mutex_unlock(&mut);
}

This should ensure that the threads get the resources meant for their group (as long as the modulus does not wrap around onto a slot that a slower group is still reading). Of course, there is no real guarantee on the order in which they will wake and continue on with whatever else they are doing. I'm also suspicious of this code:

    if (meet_ord == MEET_LAST)
        add_codeword(value, len);

since it looks like your first thread of a new group is responsible for doing add_codeword for the group. So, why is another thread doing it too?
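If you want to check the captured-group idea in isolation, here is a minimal self-contained sketch. It is not your program: `GRP_SIZE`, `NUM_GROUPS`, `run_demo`, `worker` and `struct member` are hypothetical names I made up, the codeword is just an int, and a plain store stands in for add_codeword() (no 4 second wait, since I don't know what yours does). Each thread records the group it captured on entry and the code it read, and the checker counts any group that does not have exactly `GRP_SIZE` members or whose members disagree on the code.

```c
#include <assert.h>
#include <pthread.h>

#define GRP_SIZE   3   /* hypothetical group size */
#define NUM_GROUPS 4   /* hypothetical number of groups formed in the demo */
#define CODE_SIZE  8   /* ring of slots; large enough that the demo never wraps */

static pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t queue = PTHREAD_COND_INITIALIZER;
static int count = 0;            /* arrivals in the group being formed */
static int group = 0;            /* index of the group being formed */
static int code_list[CODE_SIZE]; /* one codeword slot per group */

struct member { int id; int group; int code; };

/* Barrier: returns the codeword of the caller's own group. */
static int join_meetup(int value, int *out_group)
{
    pthread_mutex_lock(&mut);
    int my_group = group;                        /* capture before waiting */

    if (++count < GRP_SIZE) {
        if (count == 1)                          /* first arrival writes */
            code_list[my_group % CODE_SIZE] = value;
        while (group == my_group)                /* guards against spurious wakeups */
            pthread_cond_wait(&queue, &mut);
    } else {
        count = 0;                               /* start a new group */
        group++;
        pthread_cond_broadcast(&queue);
    }

    /* Index by the captured value, not by the current value of 'group'. */
    int code = code_list[my_group % CODE_SIZE];
    pthread_mutex_unlock(&mut);

    *out_group = my_group;
    return code;
}

static void *worker(void *arg)
{
    struct member *m = arg;
    m->code = join_meetup(m->id, &m->group);
    return NULL;
}

/* Runs GRP_SIZE * NUM_GROUPS threads; returns the number of violations found,
   or -1 if a thread could not be created. */
int run_demo(void)
{
    enum { N = GRP_SIZE * NUM_GROUPS };
    static struct member res[N];
    pthread_t tids[N];
    int members[NUM_GROUPS] = {0};
    int first_code[NUM_GROUPS] = {0};
    int violations = 0;

    count = 0;
    group = 0;
    for (int i = 0; i < N; i++) {
        res[i].id = i + 1;
        if (pthread_create(&tids[i], NULL, worker, &res[i]) != 0)
            return -1;
    }
    for (int i = 0; i < N; i++)
        pthread_join(tids[i], NULL);

    for (int i = 0; i < N; i++) {
        int g = res[i].group;
        assert(g >= 0 && g < NUM_GROUPS);
        if (members[g]++ == 0)
            first_code[g] = res[i].code;         /* first member seen for g */
        else if (first_code[g] != res[i].code)
            violations++;                        /* members disagree on code */
    }
    for (int g = 0; g < NUM_GROUPS; g++)
        if (members[g] != GRP_SIZE)
            violations++;                        /* wrong group size */
    return violations;
}
```

Call run_demo() from a main() of your own and build with -pthread. If you change the read back to `code_list[(group - 1) % CODE_SIZE]`, a slow waker can pick up another group's code, which is the effect I described above.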

jschultz410