Using pthread_join() on multiple threads giving unexpected behavior

Question

I am learning how to use threads in C and have run into a problem when creating the threads. I am making a program that takes in 2 or more file names as command line arguments, counts the number of bytes in each file in their own thread, and then outputs the name of the largest file. When I use pthread_join() directly after creating a thread, the program runs as intended. However, I know this isn't how threads should be used because it defeats the purpose. When I use pthread_join() in a for loop after creating all the threads, then the program does not work correctly. Could anyone tell me what I am doing wrong? All help is appreciated. Here is my main function.

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; //mutex for changing max_bytes and max_name
int max_bytes = 0;
char max_name[100];

struct arg_struct{ //struct to hold args to pass the threads
        int fd;
        char name[100];
};

int main(int argc, char* argv[])
{
        if(argc < 3){ //checks for correct number of arguments passed
                perror("Wrong number of arguments");
                return EXIT_FAILURE;
        }

        int arg_num = argc - 1; //holds number of arguments passed

        pthread_t threadid[arg_num]; //array of thread IDs
        struct arg_struct args;
        for(int i = 0; i < arg_num; i++){
                args.fd = open(argv[i+1], O_RDONLY);
                memcpy(args.name, argv[i+1], sizeof(args.name)); //copies file name into arg_struct
                int thread_err = pthread_create(&threadid[i], NULL, count_bytes, (void*)&args); //create thread by calling count_bytes and passing it a struct of args
                //pthread_join(threadid[i], NULL);
                if(thread_err != 0){
                        perror("pthread_create failed");
                        return EXIT_FAILURE;
                }
        }

        for(int i = 0; i < arg_num; i++){
                pthread_join(threadid[i], NULL);
        }

        printf("%s is the largest of the submitted files\n", max_name);

        return 0;
}

This is the function that the threads are running.

void *count_bytes(void* arguments)
{
        struct arg_struct *args = (struct arg_struct*)arguments; //casting arguments back to struct from void*
        int fd = args -> fd;
        char name[100];
        memcpy(name, args -> name, sizeof(name)); //copies file name into name from args.name
        int bytes = 0;

        int size = 10;
        char*  buffer = (char*) malloc(size);
        if(buffer == NULL){
                perror("malloc failed");
                exit(EXIT_FAILURE);
        }
        int buffer_count = 0;
        for(int i = 0; i < size; i++){
                buffer[i] = '\0'; //sets all elements to '\0' to determine end of file later
        }
        int read_return = read(fd, &buffer[buffer_count], 1);
        if(read_return == -1){
                perror("reading failed");
                exit(EXIT_FAILURE);
        }

        while(buffer[buffer_count] != '\0'){
                bytes++;
                buffer_count++;
                buffer[buffer_count] = '\0'; //sets previous element to '\0' to determine end of file later
                if(buffer_count >= size){
                        buffer_count = 0; //buffer will hold up to 10 elements and then go back to the beginning
                }
                read_return = read(fd, &buffer[buffer_count], 1);
                if(read_return == -1){
                        perror("reading failed");
                        exit(EXIT_FAILURE);
                }
        }

        printf("%s has %d bytes\n", name, bytes);

        pthread_mutex_lock(&mutex);
        if(bytes > max_bytes){
                max_bytes = bytes;
                memcpy(max_name, name, sizeof(max_name));
        }
        //locks mutex to avoid race condition
        //then sets bytes to max_bytes if it is later than max_bytes
        //then locks mutex to allow another thread to have access
        pthread_mutex_unlock(&mutex);

        return NULL;
}

If it is of any use, these are the two outputs produced when it is running correctly

./a.out another buffered_readword.c
another has 8 bytes
buffered_readword.c has 3747 bytes
buffered_readword.c is the largest of the submitted files

And not correctly

./a.out another buffered_readword.c
buffered_readword.c has 1867 bytes
buffered_readword.c has 1881 bytes
buffered_readword.c is the largest of the submitted files

Please tell us what the program is supposed to do and describe how the result is incorrect. Providing the two sets of output is good but we also need to know the overall problem being solved. — kaylum, May 02 '21 at 21:33
@kaylum It is supposed to take in 2 or more files, count the number of bytes in each file in their own thread, and then output which file is the largest. — Madison Allen, May 02 '21 at 21:46
Please [Edit](https://stackoverflow.com/posts/67361299/edit) the post to update it with info like that which is a core part of the question. — kaylum, May 02 '21 at 21:48

score 1 · Accepted Answer · answered May 02 '21 at 21:53

The problem is that there is only one args structure. After pthread_create is called the new thread may not run immediately. By the time the threads run it is likely that they will both see the same args values. Calling pthread_join inside the thread creation loop "fixes" that because it ensures each thread finishes before args is updated to the next value.

To fix properly pass a different args to each thread. Illustrative code to do that:

struct arg_struct args[arg_num];
for(int i = 0; i < arg_num; i++){
    args[i].fd = open(argv[i+1], O_RDONLY);
    memcpy(args[i].name, argv[i+1], sizeof(args[i].name));
    int thread_err = pthread_create(&threadid[i], NULL, count_bytes, &args[i]); 
    ....

You don't actually need to _copy_ `argv[i+1]` -- storing a pointer is perfectly fine. — Employed Russian, May 03 '21 at 03:36

Using pthread_join() on multiple threads giving unexpected behavior

1 Answers1