Measuring the speedup of a multi-threaded C program (implemented with Pthreads)

Question

I have a multi-threaded C program written with Pthreads that currently uses 2 threads. I want to increase the number of threads and measure the resulting speedup. I would like to automate this: run the code repeatedly with an increasing number of threads, record the running time of each run, and display the running times graphically. Any pointers on how to automate the whole process and plot the results would be appreciated. Here is my code:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#define NUM_THREADS 2
#define VECTOR_SIZE 40

struct DOTdata
{
    /* data */
    long X[VECTOR_SIZE];
    long Y[VECTOR_SIZE];
    long sum;
    long compute_length;
};

struct DOTdata dotstr;
pthread_mutex_t mutex_sum;

void *calcDOT(void *);

int main(int argc, char *argv[])
{
    long vec_index;

    for(vec_index = 0 ; vec_index < VECTOR_SIZE ; vec_index++){
        dotstr.X[vec_index] = vec_index + 1;
        dotstr.Y[vec_index] = vec_index + 2; 
    }

    dotstr.sum = 0;
    dotstr.compute_length = VECTOR_SIZE/NUM_THREADS;

    pthread_t call_thread[NUM_THREADS];
    pthread_attr_t attr;
    void *status;

    pthread_mutex_init(&mutex_sum, NULL);

    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

    long i;

    for(i = 0 ; i < NUM_THREADS ; i++){
        pthread_create(&call_thread[i], &attr, calcDOT, (void *)i);
    }

    pthread_attr_destroy(&attr);

    for (i = 0 ; i < NUM_THREADS ; i++){
        pthread_join(call_thread[i], &status);
    }

    printf("Resultant X*Y is %ld\n", dotstr.sum);
    pthread_mutex_destroy(&mutex_sum);
    pthread_exit(NULL);
}

void *calcDOT(void *thread_id)
{
    long vec_index;
    long start_index;
    long end_index;
    long length;
    long offset;
    long sum = 0;

    offset = (long)thread_id;
    length = dotstr.compute_length;

    start_index = offset * length;
    end_index = (start_index + length) - 1;

    for(vec_index = start_index ; vec_index < end_index ; vec_index++){
        sum += (dotstr.X[vec_index] * dotstr.Y[vec_index]);
    }

    pthread_mutex_lock(&mutex_sum);
    dotstr.sum += sum;
    pthread_mutex_unlock(&mutex_sum);

    pthread_exit((void *)thread_id);

}

I would like to increment NUM_THREADS, run the program after each increment, record the execution time of each run, and plot execution time against the number of threads.

Which OS? And to make it clear, have you already managed to make your application work with an arbitrary number of threads or you're also seeking help for that? — giusti, Jan 01 '17 at 13:04
I have written my code which uses n threads although a constant throughout its execution cycle. My OS is macOS Sierra. — Abhijeet Mohanty, Jan 01 '17 at 13:11
I can vary my NUM_THREADS parameter and run it separately, but I want to automate the process, record the execution time of each cycle and graphically display it. — Abhijeet Mohanty, Jan 01 '17 at 13:19
You can modify your code to take the number of threads from the command-line or standard input. Then you can write a script to call your program several times. I'm not versed in Mac, but I think you can use Bash and `time` to measure your program. — giusti, Jan 01 '17 at 13:26
Or instead of modifying your program, an easier solution would be to make your script supply `NUM_THREADS` to the compiler as a macro. In GCC you can use the option `-D` for that. Write a script that will compile your program with different numbers of threads, run it, and measure the running time. — giusti, Jan 01 '17 at 13:28
If this isn't enough to help you find your way, I think your question would be better received in either the Unix or the "Ask Different" community. Check them both to see which one would suit you better. — giusti, Jan 01 '17 at 13:29
this line: `dotstr.compute_length = VECTOR_SIZE/NUM_THREADS;` will not work as expected. This is because it is using an integer divide, so any remainder will be truncated. So this will only work correctly when NUM_THREADS exactly divides into VECTOR_SIZE. When NUM_THREADS is larger than VECTOR_SIZE, `compute_length` will be 0 — user3629249, Jan 01 '17 at 16:36
the posted code can eliminate all the `attr` statements and just pass NULL as the second parameter to `pthread_create()` — user3629249, Jan 01 '17 at 16:46
regarding this line: `pthread_create(&call_thread[i], &attr, calcDOT, (void *)i);` in general, the last parameter should be the address of the data, not casting the data as if it were a pointer. Suggest: `pthread_create(&call_thread[i], &attr, calcDOT, (void *)&i);` Then this line: `offset = (long)thread_id;` should be: `offset = *(long *)thread_id;` — user3629249, Jan 01 '17 at 16:52
the posted code is not using the returned value from the threads, to avoid clutter, suggest using: `pthread_exit( NULL );` and `pthread_join( call_thread[i], NULL );` — user3629249, Jan 01 '17 at 16:59
the posted code needs to be checking the returned value from: `pthread_create()` `pthread_join()` to assure they were successful. — user3629249, Jan 01 '17 at 17:14
this line: `pthread_mutex_t mutex_sum;` would be better written as: `pthread_mutex_t mutex_sum = PTHREAD_MUTEX_INITIALIZER;` and eliminate the call to `pthread_mutex_init()` — user3629249, Jan 01 '17 at 17:15
an important thing to note: breaking the calculation into numerous threads will SLOW DOWN the program due to context switching, etc. — user3629249, Jan 01 '17 at 17:20
`(void *)i` this results in implementation defined behaviour. Didn't you notice some warning during compilation? — babon, Jan 01 '17 at 17:42
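
Following giusti's suggestion to take the number of threads at run time, here is a minimal sketch of how that could look. It is not the asker's code: `struct chunk`, `dot_chunk`, `parse_thread_count`, and `threaded_dot` are hypothetical names introduced for illustration. Each worker receives a pointer to its own argument struct (rather than an index cast to a pointer, or the address of the shared loop variable, which would race with the loop), and the chunk bounds are computed so that any remainder of `VECTOR_SIZE / nthreads` is still covered.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define VECTOR_SIZE 40

/* Hypothetical per-thread argument block (not part of the posted code). */
struct chunk {
    const long *x;
    const long *y;
    long begin;    /* first index, inclusive */
    long end;      /* last index, exclusive */
    long partial;  /* written by the worker, read after pthread_join */
};

static void *dot_chunk(void *arg)
{
    struct chunk *c = arg;
    long sum = 0;
    for (long i = c->begin; i < c->end; i++)
        sum += c->x[i] * c->y[i];
    c->partial = sum;
    return NULL;
}

/* Parse a thread count from a string; returns -1 if it is not in 1..VECTOR_SIZE. */
long parse_thread_count(const char *text)
{
    char *rest;
    long n = strtol(text, &rest, 10);
    if (rest == text || *rest != '\0' || n < 1 || n > VECTOR_SIZE)
        return -1;
    return n;
}

/* Dot product of the question's test vectors using nthreads workers.
 * Returns -1 on allocation or thread errors (the real sum is positive). */
long threaded_dot(long nthreads)
{
    assert(nthreads >= 1 && nthreads <= VECTOR_SIZE);

    long x[VECTOR_SIZE], y[VECTOR_SIZE];
    for (long i = 0; i < VECTOR_SIZE; i++) {
        x[i] = i + 1;
        y[i] = i + 2;
    }

    pthread_t *tid = malloc((size_t)nthreads * sizeof *tid);
    struct chunk *work = malloc((size_t)nthreads * sizeof *work);
    if (tid == NULL || work == NULL) {
        free(tid);
        free(work);
        return -1;
    }

    long started = 0, total = 0;
    int failed = 0;
    for (long t = 0; t < nthreads; t++) {
        /* Bounds t*N/n .. (t+1)*N/n cover every element exactly once. */
        work[t] = (struct chunk){ x, y,
                                  t * VECTOR_SIZE / nthreads,
                                  (t + 1) * VECTOR_SIZE / nthreads, 0 };
        if (pthread_create(&tid[t], NULL, dot_chunk, &work[t]) != 0) {
            failed = 1;
            break;
        }
        started++;
    }
    /* Join only the threads that were actually created. */
    for (long t = 0; t < started; t++) {
        if (pthread_join(tid[t], NULL) != 0)
            failed = 1;
        total += work[t].partial;
    }

    free(tid);
    free(work);
    return failed ? -1 : total;
}
```

A `main` could call `parse_thread_count(argv[1])`, pass the result to `threaded_dot`, and print the sum. A script can then run the binary once per thread count and time each run, as the comments above suggest. On most systems this needs to be compiled with `-pthread`.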

score 2 · Answer 1 · answered Jan 01 '17 at 21:19

I tried a naive approach: increase the number of threads, time each run with clock() from time.h, and plot the results with gnuplot. Each iteration doubles the number of threads and prints the time taken by that iteration. gnuplot then displays a graph with the number of threads on the x-axis and the execution time on the y-axis.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NUM_THREADS 2
#define VECTOR_SIZE 40

struct DOTdata {
    /* data */
    long X[VECTOR_SIZE];
    long Y[VECTOR_SIZE];
    long sum;
    long compute_length;
};

struct DOTdata dotstr;
pthread_mutex_t mutex_sum;

void *calcDOT(void *);

int main(int argc, char *argv[]) {
    double xvals[VECTOR_SIZE / NUM_THREADS];
    double yvals[VECTOR_SIZE / NUM_THREADS];
    int index = 0;
    for (int count = NUM_THREADS; count < VECTOR_SIZE / NUM_THREADS; count = count * 2) {

        clock_t begin = clock();

        long vec_index;

        for (vec_index = 0; vec_index < VECTOR_SIZE; vec_index++) {
            dotstr.X[vec_index] = vec_index + 1;
            dotstr.Y[vec_index] = vec_index + 2;
        }

        dotstr.sum = 0;
        dotstr.compute_length = VECTOR_SIZE / count;

        pthread_t call_thread[count];
        pthread_attr_t attr;
        void *status;

        pthread_mutex_init(&mutex_sum, NULL);

        pthread_attr_init(&attr);
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

        long i;

        for (i = 0; i < count; i++) {
            pthread_create(&call_thread[i], &attr, calcDOT, (void *) i);
        }

        pthread_attr_destroy(&attr);

        for (i = 0; i < count; i++) {
            pthread_join(call_thread[i], &status);
        }

        printf("Resultant X*Y is %ld\n", dotstr.sum);
        pthread_mutex_destroy(&mutex_sum);
        clock_t end = clock();
        double time_spent = (double) (end - begin) / CLOCKS_PER_SEC;

        printf("time spent: %f NUM_THREADS: %d\n", time_spent, count);
        xvals[index] = count;
        yvals[index] = time_spent;
        index++;
    }

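    /* Pipe the collected points to gnuplot as inline data. */
    FILE *gnuplotPipe = popen("gnuplot -persistent", "w");
    if (gnuplotPipe == NULL) {
        perror("popen");
        return EXIT_FAILURE;
    }

    fprintf(gnuplotPipe, "plot '-'\n");

    /* Only the first `index` entries were filled in by the timing loop. */
    for (int i = 0; i < index; i++) {
        fprintf(gnuplotPipe, "%f %f\n", xvals[i], yvals[i]);
    }

    fprintf(gnuplotPipe, "e\n");   /* terminates the inline data block */
    pclose(gnuplotPipe);           /* flushes the commands and waits for gnuplot */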


    pthread_exit(NULL);
}

void *calcDOT(void *thread_id) {
    long vec_index;
    long start_index;
    long end_index;
    long length;
    long offset;
    long sum = 0;

    offset = (long) thread_id;
    length = dotstr.compute_length;

    start_index = offset * length;
    end_index = (start_index + length) - 1;

    for (vec_index = start_index; vec_index < end_index; vec_index++) {
        sum += (dotstr.X[vec_index] * dotstr.Y[vec_index]);
    }

    pthread_mutex_lock(&mutex_sum);
    dotstr.sum += sum;
    pthread_mutex_unlock(&mutex_sum);

    pthread_exit((void *) thread_id);

}
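
One caveat about the timing in this listing: `clock()` reports processor time consumed by the whole process, which on typical POSIX systems adds up the time of all threads, so it does not shrink when work runs in parallel. For speedup measurements, wall-clock time is usually the more meaningful quantity. A minimal sketch, assuming POSIX `clock_gettime` with `CLOCK_MONOTONIC` is available (on macOS that means 10.12 or later); `wall_seconds` and `speedup` are hypothetical helper names, not part of the code above:

```c
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime under strict -std modes */
#include <assert.h>
#include <time.h>

/* Wall-clock seconds read from a monotonic clock. */
double wall_seconds(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;  /* silence the unused warning when NDEBUG is defined */
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Speedup of an n-thread run relative to the single-thread baseline:
 * values above 1 mean the n-thread run was faster. */
double speedup(double t_one_thread, double t_n_threads)
{
    assert(t_n_threads > 0.0);
    return t_one_thread / t_n_threads;
}
```

To use it, read `wall_seconds()` before creating the threads and again after the last `pthread_join`, and take the difference. With only 40 elements the cost of creating threads probably dominates the measurement, so a much larger vector and several repetitions per thread count would likely be needed before any speedup becomes visible.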

Output

Resultant X*Y is 20900
time spent: 0.000155 NUM_THREADS: 2
Resultant X*Y is 19860
time spent: 0.000406 NUM_THREADS: 4
Resultant X*Y is 17680
time spent: 0.000112 NUM_THREADS: 8
Resultant X*Y is 5712
time spent: 0.000587 NUM_THREADS: 16
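
Note that the reported sum changes with the thread count, although the dot product of these vectors is always 22960. Two things in `calcDOT` explain it: `end_index = (start_index + length) - 1` combined with a `<` comparison skips the last element of every chunk, and the integer division `VECTOR_SIZE / count` leaves the tail of the vector unprocessed when `count` does not divide 40. The sketch below replays that chunking sequentially, without threads, to make the arithmetic visible; `replay_sum` is a hypothetical helper written for this illustration.

```c
#include <assert.h>

#define VECTOR_SIZE 40

/* Sequentially replays the chunking used by calcDOT for `count` threads.
 * drop_last != 0 keeps the listing's bound `start + length - 1`;
 * drop_last == 0 uses the exclusive bound `start + length` instead. */
long replay_sum(long count, int drop_last)
{
    assert(count >= 1 && count <= VECTOR_SIZE);

    long length = VECTOR_SIZE / count;  /* integer division, as in the listing */
    long total = 0;
    for (long t = 0; t < count; t++) {
        long start = t * length;
        long end = start + length - (drop_last ? 1 : 0);
        for (long i = start; i < end; i++)
            total += (i + 1) * (i + 2);  /* X[i] * Y[i] */
    }
    return total;
}
```

With `drop_last` set, the function returns 20900, 19860, 17680, and 5712 for 2, 4, 8, and 16 threads, matching the output above. With the exclusive bound it returns 22960 for 2, 4, and 8 threads, but 16 threads still skip indices 32 to 39, because 40 / 16 truncates to 2 and only 32 elements get assigned.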
