Diagnosis
The reason you get practically the same time — and slightly more time from the threaded code than from the sequential code — is that clock()
measures CPU time, and the two ways of sorting take almost the same amount of CPU time because they're doing the same job (and the threading number is probably slightly bigger because of the time to setup and tear down threads).
The clock()
function shall return the implementation's best approximation to the processor time used by the process since the beginning of an implementation-defined era related only to the process invocation.
BSD (macOS) man page:
The clock()
function determines the amount of processor time used since the invocation of the calling process, measured in CLOCKS_PER_SECs of a second.
The amount of CPU time it takes to sort the two arrays is basically the same; the difference is the overhead of threading (more or less).
Revised code
I have a set of functions that can use clock_gettime()
instead (code in timer.c
and timer.h
at GitHub). These measure wall clock time — elapsed time, not CPU time.
Here's a mildly tweaked version of your code — the substantive changes were changing the type of key
in the sort function from int
to unsigned int
to match the data in the array, and to fix the conversion specification of %d
to %ld
to match the type identified by GCC as clock_t
. I mildly tweaked the argument handling, and the timing messages so that they're consistent in length, and added the elapsed time measurement code:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#include <pthread.h>
#include "timer.h"
struct thread_data
{
int count;
unsigned int *arr;
};
struct thread_data thread_data_array[2];
static
void insertionSort(unsigned int arr[], int n)
{
for (int i = 1; i < n; i++)
{
unsigned int key = arr[i];
int j = i - 1;
while (j >= 0 && arr[j] > key)
{
arr[j + 1] = arr[j];
j = j - 1;
}
arr[j + 1] = key;
}
}
static
void *sortAndMergeArrays(void *threadarg)
{
int count;
unsigned int *arr;
struct thread_data *my_data;
my_data = (struct thread_data *)threadarg;
count = my_data->count;
arr = my_data->arr;
insertionSort(arr, count);
pthread_exit(NULL);
}
int main(int argc, char *argv[])
{
int count = 10000;
int i, rc;
clock_t start, end, total_t;
pthread_t threads[2];
// get the loop count. If loop count is not provided take 10000 as default loop count.
if (argc == 2)
count = atoi(argv[1]);
unsigned int arr1[count], arr2[count], copyArr1[count], copyArr2[count];
srand(time(0));
for (i = 0; i < count; i++)
{
arr1[i] = rand();
arr2[i] = rand();
copyArr1[i] = arr1[i];
copyArr2[i] = arr2[i];
}
Clock clk;
clk_init(&clk);
start = clock();
clk_start(&clk);
for (int t = 0; t < 2; t++)
{
thread_data_array[t].count = count;
if (t == 0)
thread_data_array[t].arr = arr1;
else
thread_data_array[t].arr = arr2;
rc = pthread_create(&threads[t], NULL, sortAndMergeArrays, (void *)&thread_data_array[t]);
if (rc)
{
printf("ERROR; return code from pthread_create() is %d\n", rc);
exit(-1);
}
}
pthread_join(threads[0], NULL);
pthread_join(threads[1], NULL);
clk_stop(&clk);
end = clock();
char buffer[32];
printf("Elapsed using threads: %s s\n", clk_elapsed_us(&clk, buffer, sizeof(buffer)));
total_t = (double)(end - start);
printf("CPU time using threads: %ld\n", total_t);
start = clock();
clk_start(&clk);
insertionSort(copyArr1, count);
insertionSort(copyArr2, count);
clk_stop(&clk);
end = clock();
printf("Elapsed sequentially: %s s\n", clk_elapsed_us(&clk, buffer, sizeof(buffer)));
total_t = (double)(end - start);
printf("CPU time sequentially: %ld\n", total_t);
return 0;
}
Results
Example runs (program inssortthread23
) — run on a MacBook Pro (15" 2016) with 16 GiB RAM and 2.7 GHz Intel Core i7 CPU, running macOS High Sierra 10.13, using GCC 7.2.0 for compilation.
I had routine background programs running — e.g. browser not being actively used, no music or videos playing, no downloads in progress etc. (These things matter for benchmarking.)
$ inssortthread23 100000
Elapsed using threads: 1.060299 s
CPU time using threads: 2099441
Elapsed sequentially: 2.146059 s
CPU time sequentially: 2138465
$ inssortthread23 200000
Elapsed using threads: 4.332935 s
CPU time using threads: 8616953
Elapsed sequentially: 8.496348 s
CPU time sequentially: 8469327
$ inssortthread23 300000
Elapsed using threads: 9.984021 s
CPU time using threads: 19880539
Elapsed sequentially: 20.000900 s
CPU time sequentially: 19959341
$
Conclusions
Here, you can clearly see that:
- The elapsed time is approximately twice as long for the non-threaded code as for the threaded code.
- The CPU time for the threaded and non-threaded code is almost the same.
- The overall time is quadratic in the number of rows sorted.
All of which are very much in line with expectations — once you realize that clock()
is measuring CPU time, not elapsed time.
Minor puzzle
You can also see that I'm getting the threaded CPU time as slightly smaller than the CPU time for sequential sorts, some of the time. I don't have an explanation for that — I deem it 'lost in the noise', though the effect is persistent:
$ inssortthread23 100000
Elapsed using threads: 1.051229 s
CPU time using threads: 2081847
Elapsed sequentially: 2.138538 s
CPU time sequentially: 2132083
$ inssortthread23 100000
Elapsed using threads: 1.053656 s
CPU time using threads: 2089886
Elapsed sequentially: 2.128908 s
CPU time sequentially: 2122983
$ inssortthread23 100000
Elapsed using threads: 1.058283 s
CPU time using threads: 2093644
Elapsed sequentially: 2.126402 s
CPU time sequentially: 2120625
$
$ inssortthread23 200000
Elapsed using threads: 4.259660 s
CPU time using threads: 8479978
Elapsed sequentially: 8.872929 s
CPU time sequentially: 8843207
$ inssortthread23 200000
Elapsed using threads: 4.463954 s
CPU time using threads: 8883267
Elapsed sequentially: 8.603401 s
CPU time sequentially: 8580240
$ inssortthread23 200000
Elapsed using threads: 4.227154 s
CPU time using threads: 8411582
Elapsed sequentially: 8.816412 s
CPU time sequentially: 8797965
$