I have some code that runs intensive matrix processing, so I thought it would be faster if I multithreaded it. My intention is to keep each thread alive so that it can be reused for more processing later. The problem is that the multithreaded version of the code runs slower than the single-threaded one, and I believe the cause lies in the way I signal my threads and keep them alive.
I am using pthreads on Windows with C++. Here is my code for the thread, where runtest() is the function where the matrix calculations happen:
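void* playQueue(void* arg)
{
    while(true)
    {
        pthread_mutex_lock(&queueLock);
        if(testQueue.empty())
        {
            // Release the lock before leaving the loop so other threads are not blocked.
            pthread_mutex_unlock(&queueLock);
            break;
        }
        else
            testQueue.pop();
        pthread_mutex_unlock(&queueLock);
        // The heavy matrix work runs outside the lock.
        runtest();
    }
    pthread_exit(NULL);
}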
The playQueue() function is the one passed to pthread_create. As of now there is a queue (testQueue) of, let's say, 1000 items, and there are 100 threads. Each thread continues to run until the queue is empty (hence the check inside the mutex).
I believe the multithreaded version runs so slowly because of something called false sharing (I think?), and because my method of signaling the threads to call runtest() and keeping them alive is poor.
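For context, false sharing usually refers to threads writing to different variables that happen to sit on the same cache line, so each write invalidates the line for the other cores. The following is only a sketch of the usual countermeasure (padding per-thread data to its own cache line), not a diagnosis of the code in this post. The 64-byte line size is an assumption, and `PaddedSlot`, `Task`, `accumulate`, and `sumWithPaddedSlots` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>

// Assumed cache-line size; the real value depends on the CPU.
constexpr std::size_t kCacheLine = 64;
constexpr int kThreads = 4;

// Aligning each slot to a cache line also pads it to that size, so two
// threads updating neighbouring slots never write to the same line.
struct alignas(kCacheLine) PaddedSlot {
    long value;
};

static PaddedSlot slots[kThreads];

// Hypothetical per-thread argument: which slot to update and how often.
struct Task {
    int  index;
    long iterations;
};

static void* accumulate(void* arg)
{
    Task* task = static_cast<Task*>(arg);
    for (long i = 0; i < task->iterations; ++i)
        slots[task->index].value += 1;   // each thread touches only its own slot
    return NULL;
}

// Runs kThreads workers and returns the sum of all per-thread slots.
long sumWithPaddedSlots(long iterationsPerThread)
{
    assert(iterationsPerThread >= 0);
    pthread_t threads[kThreads];
    Task tasks[kThreads];
    for (int i = 0; i < kThreads; ++i) {
        slots[i].value = 0;
        tasks[i].index = i;
        tasks[i].iterations = iterationsPerThread;
        pthread_create(&threads[i], NULL, accumulate, &tasks[i]);
    }
    long total = 0;
    for (int i = 0; i < kThreads; ++i) {
        pthread_join(threads[i], NULL);  // join before reading the slot
        total += slots[i].value;
    }
    return total;
}
```

Whether padding like this helps at all depends on whether the threads really do write to neighbouring memory, which the code shown here does not establish.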
What would be an effective way of doing this so that the multithreaded version runs faster than (or at least as fast as) the iterative version?
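To make the "keep the thread alive" goal concrete, here is a minimal sketch of one common pattern: workers sleep on a condition variable while the queue is empty and wake when a job is submitted, instead of exiting when the queue drains. It is written under assumptions and is not a measured fix for the slowdown. `WorkerPool`, `doJob`, `workerLoop`, `poolStart`, `poolSubmit`, and `poolShutdown` are hypothetical names, and `doJob` stands in for runtest().

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>
#include <queue>
#include <vector>

// Hypothetical pool state; not the globals used in the post.
struct WorkerPool {
    pthread_mutex_t        lock;
    pthread_cond_t         workReady;
    std::queue<int>        jobs;
    bool                   shuttingDown;
    long                   completed;
    std::vector<pthread_t> threads;
};

// Placeholder for the real work (runtest() in the post).
static void doJob(int /*job*/) {}

static void* workerLoop(void* arg)
{
    WorkerPool* pool = static_cast<WorkerPool*>(arg);
    for (;;) {
        pthread_mutex_lock(&pool->lock);
        // Sleep until there is work or a shutdown request. The loop
        // re-checks the condition to cope with spurious wakeups.
        while (pool->jobs.empty() && !pool->shuttingDown)
            pthread_cond_wait(&pool->workReady, &pool->lock);
        if (pool->jobs.empty()) {        // empty and shutting down: exit
            pthread_mutex_unlock(&pool->lock);
            return NULL;
        }
        int job = pool->jobs.front();
        pool->jobs.pop();
        pthread_mutex_unlock(&pool->lock);

        doJob(job);                      // heavy work happens outside the lock

        pthread_mutex_lock(&pool->lock);
        ++pool->completed;
        pthread_mutex_unlock(&pool->lock);
    }
}

void poolStart(WorkerPool& pool, int numThreads)
{
    assert(numThreads > 0);
    pthread_mutex_init(&pool.lock, NULL);
    pthread_cond_init(&pool.workReady, NULL);
    pool.shuttingDown = false;
    pool.completed = 0;
    pool.threads.resize(numThreads);
    for (int i = 0; i < numThreads; ++i)
        pthread_create(&pool.threads[i], NULL, workerLoop, &pool);
}

void poolSubmit(WorkerPool& pool, int job)
{
    pthread_mutex_lock(&pool.lock);
    pool.jobs.push(job);
    pthread_mutex_unlock(&pool.lock);
    pthread_cond_signal(&pool.workReady);    // wake one sleeping worker
}

// Lets the workers drain the queue, joins them, and returns the job count.
long poolShutdown(WorkerPool& pool)
{
    pthread_mutex_lock(&pool.lock);
    pool.shuttingDown = true;
    pthread_mutex_unlock(&pool.lock);
    pthread_cond_broadcast(&pool.workReady); // wake every worker
    for (std::size_t i = 0; i < pool.threads.size(); ++i)
        pthread_join(pool.threads[i], NULL);
    pthread_cond_destroy(&pool.workReady);
    pthread_mutex_destroy(&pool.lock);
    return pool.completed;
}
```

A caller would use poolStart once, call poolSubmit whenever new work arrives, and call poolShutdown at exit. Between batches the workers block inside pthread_cond_wait and use no CPU.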
Here is the full version of my code (minus the matrix stuff):
#include <cstdlib>
#include <iostream>
#include <cmath>
#include <complex>
#include <string>
#include <pthread.h>
#include <queue>
using namespace std;

#include "matrix_exponential.hpp"
#include "test_matrix_exponential.hpp"
#include "c8lib.hpp"
#include "r8lib.hpp"

#define NUM_THREADS 3

int main ( );
int counter;
queue<int> testQueue;
queue<int> anotherQueue;
void *playQueue(void* arg);
void runtest();
void matrix_exponential_test01 ( );
void matrix_exponential_test02 ( );
pthread_mutex_t anotherLock;
pthread_mutex_t queueLock;
pthread_cond_t queue_cv;
int main ()
{
    counter = 0;
    /* for (int i=0;i<1; i++)
        for(int j=0; j<1000; j++)
        {
            runtest();
            cout << counter << endl;
        }*/
    pthread_t threads[NUM_THREADS];
    pthread_mutex_init(&queueLock, NULL);
    pthread_mutex_init(&anotherLock, NULL);
    pthread_cond_init (&queue_cv, NULL);
    // Fill the queue before any worker starts.
    for(int z=0; z<1000; z++)
    {
        testQueue.push(1);
    }
    for( int i=0; i < NUM_THREADS; i++ )
    {
        pthread_create(&threads[i], NULL, playQueue, (void*)NULL);
    }
    // Wait for every worker to finish; pthread_join blocks without spinning.
    for( int i=0; i < NUM_THREADS; i++ )
    {
        pthread_join(threads[i], NULL);
    }
    cout << counter << endl;
    pthread_mutex_destroy(&queueLock);
    pthread_mutex_destroy(&anotherLock);
    pthread_cond_destroy(&queue_cv);
    return 0;
}
void* playQueue(void* arg)
{
    while(true)
    {
        cout<<counter<<endl;
        pthread_mutex_lock(&queueLock);
        if(testQueue.empty()){
            pthread_mutex_unlock(&queueLock);
            break;
        }
        else
            testQueue.pop();
        pthread_mutex_unlock(&queueLock);
        runtest();
    }
    pthread_mutex_lock(&anotherLock);
    anotherQueue.push(1);
    pthread_mutex_unlock(&anotherLock);
    pthread_exit(NULL);
}
void runtest()
{
    // Debug counter; note that this increment is not protected by a lock.
    counter++;
    matrix_exponential_test01 ( );
    matrix_exponential_test02 ( );
}
The "matrix_exponential_tests" are taken from this website with permission, and they are where all of the matrix math occurs. The counter is only used for debugging, to make sure all the instances are running.
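One caveat about the debug counter: a plain `int` incremented from several threads is a data race, so the printed value may come up short. A minimal sketch of a race-free variant, assuming a C++11 compiler with `std::atomic`, is below. `safeCounter`, `bump`, and `countWithThreads` are hypothetical names, not part of the code above.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>
#include <vector>

// Hypothetical replacement for the plain `int counter`.
static std::atomic<int> safeCounter(0);

static void* bump(void* arg)
{
    int increments = *static_cast<int*>(arg);
    for (int i = 0; i < increments; ++i)
        safeCounter.fetch_add(1, std::memory_order_relaxed);  // atomic increment
    return NULL;
}

// Starts numThreads workers that each increment the counter, then
// returns the final value once all of them have been joined.
int countWithThreads(int numThreads, int incrementsPerThread)
{
    assert(numThreads > 0);
    safeCounter.store(0);
    std::vector<pthread_t> threads(numThreads);
    for (int i = 0; i < numThreads; ++i)
        pthread_create(&threads[i], NULL, bump, &incrementsPerThread);
    for (int i = 0; i < numThreads; ++i)
        pthread_join(threads[i], NULL);
    return safeCounter.load();
}
```

Relaxed ordering is enough here because the value is only read after every thread has been joined.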