Parallelize a Function using openMP in C

Question

I wrote a program that reads a matrix size and a number of threads and then generates a random binary matrix of 0's and 1's. I then need to find the clusters of 1's and give each cluster a unique number.

I am getting the output correctly but I am having a problem parallelizing the function.

My professor asked me to split the matrix rows into "thread_cnt" parts. For example, if the thread count is 4 and the matrix size is 8, the matrix is split into 4 submatrices of 2 rows each.

The code is as follows:

//Inputted Matrix size n and generated a binary matrix rand1[][]
//
begin = omp_get_wtime();
width = n/thread_cnt;
#pragma omp parallel num_threads(thread_cnt) for
for(d=0;d<n;d=d++)
{
    b=d+width;
    Mat(d,b);
    d=(d-1)+width;    
}

Mat(int w,int x)
{
//printf("\n Entered function\n");
for(i=w;i<x;i++)
{    
    for(j=0;j<n;j++)
    {
        //printf("\n Entered the loop also\n");
        //printf("i = %d, j = %d\n",i,j);
        if(rand1[i][j]==1)
        {
            rand1[i][j]=q;
            adj(i,j,q);
            q++;
        }
    }
}
}

adj(int p, int e, int m)            //Function to find adjacent 1's 
{   
//printf("\n Entered adj function\n");
//printf("\n p = %d e = %d m = %d\n",p,e,m);
if (rand1[p][e+1] == 1)
{
    //printf("Test1\n");
    rand1[p][e+1]=m;
    adj(p,e+1,m);
}
if (rand1[p+1][e] == 1)
{
    rand1[p+1][e]=m;        
    //printf("Test2\n");
    adj(p+1,e,m);
}
if (rand1[p][e-1] == 1 && e-1>=0)
{
    rand1[p][e-1]=m;
    //printf("Test3\n");
    adj(p,e-1,m);

}
if (p-1>=0 && rand1[p-1][e] == 1)
{
    rand1[p-1][e]=m;
    //printf("Test4\n");
    adj(p-1,e,m);
}

}

The code gives me correct output. But the time increases instead of decreasing when I increase the number of threads. For 1 thread I get 0.000076 and for 2 threads I get 0.000136.

It looks like it's iterating instead of parallelizing. Can anyone help me out with this?

PS: I need to show both Serial time and parallel time and show that I have got a performance increase because of parallelization.

your loop looks weird. and why are you setting a custom number of threads? openmp is designed to create the optimal amount of threads for you. — Andreas Grapentin, Jan 26 '13 at 20:06
How do i do that? Sorry, im good in C but im a novice in openMP — user2014179, Jan 26 '13 at 20:13
you just use `#pragma omp parallel for` and openmp will magically work out everything else (except synchronization) — Andreas Grapentin, Jan 26 '13 at 20:26
and you should probably use a bigger example for timing. smaller examples tend to have weird timing behaviour, because of the constant thread creation overhead — Andreas Grapentin, Jan 26 '13 at 20:27
If i use just #pragma omp parallel then how would it break my matrix into parts? How do i need to rewite my code? — user2014179, Jan 26 '13 at 20:34
Well, you should break the matrix into parts yourself. Usually, `#pragma omp parallel for` executes the iterations of a loop in parallel. So you need to define a sequential loop that splits the work into distinct parts, and then let the parallel computation engine work out the details. — Andreas Grapentin, Jan 26 '13 at 20:44
I dont know how many threads the omp would generate. So i cant break the matrix myself without knowing how many threads are gonna be generated. Its so confusing. — user2014179, Jan 26 '13 at 20:58
then try a pthreads version first, to get the idea how parallel coding works, then try openmp. it's not that hard, once you get the general idea :) — Andreas Grapentin, Jan 26 '13 at 21:02
Your recursive algorithm doesn't stop at the boundary between two subblocks that belong to different threads. Why not implement the Hoshen-Kopelman algorithm instead? — Hristo Iliev, Jan 27 '13 at 14:58
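
The two suggestions in the comments above (a loop whose iterations are the row blocks, and a fill that stops at block boundaries) could be combined roughly as in the sketch below. It is not the asker's code: `fill_block` and `label_blocks` are hypothetical names, the matrix is assumed to be stored row-major in a flat `int` array, and labels are assumed to start at 2 so that 1 keeps meaning "not yet labelled". Each block draws its labels from its own range, so no shared counter like `q` is needed. A cluster that crosses a block boundary gets a different label in each block; merging those labels would need a separate pass, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: iterative 4-neighbour flood fill restricted to rows
   [r0, r1) of an n x n grid stored row-major in g. Every cell equal to 1 that
   is connected to (r, c) inside the block is overwritten with label. */
static void fill_block(int *g, int n, int r0, int r1, int r, int c, int label)
{
    /* Each cell is pushed at most once, so the block size bounds the stack. */
    int *stack = malloc(sizeof *stack * (size_t)(r1 - r0) * (size_t)n);
    int top = 0;
    const int di[4] = {-1, 1, 0, 0};   /* up, down, left, right */
    const int dj[4] = {0, 0, -1, 1};

    if (stack == NULL)
        abort();
    g[r * n + c] = label;
    stack[top++] = r * n + c;
    while (top > 0) {
        int cell = stack[--top];
        int i = cell / n, j = cell % n;
        for (int k = 0; k < 4; k++) {
            int a = i + di[k], b = j + dj[k];
            /* Bounds are checked before the cell is read. */
            if (a >= r0 && a < r1 && b >= 0 && b < n && g[a * n + b] == 1) {
                g[a * n + b] = label;
                stack[top++] = a * n + b;
            }
        }
    }
    free(stack);
}

/* Hypothetical driver: one loop iteration per row block. The last block
   also takes any leftover rows when n is not a multiple of parts. */
void label_blocks(int *g, int n, int parts)
{
    assert(n > 0 && parts > 0 && parts <= n);
    int width = n / parts;

    #pragma omp parallel for num_threads(parts) schedule(static, 1)
    for (int k = 0; k < parts; k++) {
        int r0 = k * width;
        int r1 = (k == parts - 1) ? n : r0 + width;
        int label = 2 + r0 * n;   /* label range private to this block */
        for (int i = r0; i < r1; i++)
            for (int j = 0; j < n; j++)
                if (g[i * n + j] == 1)
                    fill_block(g, n, r0, r1, i, j, label++);
    }
}
```

Because each iteration reads and writes only its own rows, the iterations do not race with each other. Built without OpenMP support, the pragma is ignored and the same loop runs serially, which gives a baseline for the serial timing.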

score 0 · Answer 1 · answered Jan 26 '13 at 20:36

The time increases with the number of threads because each thread is executing the first loop. It seems that you don't hand the submatrices to the threads; instead, each thread operates on every submatrix, i.e. on the whole matrix. To make the threads work on separate parts of the matrix, use each thread's unique ID, which you can get with this line:

 tid = omp_get_thread_num();

Then make a simple mapping: if tid is i, operate on the (i+1)th submatrix, where 0 <= i <= nthreads-1. This could be coded as:

Mat(i*width,i*width+width)
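
As a minimal sketch of that mapping, assuming a plain `#pragma omp parallel` region rather than a `parallel for`: `process_rows` below is a hypothetical stand-in for `Mat(w, x)` that only records which block owns each row, and `split_rows_by_tid` is likewise an invented name. The runtime is allowed to provide fewer threads than requested, so each thread strides over the blocks instead of assuming there is exactly one block per thread.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for the question's Mat(w, x): it only records which
   block owns each row, so the split itself can be inspected. */
static void process_rows(int *owner, int w, int x, int block)
{
    for (int i = w; i < x; i++)
        owner[i] = block;
}

/* Hypothetical driver: splits n rows into thread_cnt blocks and maps the
   blocks to threads by thread ID. */
void split_rows_by_tid(int *owner, int n, int thread_cnt)
{
    assert(n > 0 && thread_cnt > 0 && thread_cnt <= n);
    int width = n / thread_cnt;

    #pragma omp parallel num_threads(thread_cnt)
    {
#ifdef _OPENMP
        int tid = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
#else
        int tid = 0, nthreads = 1;   /* serial fallback without OpenMP */
#endif
        /* Thread tid handles blocks tid, tid + nthreads, and so on. */
        for (int k = tid; k < thread_cnt; k += nthreads) {
            int w = k * width;
            /* The last block takes any leftover rows. */
            int x = (k == thread_cnt - 1) ? n : w + width;
            process_rows(owner, w, x, k);
        }
    }
}
```

With GCC the OpenMP path is enabled by `-fopenmp`; without it the region runs on a single thread and still covers every block.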

thats not correct. each thread will compute an **iteration** of the loop. — Andreas Grapentin, Jan 26 '13 at 20:47
