
I am writing a program that matches up one block (a group of 4 double values that lie within a certain absolute value of each other) with another.

Essentially, I will call the function in main.

The matrix has 4399 rows and 500 columns. I am trying to use OpenMP to speed up the task, yet my code seems to have a race condition in the innermost loop (where the actual creation of a block happens: create_Block(rrr[k], i); ).

It is OK to ignore the function details, as they work correctly in the serial version. The only focus here is the OpenMP derivatives.

```c
int main(void) {


     readKey("keys.txt");
    double** jz = readMatrix("data.txt");
    int j = 0;
    int i = 0;
    int k = 0;



#pragma omp parallel for firstprivate(i) shared(Big_Block,NUM_OF_BLOCK,SIZE_OF_COLLECTION,b)
    for (i = 0; i < 50; i++) {  

        printf("THIS IS COLUMN %d\n", i);

        double*c = readCol(jz, i, 4400);



#pragma omp parallel for firstprivate(j) shared(i,Big_Block,NUM_OF_BLOCK,SIZE_OF_COLLECTION,b) 
        for (j=0; j < 4400; j++) {

            // printf("This is fixed row %d from column %d !!!!!!!!!!\n",j,i);
            int* one_collection = collection(c, j, 4400);

            // MODIFY THE DYNAMIC ALLOCATION OF SPACES (SIZE_OF_COMBINATION) IN combNonRec() function.

            if (get_combination_size(SIZE_OF_COLLECTION, M) >= 4) {
                //GET THE 2D-ARRAY OF COMBINATION
                int** rrr = combNonRec(one_collection, SIZE_OF_COLLECTION, M);

#pragma omp parallel for firstprivate(k) shared(i,j,Big_Block,NUM_OF_BLOCK,SIZE_OF_COLLECTION,b) 
                for (k = 0; k < get_combination_size(SIZE_OF_COLLECTION, M); k++) {
                    create_Block(rrr[k], i);   //ACTUAL CREATION OF BLOCK !!!!!!!       
                    printf("This is block %d \n", NUM_OF_BLOCK);
                    add_To_Block_Collection();

                }

                free(rrr);
            }


            free(one_collection);
        }

        //OpenMP for j
        free(c);

    }
    // OpenMP for i

    collision();
}
```
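
As an aside on the structure above: each of the three `parallel for` directives opens its own parallel region, so they are nested. Below is a minimal standalone probe (hypothetical, independent of my program) of what that means in practice: unless nested parallelism is enabled, each inner region runs with a team of a single thread inside each outer thread.

```c
#include <omp.h>
#include <stdio.h>

/* Probe for nested parallel regions (independent of the program above).
 * By default nested parallelism is disabled, so the inner region runs
 * with a team of 1 inside each outer thread; with OMP_NESTED=true the
 * team sizes multiply instead. */
int main(void) {
    #pragma omp parallel num_threads(4)
    {
        #pragma omp parallel num_threads(4)
        {
            #pragma omp single
            printf("level %d: inner team has %d thread(s)\n",
                   omp_get_level(), omp_get_num_threads());
        }
    }
    return 0;
}
```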

Here is the result of the parallel version: it is non-deterministic.

[screenshot of the parallel output]

Whereas the serial version consistently produces 400 blocks.


`Big_Block`, `NUM_OF_BLOCK`, and `SIZE_OF_COLLECTION` are global variables.

Did I do anything wrong in the derivative declarations? What might have caused such a problem?
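
For reference, here is a minimal standalone example (hypothetical names, independent of the program above, compiled with OpenMP enabled) of the kind of race I suspect: several threads incrementing a shared global counter without synchronization typically give a different total on each run, while an atomic update does not.

```c
#include <stdio.h>

int counter = 0;   /* shared global, analogous to NUM_OF_BLOCK */

int main(void) {
    #pragma omp parallel for
    for (int i = 0; i < 100000; i++) {
        counter++;              /* data race: ++ is a non-atomic read-modify-write */
    }
    printf("racy counter   = %d (typically varies from run to run)\n", counter);

    counter = 0;
    #pragma omp parallel for
    for (int i = 0; i < 100000; i++) {
        #pragma omp atomic
        counter++;              /* protected update: always ends at 100000 */
    }
    printf("atomic counter = %d\n", counter);
    return 0;
}
```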

  • It is **not** ok to ignore all the function details. For instance, does `add_To_Block_Collection` work on global state and could it thus be involved in the race condition? Please include text as text, and not as an image. Also mind that MPI is very different from OpenMP. I assume you mean *directive* (or *pragma*) instead of *derivate*. Three nested `parallel for` regions [don't do what you might think they do](http://stackoverflow.com/a/10541145/620382). – Zulan Oct 24 '16 at 18:33
  • @Zulan Yes. The **add_To_Block_Collection()** does modify an array in global state, as there is only one placeholder for the creation of one block element. I have seen your link; it says that the number of threads multiplies for each loop. I tried putting **num_of_thread(1)** on the two inner loops in the hope that they would use the same number of threads. It fails. So perhaps I should not nest too many loops? – bslqy Oct 25 '16 at 01:38
  • Are you intending to use nested parallelism? Did you set OMP_NESTED=true in the environment? If so you likely have N**2 over-subscription (and poor performance). If not, then you don't have nested parallelism anyway, so take out the inner parallel for. – Jim Cownie Oct 25 '16 at 08:30
  • There is no one-size-fits-all synchronization. Now the literal answer to your question would be `#pragma omp critical`, but that may just kill your performance. To get a specific answer, you need to provide details of those placeholders. As for the choice of which loop to parallelize, it is again difficult without details, but to start I would select **one** of the outer two loops. – Zulan Oct 25 '16 at 11:33
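
Following the comments above, here is a minimal sketch of the structure they point toward (an assumption-laden sketch, not a confirmed fix: it assumes the helper functions and the globals keep their meaning from the question). Only the outer column loop is parallelized, and the calls that modify the shared block collection are serialized with `#pragma omp critical`. Note that if `collection()` or `combNonRec()` also write to globals such as `SIZE_OF_COLLECTION`, those calls would need the same protection or per-thread copies of that state.

```c
/* Sketch only: one parallel loop level, shared-state updates serialized.
 * Assumes readCol, collection, combNonRec, get_combination_size,
 * create_Block, add_To_Block_Collection, jz, M and the globals keep
 * their existing meaning from the question. */
#pragma omp parallel for
for (int i = 0; i < 50; i++) {
    double *c = readCol(jz, i, 4400);

    for (int j = 0; j < 4400; j++) {              /* serial inside each thread */
        int *one_collection = collection(c, j, 4400);

        if (get_combination_size(SIZE_OF_COLLECTION, M) >= 4) {
            int **rrr = combNonRec(one_collection, SIZE_OF_COLLECTION, M);

            for (int k = 0; k < get_combination_size(SIZE_OF_COLLECTION, M); k++) {
                #pragma omp critical              /* only one thread builds a block at a time */
                {
                    create_Block(rrr[k], i);
                    add_To_Block_Collection();
                }
            }
            free(rrr);
        }
        free(one_collection);
    }
    free(c);
}
```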
