0

I'm writing a parallel version of the Longest Common Subsequence algorithm using openMP.

The sequential version is the following (and it works correctly):

// Preparing first row and first column with zeros
for(j=0; j < (len2+1); j++)
    score[0][j] = 0;

for(i=0; i < (len1+1); i++)
    score[i][0] = 0;

// Calculating scores
for(i=1; i < (len1+1); i++) {
    for(j=1; j < (len2+1) ;j++) {
        if (seq1[i-1] == seq2[j-1]) {
               score[i][j] = score[i-1][j-1] + 1;
        }
        else {
            score[i][j] = max(score[i-1][j], score[i][j-1]);
        }
    }
}

The critical part is filling up the score matrix and this is the part I'm trying to mostly parallelize.

One way to do it (which I chose) is: filling up the matrix by anti diagonals, so left, top and top-left dependecies are always satisfied. In a nutshell, I keep track of the diagonal (third loop, variable i below) and threads fill up that diagonal in parallel. For this purpose, I've written this code:

void parallelCalculateLCS(int len1, int len2, char *seq1, char *seq2) {

int score[len1 + 1][len2 + 1];
int i, j, k, iam;
char *lcs = NULL;

for(i=0;i<len1+1;i++)
    for(j=0;j<len2+1;j++)
        score[i][j] = -1;

#pragma omp parallel default(shared) private(iam)
{
    iam = omp_get_thread_num();
// Preparing first row and first column with zeros
    #pragma omp for
    for(j=0; j < (len2+1); j++)
        score[0][j] = iam;

    #pragma omp for
    for(i=0; i < (len1+1); i++)
        score[i][0] = iam;

// Calculating scores
    for(i=1; i < (len1+1); i++) {
        k=i;
        #pragma omp for
        for(j=1; j <= i; j++) {
            if (seq1[k-1] == seq2[j-1]) {
                //  score[k][j] = score[k-1][j-1] + 1;
                score[k][j] = iam;
            }
            else {
            //  score[k][j] = max(score[k-1][j], score[k][j-1]);
                score[k][j] = iam;
            }
            #pragma omp atomic
            k--;
        }
    }

}
}

The first two loops (first row and column) work correctly and threads fill up cells in a balanced way.

When it comes to fill up the matrix (diagonally), nothing works well. I tried to debug it, but it seems that threads act and write things randomly.

I can't figure out what's going wrong, since in the first two loops there were no problems at all.

Any idea?

P.S. I know that accessing matrix in a diagonal way is very cache-unfriendly and threads could be unbalanced, but I only need it to work by now.

P.S. #2 I don't know if it could be useful, but my CPU has up to 8 threads.

Alberto Coletta
  • 1,563
  • 2
  • 15
  • 24

2 Answers2

0

The following nested for loop

 #pragma omp for
 for(j=1; j <= i; j++) 

will be executed in parallel, each thread with a different value of j in no specific order. As nothing is specified in the omp for section, k will be shared by default between all threads. So depending on the order of the threads, k will be decremented at an unknown time (even with the omp atomic). So for a fixed j, the value of k might change during the execution of the body of the for loop (between the if clauses, ...).

A. Lefebvre
  • 139
  • 4
  • Thanks. But if I set k as private, they all have k=i as their initial value, right? How can I make threads modify a cell of the _actual_ diagonal of the matrix? Because if I don't modify k, threads will wirte in the Kth row. I hope the question is clear. – Alberto Coletta Jan 11 '14 at 11:36
0

#pragma omp atomic means that the processors will perform the operation one at a time. You are looking for #pragma omp for private(k) : the processors will no longer share the same value. Bye, Francis

francis
  • 9,525
  • 2
  • 25
  • 41
  • Thanks. But if I set k as private, they all have k=i as their initial value, right? How can I make threads modify a cell of the actual diagonal of the matrix? Because if I don't modify k, threads will wirte in the Kth row. I hope the question is clear. – Alberto Coletta Jan 11 '14 at 11:36
  • You are right. The correct value for `k` must be something like `k=i-j+1`. Try soemthing like : `for(j=1; j <= i; j++) { k=i-j+1;...}` I may be one index too short or too long... – francis Jan 11 '14 at 12:03
  • and watch for the bounds of the for loops since there are more than len1+1 diagonals ! – francis Jan 11 '14 at 12:10
  • That code works on the upper triangle of the matrix, so len1 should be correct. I tried to put `k=i-j+1` and remove `k--`, but it doesn't work. I also tried to completely substitute k with its expression (thus not having the need to declare private variables) but it doesn't work. Why? And how is the outer loop (with i) handled by threads? Is it executed sequentially or not? Thanks again. – Alberto Coletta Jan 11 '14 at 13:20
  • Ok, I tried to do this way: I close the first `omp parallel region`, I write my outer loop and i open a new parallel region just before the inner loop: `for(...i...) #pragma omp parallel for for(...j...)`. It works well now, but I'm a little worried on the performance consequences due to creating and destroying threads continuosly. Is there a way, in a parallel region, to run the outer loop sequentially? – Alberto Coletta Jan 11 '14 at 14:32
  • 1
    Oups...I have not seen the first `#pragma omp parallel`. You corrected it the right way ! – francis Jan 11 '14 at 16:15