Sequence Alignment problem using Pthreads

Question

I am trying to implement the sequence alignment problem (Needleman-Wunsch algorithm) using Pthreads. I am confused about how to map multithreading onto this inherently serial problem. The serial code for the sequence alignment is below (just the matrix calculation part).

#include<iostream>
#include<string.h>
#include<string>
#include<algorithm>

using namespace std;


class Matrix
{
private:
int x;
int y;
int** mat;
string gene1;
string gene2;
int match_penalty;
int mismatch_penalty;
int gap_penalty;
int minimum_penalty;

public:
Matrix(int gene1Len, int gene2Len)
{
    x = gene2Len + 1;           //gene2 length
    y = gene1Len + 1;   //gene 1 length;
    mat = new int* [x];
    for (int i = 0; i < x; i++)
        mat[i] = new int[y];

    for (int i = 0; i < x; ++i) {
        for (int j = 0; j < y; ++j) {
            mat[i][j] = 0;
        }
    }

    //Default Penalties
    match_penalty = 1;
    mismatch_penalty = 3;
    gap_penalty = 2;
    minimum_penalty=0;
}

void Print_Matrix()
{
      cout<<"\n";
    for (int i = 0; i < x; i++)
    {
        for (int j = 0; j < y; j++)
        {
            cout << mat[i][j] << "\t";
        }
        cout << "\n";
    }
}

void setGenes(string gene1, string gene2)
{
    this->gene1 = gene1;
    this->gene2 = gene2;
}

void setPenalty(int mismatch, int gp,int match)
{
    mismatch_penalty = mismatch;
    gap_penalty = gp;
    match_penalty=match;
}

void setMatrix()
{
    //1st row and 1st Column values
    for (int i = 0; i < x; i++)
    {
        mat[i][0] = i * gap_penalty;
    }

    for (int i = 0; i < y; i++)
    {
        mat[0][i] = i * gap_penalty;
    }

    // Other matrix values

    for (int i = 1; i < x; i++)
    {
        for (int j = 1; j < y; j++)
        {
            if (gene1[j - 1] == gene2[i - 1])    //Similar gene values (A==A ||T==T)
            {
                mat[i][j] = mat[i - 1][j - 1]+match_penalty;
            }
            else
            {
                mat[i][j] = max({ mat[i - 1][j - 1] + mismatch_penalty , mat[i - 1][j] + gap_penalty, mat[i][j - 1] + gap_penalty });
            }
        }
    }


}


};


int main()
{
    string gene1 = "ACCA";
    string gene2 = "CCA";
    int matchPenalty;
    int misMatchPenalty;
    int gapPenalty;

    cout << "Enter the value of Match Penalty" << endl;
    cin >> matchPenalty;
    cout << "Enter the value of misMatchPenalty Penalty" << endl;
    cin >> misMatchPenalty;
    cout << "Enter the value of gapPenalty Penalty" << endl;
    cin >> gapPenalty;

    Matrix dp(gene1.length(), gene2.length());
    dp.setGenes(gene1, gene2);
    dp.setPenalty(misMatchPenalty, gapPenalty, matchPenalty);
    dp.setMatrix();
    dp.Print_Matrix();
}

How can I implement the above problem with Pthreads? So far, I have used two threads to fill in the first column and the first row of the matrix simultaneously, but I have no idea how to compute the remaining matrix values in parallel. Here is my source code:

#include<string>
#include<algorithm>
#include<iostream>
#include<string.h>
#include<pthread.h>
using namespace std;

//Global variables --shared by all threads
int matchPenalty;
int misMatchPenalty;   
int gapPenalty;

struct gene_struct {
string gene1;
string gene2;
int rowSize;
int colSize;
int **mat;
};


// Fills the first column: mat[i][0] for every row i.
void* set_matrix_row(void *args)
{
    struct gene_struct *shared_block = (struct gene_struct *) args;
    for (int i = 0; i < shared_block->rowSize; i++)
    {
        shared_block->mat[i][0] = i * gapPenalty;
    }
    return NULL;
}

// Fills the first row: mat[0][i] for every column i.
// Starts at 1 because mat[0][0] is already written by set_matrix_row;
// this keeps the two threads from writing the same cell concurrently.
void* set_matrix_column(void *args)
{
    struct gene_struct *shared_block = (struct gene_struct *) args;
    for (int i = 1; i < shared_block->colSize; i++)
    {
        shared_block->mat[0][i] = i * gapPenalty;
    }
    return NULL;
}

void set_Matrix_Diagnol(struct gene_struct shared_block)
{
    //How Should I calculate rest of the matrix values using Pthreads?
}

 void Print_Matrix(struct gene_struct shared_block)
 {
      cout<<"\n";
    for (int i = 0; i < shared_block.rowSize; i++)
    {
        for (int j = 0; j < shared_block.colSize; j++)
        {
            cout << shared_block.mat[i][j] << "\t";
        }
        cout << "\n";
    }
 }

int main()
{
    pthread_t ptid1, ptid2;
    string gene1, gene2;
    struct gene_struct shared_block;   
    cout << "Enter First Gene : ";
    cin >> gene1;
    cout << "Enter Second Gene : ";
    cin >> gene2;
    cout<<"Enter the value of Match Penalty" << endl;
    cin >>  matchPenalty;   
    cout<<"Enter the value of misMatchPenalty Penalty" << endl;
    cin >>  misMatchPenalty;    
    cout<<"Enter the value of gapPenalty Penalty" << endl;
    cin >>  gapPenalty;

    shared_block.gene1 = gene1;
    shared_block.gene2 = gene2;
    shared_block.rowSize = gene2.length()+1;
    shared_block.colSize = gene1.length()+1; 
    shared_block.mat = new int* [shared_block.rowSize];  // rows = gene2.length() + 1
    for (int i = 0; i < shared_block.rowSize; i++)
    {
        // Value-initialise to zero so the cells that are not yet computed print as 0
        shared_block.mat[i] = new int[shared_block.colSize]();
    }
    
    pthread_create(&ptid1, NULL, &set_matrix_row, (void *)&shared_block);
    pthread_create(&ptid2, NULL ,&set_matrix_column, (void *)&shared_block);
    pthread_join(ptid1,NULL); 
    pthread_join(ptid2,NULL);
    
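    Print_Matrix(shared_block);

    // Release the matrix allocated above
    for (int i = 0; i < shared_block.rowSize; i++)
    {
        delete[] shared_block.mat[i];
    }
    delete[] shared_block.mat;
}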

It depends on the algorithm you want to implement. If your matrix computation can be done independently by row or column, run one thread per row or column. If the computation allows the matrix to be split into sub-matrices, then split them and give each thread a sub-matrix to work on, etc ... Usually, you first determine how you can mathematically "split" the computations and then you implement the parallelization — A. Gille, May 07 '21 at 10:49
If you expect others to read your code, you should at least spend the time to indent it properly. — James Z, May 07 '21 at 14:08
From a quick peek, the calculation for `mat[i][j]` is dependent upon `mat[i-1][j-1], mat[i][j-1] and mat[i-1][j]`. This is bad for parallel speedup, since each value is dependent upon a set of predecessors, which must be computed first. — mevets, May 07 '21 at 14:19
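One common way around the dependency described in the comment above is a wavefront (anti-diagonal) sweep: all cells with the same i + j depend only on cells from the two previous anti-diagonals, so the cells of one anti-diagonal can be computed concurrently, with a barrier between diagonals. The following is a minimal sketch under assumptions, not a tuned implementation. `Wavefront`, `WorkerArg`, `cell_value`, `worker` and `fill_wavefront` are hypothetical names; the recurrence is copied unchanged from `setMatrix()` in the question; and `pthread_barrier_t` is an optional POSIX feature (available on Linux, missing on some platforms such as macOS).

```cpp
#include <algorithm>
#include <cassert>
#include <pthread.h>
#include <string>
#include <vector>

// All names here are hypothetical; only the recurrence comes from the question.
struct Wavefront {
    std::string gene1, gene2;           // gene1 indexes columns, gene2 indexes rows
    int match, mismatch, gap;           // penalties, as in the question
    int rows, cols;                     // gene2.size() + 1, gene1.size() + 1
    int num_threads;
    std::vector<std::vector<int>> mat;
    pthread_barrier_t barrier;          // assumption: POSIX barriers are available
};

struct WorkerArg {
    Wavefront* w;
    int id;                             // worker index in [0, num_threads)
};

// Same recurrence as setMatrix() in the question.
static int cell_value(const Wavefront& w, int i, int j) {
    if (w.gene1[j - 1] == w.gene2[i - 1])
        return w.mat[i - 1][j - 1] + w.match;
    return std::max({w.mat[i - 1][j - 1] + w.mismatch,
                     w.mat[i - 1][j] + w.gap,
                     w.mat[i][j - 1] + w.gap});
}

static void* worker(void* p) {
    WorkerArg* a = static_cast<WorkerArg*>(p);
    Wavefront& w = *a->w;
    // Anti-diagonal d holds the interior cells with i + j == d.
    for (int d = 2; d <= w.rows + w.cols - 2; ++d) {
        const int i_lo = std::max(1, d - (w.cols - 1));
        const int i_hi = std::min(w.rows - 1, d - 1);
        // Cells of one diagonal are dealt out round-robin to the workers.
        for (int i = i_lo + a->id; i <= i_hi; i += w.num_threads)
            w.mat[i][d - i] = cell_value(w, i, d - i);
        // Nobody starts diagonal d + 1 until diagonal d is complete.
        pthread_barrier_wait(&w.barrier);
    }
    return nullptr;
}

std::vector<std::vector<int>> fill_wavefront(const std::string& gene1,
                                             const std::string& gene2,
                                             int match, int mismatch, int gap,
                                             int num_threads) {
    assert(num_threads > 0);
    Wavefront w;
    w.gene1 = gene1;
    w.gene2 = gene2;
    w.match = match;
    w.mismatch = mismatch;
    w.gap = gap;
    w.rows = static_cast<int>(gene2.size()) + 1;
    w.cols = static_cast<int>(gene1.size()) + 1;
    w.num_threads = num_threads;
    w.mat.assign(w.rows, std::vector<int>(w.cols, 0));
    for (int i = 0; i < w.rows; ++i) w.mat[i][0] = i * gap;   // first column
    for (int j = 0; j < w.cols; ++j) w.mat[0][j] = j * gap;   // first row

    pthread_barrier_init(&w.barrier, nullptr, static_cast<unsigned>(num_threads));
    std::vector<pthread_t> tids(num_threads);
    std::vector<WorkerArg> args(num_threads);
    for (int t = 0; t < num_threads; ++t) {
        args[t].w = &w;
        args[t].id = t;
        int rc = pthread_create(&tids[t], nullptr, worker, &args[t]);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < num_threads; ++t)
        pthread_join(tids[t], nullptr);
    pthread_barrier_destroy(&w.barrier);
    return w.mat;
}
```

A call such as `fill_wavefront(gene1, gene2, matchPenalty, misMatchPenalty, gapPenalty, 4)` would return the filled matrix. For short sequences the barrier cost per anti-diagonal will likely outweigh any gain; a usual refinement is to split the matrix into tiles and sweep the tiles in the same anti-diagonal order, so that each barrier covers far more work.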
Do you perhaps want to perform multiple sequence alignments per program run? You would likely find it much easier to use threads to perform multiple independent alignments in parallel than to parallelize individual alignments. — John Bollinger, May 07 '21 at 17:51
@mevets Exactly, I think multi-threading will not be a good choice for this problem set. — Sara Sameer, May 08 '21 at 06:02
@JohnBollinger How can I perform independent alignments? Each value in the matrix is calculated by adding the relevant penalty to a neighbouring cell, so each value depends on mat[i-1][j-1], mat[i][j-1] and mat[i-1][j]. Is there any way I can split the problem to perform independent alignments? — Sara Sameer, May 08 '21 at 06:05
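One reading of "independent alignments" in the comments above is that a single matrix is not split at all: when several gene pairs have to be aligned in one run, each pair gets its own matrix and its own thread, so no synchronisation between threads is needed. A minimal sketch of that idea follows, assuming one thread per pair and the same recurrence as `setMatrix()` in the question. `AlignJob`, `align_one` and `align_all` are hypothetical names, and `score` is simply the bottom-right cell of each matrix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <pthread.h>
#include <string>
#include <vector>

// Hypothetical job description: one gene pair plus its penalties.
struct AlignJob {
    std::string gene1, gene2;   // input pair
    int match, mismatch, gap;   // penalties, as in the question
    int score;                  // output: bottom-right cell of the matrix
};

// Thread entry point: fills a private matrix serially for one pair.
static void* align_one(void* p) {
    AlignJob* job = static_cast<AlignJob*>(p);
    const int rows = static_cast<int>(job->gene2.size()) + 1;
    const int cols = static_cast<int>(job->gene1.size()) + 1;
    std::vector<std::vector<int>> mat(rows, std::vector<int>(cols, 0));
    for (int i = 0; i < rows; ++i) mat[i][0] = i * job->gap;   // first column
    for (int j = 0; j < cols; ++j) mat[0][j] = j * job->gap;   // first row
    for (int i = 1; i < rows; ++i) {
        for (int j = 1; j < cols; ++j) {
            // Same recurrence as setMatrix() in the question.
            if (job->gene1[j - 1] == job->gene2[i - 1])
                mat[i][j] = mat[i - 1][j - 1] + job->match;
            else
                mat[i][j] = std::max({mat[i - 1][j - 1] + job->mismatch,
                                      mat[i - 1][j] + job->gap,
                                      mat[i][j - 1] + job->gap});
        }
    }
    job->score = mat[rows - 1][cols - 1];
    return nullptr;
}

// Runs every job in its own thread and waits for all of them.
void align_all(std::vector<AlignJob>& jobs) {
    std::vector<pthread_t> tids(jobs.size());
    for (std::size_t k = 0; k < jobs.size(); ++k) {
        int rc = pthread_create(&tids[k], nullptr, align_one, &jobs[k]);
        assert(rc == 0);
        (void)rc;
    }
    for (std::size_t k = 0; k < jobs.size(); ++k)
        pthread_join(tids[k], nullptr);
}
```

Each thread touches only its own `AlignJob` and its own local matrix, which is why no mutex or barrier appears. With many pairs, a fixed pool of threads that pull jobs from a shared queue would probably scale better than one thread per pair.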
