
I'm trying to parallelize some iterations over a matrix.

The matrix is saved in a 1D array to have contiguous data in memory:

// array that contains all the elements of the dense matrix
char* data = new char[m_cols * m_rows];
// set of pointers to the matrix rows, indexing into subarrays of 'data'
char** dense = new char*[m_rows];

After 'data' has been populated with numbers, I index the matrix like this:

// index every row of DENSE with a subarray of DATA
char* index = data;
for(int i = 0; i < m_rows; i++)
{
    dense[i] = index;
    // index now points to the next row
    index += m_cols;
}

After that, I parallelize the iteration over the matrix, assigning a column to each thread, because the computations have to be made column by column.

    #pragma omp parallel for schedule(static)
    for (int j = 0; j < m_cols; ++j)
    {
        for (int i = 0; i < m_rows; ++i)
        {
            if (dense[i][j] == 1)
            {
                if (i != m_rows - 1)
                {
                    if (dense[i+1][j] == 0)
                    {
                        dense[i][j] = 0;
                        dense[i+1][j] = 1;
                        i++; // skip the element just moved down
                    }
                }
                else
                {
                    if (dense[0][j] == 0)
                    {
                        dense[i][j] = 0;
                        dense[0][j] = 1;
                    }
                }
            }
        }
    }

I think I have run into the "false sharing" problem, in which a cache line is invalidated every time a matrix cell is written.

How can I solve this problem?

    how did you establish that you in fact have false sharing? – TemplateRex Jan 19 '16 at 09:47
  • Without looking at your code or algorithm too much: can you transpose your matrix? – MikeMB Jan 19 '16 at 09:51
  • To be more precise: switch from row-major to column-major? – MikeMB Jan 19 '16 at 9:54
  • @TemplateRex I'm not 100% sure that the problem exists, but I think it is happening – rh0x Jan 19 '16 at 10:14
  • @MikeMB The problem is that I have a similar function that also cycles row by row. In addition, since I have to get the best performance possible, I can't transpose the matrix every time. In fact I call this function and the other one previously mentioned, one after the other and more than once. – rh0x Jan 19 '16 at 10:16
  • This really has more to do with a bad memory access pattern than anything else. By accessing your matrix as if you are in Fortran, you effectively are asking to access a little bit of your matrix here, a little bit there, and a little bit over there. Nothing is contiguous, so you are going to be killed in performance because of how memory-bound that type of access pattern is. – NoseKnowsAll Jan 19 '16 at 16:11
