
The following code performs matrix-vector multiplication for a sparse matrix stored in COO format:

for (int i=0; i<n; ++i)
    y[i] = 0.0;
for (int i=0; i<nnz; ++i)
    y[row[i]] += val[i]*x[col[i]];

for the sparse matrix given by the row, col and val arrays, each of size nnz, the number of nonzero values in the matrix. When I parallelize the multiplication with OpenMP, I do the following:

#pragma omp parallel for default(shared)
for (int i=0; i<nnz; ++i) {
    #pragma omp atomic
    y[row[i]] += val[i]*x[col[i]];
}

I use the atomic construct because I know nothing about the order in which the row and column values are stored, so I have to avoid a race condition when the reduction over y happens. With this code I get only a feeble speedup on 16 processors (about 2.0 times). What options do I have for performing this matrix-vector multiplication with the given COO sparse matrix so that it gives a decent speedup? Can I avoid the atomic construct?

Thank you

Prapanch Nair
  • You might be able to get better performance if you have a separate `y[]` array for each thread and then perform a reduction at the end. Since OpenMP array reductions are only supported in Fortran, you have to implement them on your own in C. – Hristo Iliev Mar 03 '15 at 06:51
  • Even better: do not use the coordinate format at all but the compressed column or row oriented one. – Algebraic Pavel Mar 11 '15 at 01:41
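
The two suggestions in the comments above could look roughly like the sketch below. It is only an illustration under stated assumptions, not a tuned implementation: the function names (`spmv_coo_private`, `coo_to_csr`, `spmv_csr`) and the CSR array names (`row_ptr`, `csr_col`, `csr_val`) are made up for this example, and the indices are assumed to be zero-based and within range. The first function gives each thread its own copy of `y` and merges the copies once per thread, so no atomic update is needed per nonzero. The other two convert the COO arrays to compressed row storage once and then parallelize over rows, where each `y[i]` is written by exactly one thread. Newer OpenMP versions (4.5 onward) also accept array sections in a `reduction` clause in C and C++, which would replace the hand-written merge.

```c
#include <stdlib.h>

/* Option 1: y = A*x from COO data, one private accumulator per thread.
   Costs n extra doubles per thread plus one merge pass per thread. */
void spmv_coo_private(int n, int nnz, const int *row, const int *col,
                      const double *val, const double *x, double *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = 0.0;

    #pragma omp parallel default(shared)
    {
        /* Zero-initialised private copy of y; may be NULL if allocation fails. */
        double *y_local = calloc((size_t)n, sizeof *y_local);

        /* Implicit barrier at the end of this loop is required before merging. */
        #pragma omp for
        for (int i = 0; i < nnz; ++i) {
            double t = val[i] * x[col[i]];
            if (y_local != NULL) {
                y_local[row[i]] += t;
            } else {
                /* Fallback for a failed allocation: atomic update on shared y. */
                #pragma omp atomic
                y[row[i]] += t;
            }
        }

        /* Merge the private copies into y, one thread at a time. */
        if (y_local != NULL) {
            #pragma omp critical
            for (int i = 0; i < n; ++i)
                y[i] += y_local[i];
            free(y_local);
        }
    }
}

/* Option 2a: one-time COO -> CSR conversion (counting sort on the row index).
   row_ptr needs n+1 entries; csr_col and csr_val need nnz entries. */
void coo_to_csr(int n, int nnz, const int *row, const int *col,
                const double *val, int *row_ptr, int *csr_col, double *csr_val)
{
    for (int i = 0; i <= n; ++i)
        row_ptr[i] = 0;
    for (int k = 0; k < nnz; ++k)        /* count entries per row */
        row_ptr[row[k] + 1]++;
    for (int i = 0; i < n; ++i)          /* prefix sum: start of each row */
        row_ptr[i + 1] += row_ptr[i];
    for (int k = 0; k < nnz; ++k) {      /* scatter; row_ptr[r] is the next free slot */
        int dst = row_ptr[row[k]]++;
        csr_col[dst] = col[k];
        csr_val[dst] = val[k];
    }
    for (int i = n; i > 0; --i)          /* undo the shift caused by the scatter */
        row_ptr[i] = row_ptr[i - 1];
    row_ptr[0] = 0;
}

/* Option 2b: y = A*x from CSR data; each row is owned by one thread,
   so there is no race on y and no atomic or critical construct. */
void spmv_csr(int n, const int *row_ptr, const int *csr_col,
              const double *csr_val, const double *x, double *y)
{
    #pragma omp parallel for default(shared) schedule(static)
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += csr_val[k] * x[csr_col[k]];
        y[i] = sum;
    }
}
```

The conversion in `coo_to_csr` is serial and costs one pass over the nonzeros, so it is likely to pay off mainly when the same matrix is multiplied many times, as in an iterative solver. Which option is faster for a given matrix would have to be measured; the private-copy variant in particular becomes expensive when `n` is large relative to `nnz`.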

0 Answers