The following is the code for matrix-vector multiplication of a sparse matrix stored in COO format:
// zero the result vector y of length n
for (int i = 0; i < n; ++i)
    y[i] = 0.0;

// accumulate each non-zero entry's contribution into its row of y
for (int i = 0; i < nnz; ++i)
    y[row[i]] += val[i] * x[col[i]];
for a sparse matrix given by the row, col, and val arrays, each of size nnz, the number of non-zero values in the matrix. When I parallelize the multiplication with OpenMP, I do the following:
#pragma omp parallel for default(shared)
for (int i = 0; i < nnz; ++i) {
    // atomic update: several iterations may target the same y[row[i]]
    #pragma omp atomic
    y[row[i]] += val[i] * x[col[i]];
}
I am using the atomic construct because I have no idea about the order in which the row and column values are stored, so I have to avoid a race condition when the contributions are reduced into y. With this code I get only a feeble speedup on 16 processors (about 2.0x). What options do I have to perform this matrix-vector multiplication with the given COO sparse matrix and get a decent speedup? Can I avoid the atomic construct?
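One alternative I have been looking at (assuming OpenMP 4.5 or later, since it needs reductions over array sections) is to let each thread accumulate into a private copy of y and have the runtime sum the copies afterwards, roughly like this sketch:

/* Sketch only: each thread works on a zero-initialized private copy of
   y[0..n-1]; the copies are summed into the shared y after the loop. */
#pragma omp parallel for default(shared) reduction(+:y[:n])
for (int i = 0; i < nnz; ++i)
    y[row[i]] += val[i] * x[col[i]];

This trades the atomics for O(n) extra memory per thread plus a final combine step, and I am not sure whether it would actually scale better, or whether converting the matrix to CSR and parallelizing over rows is the more sensible route.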
Thank you