I am trying to implement the Hogwild! Linear SVM algorithm, but I am running into false sharing problems with my implementation.
My code is below, but for background: I compute which samples fail the margin test and then make an update given by that set of vectors. Hogwild! (as far as I understand it) simply makes these updates on the same memory, totally asynchronously. Mathematically this introduces "noise" because of the improperly timed updates.
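Concretely (if I am reading my own code correctly, with n = X_height and a step size that decays per epoch), the per-sample update I am trying to make is the usual hinge-loss SGD step, plus a shrink for the regularizer once per epoch:

$$\text{if } y_i \, x_i^\top w \le 1:\quad w \leftarrow w + \frac{\eta_t}{n\cdot\text{nodes}}\, y_i x_i, \qquad \eta_t = \frac{\text{step\_size}}{1+\text{epoch}},$$
$$w \leftarrow \frac{1 - \text{reg}\cdot\eta_t}{\text{nodes}}\, w.$$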
Sadly, as the threads make these asynchronous updates, the relevant L1 cache lines keep getting invalidated and have to be re-fetched.
Is there a good way to fix this false sharing without losing the asynchrony? (I am more of a mathematician than a computer scientist.) I have read that using different compiler optimization levels can fix this.
#include <stddef.h>

void update(size_t epoch, const double *X_data, const int *X_indices,
            const int *X_indptr, const int *Y, double *W,
            double reg, double step_size, size_t nodes,
            size_t X_height, size_t X_width) {
    size_t i, j;
    double step = step_size/(1 + epoch);
    double c;
    #pragma omp parallel shared(W, X_data, X_indices, X_indptr, Y) private(i, j, c)
    {
        // One SGD pass over the rows of the CSR matrix X.
        #pragma omp for schedule(static)
        for (i = 0; i < X_height; i++) {
            // Margin for sample i: c = <x_i, W>, using only the stored nonzeros.
            c = 0.0;
            for (j = X_indptr[i]; j < X_indptr[i+1]; j++)
                c += X_data[j]*W[X_indices[j]];
            if (Y[i]*c > 1)  // margin satisfied: no update for this sample
                continue;
            // Hinge-loss subgradient step; scaled by X_height*nodes to discount
            // the MPI scaling over nodes.
            for (j = X_indptr[i]; j < X_indptr[i+1]; j++)
                W[X_indices[j]] += step*Y[i]*X_data[j]/(X_height*nodes);
        }
        // Regularization shrink: W <- (1 - reg*step)*W/nodes.
        #pragma omp for schedule(static)  // might not do much
        for (i = 0; i < X_width; i++)
            W[i] *= (1 - reg*step)/nodes;
    }
}
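
In case it helps, here is a stripped-down sketch of how I call update(); the matrix, labels, and hyperparameters below are made up just to show how the CSR arrays (data/indices/indptr) and sizes are laid out:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    // Tiny 3x4 sparse matrix in CSR form (values, column indices, row pointers).
    double X_data[]    = {1.0, -2.0, 0.5, 3.0, -1.5};
    int    X_indices[] = {0, 2, 1, 3, 0};
    int    X_indptr[]  = {0, 2, 3, 5};   // X_height + 1 entries
    int    Y[]         = {1, -1, 1};     // labels in {-1, +1}
    size_t X_height = 3, X_width = 4, nodes = 1;

    double *W = calloc(X_width, sizeof(double));  // weight vector shared by all threads
    if (!W) return 1;

    for (size_t epoch = 0; epoch < 10; epoch++)
        update(epoch, X_data, X_indices, X_indptr, Y, W,
               0.01 /* reg */, 0.1 /* step_size */, nodes, X_height, X_width);

    for (size_t i = 0; i < X_width; i++)
        printf("W[%zu] = %f\n", i, W[i]);
    free(W);
    return 0;
}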