
In my code, I have the following section (simplified):

  #pragma omp parallel for
  for (int i = 0; i < N; i++) {
    int x = struct_arr[i].x;
    double y = struct_arr[i].y;
    double z = struct_arr[i].z;
    double w = struct_arr[i].w;
    out[i].x = get_new_x(x, y, z, w);
  }

which suffers drastic slowdowns when parallelized. I suspected an issue with false sharing, and profiling with Valgrind showed a large number of cache misses during execution.

I have not provided details on what goes on in get_new_x, since I want to focus on one thing at a time. Is it reasonable to guess that there is some false sharing in the part leading up to the function call? Each thread would have its own local variables for x, y, z, and w, but they would all be reading from the same array. Could that be enough to cause cache misses? Similarly, I suspect there might be a cache conflict when writing from get_new_x to out[].

I assume all of these are possible causes of false sharing, but what are some ways of fixing it? Is either operation (reading vs. writing) more likely to cause false sharing issues?

HereBeeBees
  • "False sharing" generally relates to concurrent *writes* - I don't think your reads have anything to do with it. You ought to be able to mitigate it by ensuring your threads are writing to different sections of your array. – Oliver Charlesworth Mar 04 '18 at 21:06
    The first rule about performance analysis is **do not guess**. Consequently, please do not make us guess. Many more things are necessary for a well-founded discussion: The size of N, your actual performance results, your cache-measurement results, what `get_new_x` does, your compiler (options), your system specification, and a [mcve]. – Zulan Mar 05 '18 at 20:29

0 Answers