
I am trying to optimize my OpenMP code for Conway's Game of Life. My main problem is probably related to false sharing. The code below is a method of my class `mondo`: `m` is my grid, `r` the number of rows, `c` the number of columns, and the grid is toroidal (the edges wrap around). Here's my code:

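```cpp
// Advances the grid m (r rows x c columns, toroidal) by one generation.
void evolution() {
    mondo tmp(r, c);  // output grid for the next generation
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < r; i++) {
        unsigned a = (i + 1) % r, b = (i - 1 + r) % r;      // row below / row above, wrapped
        for (int j = 0; j < c; j++) {
            unsigned d = (j + 1) % c, e = (j - 1 + c) % c;  // column right / left, wrapped
            // Number of live cells among the 8 neighbours
            int v = (m[a][j] + m[b][j] + m[a][d] + m[b][d] + m[a][e] + m[b][e] + m[i][d] + m[i][e]);
            if (m[i][j] == 1) {
                // A live cell survives only with 2 or 3 live neighbours
                if (v < 2 || v > 3)
                    tmp.m[i][j] = 0;
                else
                    tmp.m[i][j] = 1;
            }
            else {
                // A dead cell becomes alive with exactly 3 live neighbours
                if (v == 3)
                    tmp.m[i][j] = 1;
                else
                    tmp.m[i][j] = 0;
            }
        }
    }
    (*this) = tmp;  // copy the new generation back into this object
}
```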
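A possible restructuring, sketched under assumptions that are not in the question: the grid is stored as one contiguous `std::vector<std::uint8_t>` (a hypothetical `Grid` type) instead of `bool**`, and two buffers are swapped after each generation instead of building `tmp` and copying it back with `(*this)=tmp;`. The wrap-around indexing and the birth/survival rules are the same as in `evolution()`. `Grid`, `step` and `evolve` are illustrative names, and whether this improves the measured speedup has not been tested here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical flat-storage grid: one contiguous buffer of r*c cells,
// 1 byte each, instead of a bool** array of separately allocated rows.
struct Grid {
    int r, c;
    std::vector<std::uint8_t> cells;
    Grid(int rows, int cols)
        : r(rows), c(cols), cells(static_cast<std::size_t>(rows) * cols, 0) {}
    std::uint8_t& at(int i, int j) { return cells[static_cast<std::size_t>(i) * c + j]; }
    std::uint8_t at(int i, int j) const { return cells[static_cast<std::size_t>(i) * c + j]; }
};

// One generation on a torus: reads only from cur, writes only to next.
void step(const Grid& cur, Grid& next) {
    assert(cur.r == next.r && cur.c == next.c);
    const int r = cur.r, c = cur.c;
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < r; i++) {
        const int a = (i + 1) % r, b = (i - 1 + r) % r;      // rows below / above, wrapped
        for (int j = 0; j < c; j++) {
            const int d = (j + 1) % c, e = (j - 1 + c) % c;  // columns right / left, wrapped
            const int v = cur.at(a, j) + cur.at(b, j) + cur.at(a, d) + cur.at(b, d)
                        + cur.at(a, e) + cur.at(b, e) + cur.at(i, d) + cur.at(i, e);
            // Alive next generation: exactly 3 neighbours, or 2 neighbours and currently alive.
            next.at(i, j) = (v == 3 || (v == 2 && cur.at(i, j))) ? 1 : 0;
        }
    }
}

// Runs n generations, swapping the two buffers instead of copying a new grid each time.
void evolve(Grid& g, int n) {
    Grid tmp(g.r, g.c);
    for (int k = 0; k < n; k++) {
        step(g, tmp);
        std::swap(g, tmp);  // O(1): moves the vectors, no cell copying
    }
}
```

Set initial cells with `g.at(i, j) = 1` and call `evolve(g, n)`. Compiled with `-fopenmp`, the row loop in `step` is split across threads as in the original; without it the pragma is ignored and the loop runs serially. The swap avoids allocating and copying a roughly 16 MB grid (r = c = 4000 at one byte per cell) on every generation.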
  • You might want to revise your indents to make the code more readable. – crashmstr Feb 17 '14 at 15:08
  • Done. Hope it's better now – Nicola Ferro Feb 17 '14 at 15:24
  • Can you give more details on what the problem is? For example, is it slower than without OpenMP? – Z boson Feb 17 '14 at 15:42
  • What is the size of r and c? – Z boson Feb 17 '14 at 15:43
  • r and c are about 4000. The program is not slower than without OpenMP, but the speedup is not satisfying. – Nicola Ferro Feb 17 '14 at 15:50
  • It's not clear to me why false sharing would be a problem. Each thread is writing to a different row, and since the rows are about 4000 wide there won't be much overlap. How are you timing your code? What's the compiler? Do you have optimization enabled (e.g. -O3 with GCC or /O2 with MSVC)? – Z boson Feb 17 '14 at 15:55
  • What is `tmp.m`? Is it an array of ints? – Z boson Feb 17 '14 at 15:57
  • `tmp.m` is a member of `tmp` and it is a `bool**` (a matrix). I'm timing my code with a stopwatch inside the code and I'm using -O2 optimization with g++. Thanks for your time, by the way. – Nicola Ferro Feb 17 '14 at 16:02
  • @NicolaFerro, can you tell me how the stopwatch is defined? What function are you using? – Z boson Feb 17 '14 at 18:26
  • It's a header file they gave me in class. For the evaluation they will use an external watch. By the way, this stopwatch uses the chrono library (`#include <chrono>`). – Nicola Ferro Feb 18 '14 at 08:30
  • @NicolaFerro, can you try `omp_get_wtime()` for the timing instead? Why do you think false sharing is causing a problem? Bools are usually 8 bits and a 64-byte cache line can hold 64 8-bit values, so if c were small (a few multiples of 64) false sharing would probably be significant, but for c = 4000 I don't think it is. – Z boson Feb 18 '14 at 09:30
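To put rough numbers on the false-sharing argument in the comments, here is a small upper-bound estimate. It assumes (none of this is stated in the question) a single contiguous output buffer, 1-byte cells, 64-byte cache lines, and `schedule(static)` without a chunk size, which gives each thread at most one contiguous block of rows. Only a cache line straddling the boundary between two threads' blocks can be written by both, so at most `threads - 1` lines are exposed. `SharingEstimate`, `estimate` and `wall_seconds` are hypothetical helpers; `wall_seconds` uses `omp_get_wtime()` when compiled with OpenMP, as suggested above, and falls back to `std::chrono::steady_clock` otherwise.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#ifdef _OPENMP
#include <omp.h>
#endif

struct SharingEstimate {
    std::size_t total_lines;   // cache lines covered by the output grid
    std::size_t shared_lines;  // upper bound on lines written by two threads
};

// Upper bound for a contiguous rows x cols grid split into contiguous row blocks.
SharingEstimate estimate(std::size_t rows, std::size_t cols, std::size_t threads,
                         std::size_t cell_bytes = 1, std::size_t line_bytes = 64) {
    const std::size_t bytes = rows * cols * cell_bytes;
    const std::size_t total = (bytes + line_bytes - 1) / line_bytes;
    // One boundary between consecutive threads' blocks, each touching at most one line;
    // there cannot be more boundaries than gaps between rows.
    std::size_t boundaries = threads > 1 ? threads - 1 : 0;
    if (rows == 0) boundaries = 0;
    else if (boundaries > rows - 1) boundaries = rows - 1;
    return {total, boundaries};
}

// Wall-clock time in seconds, for differences between two calls.
double wall_seconds() {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    using steady = std::chrono::steady_clock;
    return std::chrono::duration<double>(steady::now().time_since_epoch()).count();
#endif
}
```

For r = c = 4000 and 4 threads, `estimate(4000, 4000, 4)` gives 250,000 cache lines in total with at most 3 that two threads might share, which is consistent with the point that false sharing should not dominate at this size. With separately allocated `bool**` rows the memory layout depends on the allocator, so the bound is only indicative. To time one generation, call `wall_seconds()` before and after `evolution()` and subtract.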
