
I have two arrays, `fa` and `tempxyz`. I need to subtract one from the other and store the result in a third array. I am using streaming stores, so the accesses must be aligned. I aligned these two arrays and the third array as well, but I am still getting a segmentation fault. For a streaming store, the address must be 64-byte aligned. Does this mean that every element of the array must be 64 bytes apart, so that every element's address is a multiple of 64? My code snippet is below. Kindly help me out.

```c
#include <immintrin.h>   /* _mm_malloc, __m512d and the 512-bit intrinsics */

#define nd 3             /* nd = 3 */
#define np 1000          /* np can be any number (np=1000, 2000, etc.) */

void compute(double *f, double array[np * nd]);

int main(void)
{
    double *force = (double *) _mm_malloc((nd * np) * sizeof(double), 64);
    __declspec(align(64)) double array[np * nd];
    compute(force, array);
    _mm_free(force);
    return 0;
}

void compute(double *f, double array[np * nd])
{
    __declspec(align(64)) double fa[8], tempxyz[8];
    int i, k;

    for (k = 0; k < np; k++)
    {
        __assume_aligned(f, 64);
        __assume((k * nd) % 8 == 0);

        for (i = 0; i < nd; i++)
        {
            f[i + k * nd] = 0.0;
        }

        // Doing some computation on array and storing it in fa.

        fa[0] = array[k * nd + 0];
        fa[1] = array[k * nd + 1];
        fa[2] = array[k * nd + 2];

        __m512d y1, y2, y3;

        __assume_aligned(&fa, 64);
        __assume_aligned(&tempxyz, 64);

        // Want to load 3 elements at a time, subtract all three
        // and store the result at a memory location.

        y1 = _mm512_load_pd(fa);
        y2 = _mm512_load_pd(tempxyz);
        y3 = _mm512_sub_pd(y1, y2);

        __assume_aligned(f, 64);
        __assume((k * nd) % 8 == 0);          // Here nd = 3 and k is the loop index.
        _mm512_storenr_pd((f + k * nd), y3);  // streaming store instruction
                                              //   --- GIVING SEG. FAULT !!!

    } // end of k loop

} // end of compute function
```
Jagannath
  • Did you check the return values for errors (`_mm_malloc` and others)? Also, where exactly does the segfault happen? – kestasx Jan 10 '15 at 10:40
  • I didn't check the return values, but the segfault appears only when the store instruction is added to the code. Without the store instruction everything works fine. – Jagannath Jan 10 '15 at 10:45
  • Aren't you going outside the allocated memory in `_mm512_storenr_pd`? I'm not sure (I can't test it), but it seems to reach `f+np*nd`. – kestasx Jan 10 '15 at 10:59
  • I allocated the force array with np*nd doubles; that is the first line of main(). – Jagannath Jan 10 '15 at 11:31
  • Yes, I see the allocation; that is why I suspect you are trying to access unallocated memory (starting at `f+np*nd`). If you increase the allocated memory to `_mm_malloc ( (nd * np + 1) * ...`, does it segfault at the same location? – kestasx Jan 10 '15 at 12:20
  • It still segfaults at the same location even after increasing the allocated memory. I had considered this earlier and concluded that the problem is memory alignment, since a streaming store instruction segfaults if the address is not aligned. – Jagannath Jan 10 '15 at 13:13

1 Answer


The array `force` is 64-byte aligned, so every 512-bit access to it must also be 64-byte aligned, i.e. the address where the access starts must be a multiple of 64. A single `_mm512_load_pd` or `_mm512_storenr_pd` moves 8 doubles (64 bytes) at a time. `(f + k * nd)` points to element index 3 when k=1, element index 6 when k=2, and so on. Element index 3 starts at byte offset 24 from the base, which is not a multiple of 64, and that is why the segfault occurs. With nd = 3 the offset is 24*k bytes, which is a multiple of 64 only when k is a multiple of 8. So the elements do not need to be 64 bytes apart; only the starting address of each 8-double vector access must be a multiple of 64. The formula (f + k * nd) itself must therefore be changed so that every address it produces is a multiple of 64.

Jagannath