
I'm vectorizing part of my program, but it crashes with a segmentation fault. What is wrong with it? Here is a simplified section that causes the problem. Incrementing with j++ and i++ is exactly what I want; I do not want j += 16.

unsigned short int input[256][256] __attribute__((aligned(32))); // global

for (i = 0; i < 256 - 16; i++) {
    for (j = 0; j < 256 - 16; j++) {
        temp_v2 = _mm256_load_si256((__m256i *)&input[i][j]);
    }
}
chqrlie
ADMS
  • Don't use a proprietary extension if a standard feature is available. C provides the `_Alignas` specifier. – too honest for this site Mar 20 '16 at 22:35
  • Your code violates strict aliasing – M.M Mar 20 '16 at 22:36
  • In gcc I use `__attribute__((aligned(X)))` and haven't seen a problem like this before. What can I do about strict aliasing? – ADMS Mar 20 '16 at 22:50
  • Don't worry about strict aliasing in this context - you can use casts with intrinsics like this. Also the gcc alignment directives are fine - people are just nit-picking. – Paul R Mar 20 '16 at 22:52
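
For reference, the standard spelling mentioned in the first comment could look like the sketch below. It assumes a C11 compiler; `row_is_aligned_32` is a hypothetical helper added only to check the result, and the pointer-to-integer conversion it relies on is implementation-defined (it behaves as expected on mainstream platforms).

```c
#include <assert.h>
#include <stdint.h>

/* C11 spelling of the declaration from the question (static here only to keep the sketch self-contained). */
static _Alignas(32) unsigned short input[256][256];

/* Hypothetical helper: returns 1 when the first element of row i sits on a 32-byte boundary. */
static int row_is_aligned_32(int i)
{
    assert(i >= 0 && i < 256);
    return ((uintptr_t)&input[i][0] % 32) == 0;
}
```

Each row is 256 * 2 = 512 bytes, a multiple of 32, so every row start inherits the alignment of the array itself.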

1 Answer


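If you really do want overlapping loads where you just increment the inner loop by 1 (as you seem to be suggesting in the question) then you need to use unaligned load instructions, because `_mm256_load_si256` requires a 32-byte-aligned address and `&input[i][j]` is only 32-byte aligned when j is a multiple of 16: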

for (i = 0; i < 256; i++) {
    for (j = 0; j + 16 <= 256; j++) {
        temp_v2 = _mm256_loadu_si256((__m256i *)&input[i][j]);
        //               ^^^^^ unaligned load
    }
}

but this would be a pretty weird and inefficient thing to do.
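
To see which values of j are safe for an aligned load, here is a small sketch that checks the address of `input[i][j]` directly. It uses no intrinsics; `is_aligned_32` and `count_aligned_columns` are hypothetical helper names, the array is declared as in the question (GCC/Clang attribute syntax), and the pointer-to-integer conversion is implementation-defined, though it works as expected on mainstream platforms.

```c
#include <assert.h>
#include <stdint.h>

/* Same declaration as in the question (static here only to keep the sketch self-contained). */
static unsigned short input[256][256] __attribute__((aligned(32)));

/* Hypothetical helper: returns 1 if &input[i][j] is a multiple of 32 bytes, 0 otherwise. */
static int is_aligned_32(int i, int j)
{
    assert(i >= 0 && i < 256 && j >= 0 && j < 256);
    return ((uintptr_t)&input[i][j] % 32) == 0;
}

/* Hypothetical helper: counts the start columns j of row i (with j + 16 <= 256)
   for which a 16-element window begins on a 32-byte boundary. */
static int count_aligned_columns(int i)
{
    int n = 0;
    for (int j = 0; j + 16 <= 256; j++)
        n += is_aligned_32(i, j);
    return n;
}
```

Since an `unsigned short` is 2 bytes on the usual x86 targets, only every 16th column starts on a 32-byte boundary, so a loop that steps j by 1 hits an unaligned address on its second iteration.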


Normally you would just do something like this to iterate through the whole array:

for (i = 0; i < 256; i++) {
    for (j = 0; j < 256; j += 16) {
        temp_v2 = _mm256_load_si256((__m256i *)&input[i][j]);
    }
}
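
Putting the two variants together, a rough self-contained sketch might look like this. `load_window` and `load_window_avx` are hypothetical names, not part of any library. The `target("avx")` attribute and `__builtin_cpu_supports` are GCC/Clang features assumed here so that the file builds without `-mavx`; on a non-x86 target, or on a CPU without AVX, the sketch falls back to `memcpy`.

```c
#include <assert.h>
#include <string.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1
#else
#define HAVE_X86 0
#endif

/* Same declaration as in the question (static here only to keep the sketch self-contained). */
static unsigned short input[256][256] __attribute__((aligned(32)));

#if HAVE_X86
/* Assumption: GCC/Clang, where target("avx") enables AVX intrinsics for this one function. */
__attribute__((target("avx")))
static void load_window_avx(unsigned short out[16], int i, int j)
{
    const unsigned short *p = &input[i][j];
    __m256i v;
    if (j % 16 == 0)
        v = _mm256_load_si256((const __m256i *)p);   /* address is 32-byte aligned */
    else
        v = _mm256_loadu_si256((const __m256i *)p);  /* works for any address */
    _mm256_storeu_si256((__m256i *)out, v);
}
#endif

/* Hypothetical helper: copies input[i][j] .. input[i][j + 15] into out. */
static void load_window(unsigned short out[16], int i, int j)
{
    assert(i >= 0 && i < 256 && j >= 0 && j + 16 <= 256);
#if HAVE_X86
    if (__builtin_cpu_supports("avx")) {
        load_window_avx(out, i, j);
        return;
    }
#endif
    /* Scalar fallback when AVX is not available. */
    memcpy(out, &input[i][j], 16 * sizeof(unsigned short));
}
```

In practice you would pick one load variant per loop rather than branching on every call; the branch is only there to show that the aligned form is valid exactly when j is a multiple of 16.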
Paul R