Why does this code section return a "Segmentation fault" error?

Question

I'm vectorizing part of my program, but it fails with a segmentation fault. What is wrong with it? Here is the simplified section that causes the problem. Incrementing with j++ and i++ is exactly what I want; I do not want j += 16.

unsigned short int input[256][256] __attribute__((aligned(32)));//global

for (i = 0; i < 256 - 16; i++) {    
    for (j = 0; j < 256 - 16; j++) {
        temp_v2  =_mm256_load_si256((__m256i *)&input[i][j]);
    }
}

Don't use a proprietary extension if a standard feature is available. C provides the `_Alignas` specifier. - too honest for this site, Mar 20 '16 at 22:35
In gcc I use `__attribute__((aligned(X)))` and haven't seen a problem like this before. What can I do about strict aliasing? - ADMS, Mar 20 '16 at 22:50
Don't worry about strict aliasing in this context; you can use casts with intrinsics like this. Also, the gcc alignment directives are fine; people are just nit-picking. - Paul R, Mar 20 '16 at 22:52

Paul R · Accepted Answer · 2016-03-21T07:04:54.843

If you really do want overlapping loads where you just increment the inner loop by 1 (as you seem to be suggesting in the question) then you need to use unaligned load instructions:

for (i = 0; i < 256; i++) {
    for (j = 0; j + 16 <= 256; j++) {
        temp_v2 = _mm256_loadu_si256((__m256i *)&input[i][j]);  /* loadu: unaligned load */
    }
}

but this would be a pretty weird and inefficient thing to do: each 32-byte load re-reads 15 of the 16 values fetched by the previous one.
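A minimal sketch of the unaligned variant in a form whose result can be inspected, assuming GCC or Clang on x86 and a CPU that supports AVX at run time. The helper name `copy_window` is hypothetical and not part of the question; the `target("avx")` attribute is an assumption made so the function builds even when AVX is not enabled on the command line.

```c
#include <immintrin.h>
#include <stddef.h>

/* Same declaration as in the question. */
unsigned short int input[256][256] __attribute__((aligned(32)));

/* Hypothetical helper: copies the 16 values starting at input[i][j]
 * into out. The caller must ensure j + 16 <= 256 so the 32-byte load
 * stays inside row i. _mm256_loadu_si256 has no alignment requirement,
 * so any j in that range is acceptable. */
__attribute__((target("avx")))
void copy_window(size_t i, size_t j, unsigned short out[16])
{
    __m256i v = _mm256_loadu_si256((const __m256i *)&input[i][j]);
    _mm256_storeu_si256((__m256i *)out, v);  /* unaligned store: out may be anywhere */
}
```

When `j` is a multiple of 16 the address is also 32-byte aligned, so `_mm256_load_si256` would work there too; for every other `j` only the unaligned form is safe.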

Normally you would just do something like this to iterate through the whole array:

for (i = 0; i < 256; i++) {
    for (j = 0; j < 256; j += 16) {
        temp_v2 = _mm256_load_si256((__m256i *)&input[i][j]);
    }
}
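The stride of 16 is what keeps the aligned load valid. Each element is 2 bytes and each row is 512 bytes, so `&input[i][j]` is a multiple of 32 exactly when `j` is a multiple of 16. The sketch below checks that arithmetic directly without any intrinsics. The helper names `is_aligned32` and `count_aligned_starts` are made up for illustration, and the code assumes GCC or Clang (for the `aligned` attribute, as in the question) and a 2-byte `unsigned short`.

```c
#include <stddef.h>
#include <stdint.h>

/* Same declaration as in the question. */
unsigned short int input[256][256] __attribute__((aligned(32)));

/* Returns 1 if the address of input[i][j] is a multiple of 32 bytes,
 * which is what _mm256_load_si256 requires; returns 0 otherwise. */
int is_aligned32(size_t i, size_t j)
{
    return ((uintptr_t)&input[i][j] % 32) == 0;
}

/* Counts the start columns j in row i (with j + 16 <= 256) at which a
 * 32-byte aligned load would be valid. */
int count_aligned_starts(size_t i)
{
    int count = 0;
    for (size_t j = 0; j + 16 <= 256; j++)
        count += is_aligned32(i, j);
    return count;
}
```

Only 16 of the 241 possible start columns in a row pass the check. In the question's loop `j = 1` is already misaligned, so that is the first address on which the aligned load can fault.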
