Segmentation fault when returning a pointer to an array of __m256d

Question

I was trying out the Intel AVX2 intrinsic data types and functions. Unlike most code found on the web, which loops over 256-bit segments of an array, I tried to create an array of __m256d values. Loading all of the array data into __m256d registers (here, an array of that type) works, but when I try to retrieve the values or do other work with them I get a Segmentation fault: 11 error.

I tried searching the web for any other answers but I couldn't find anything.

This is the load function:

__m256d* load_pd(double *a,int size)
{
    int length = size / 4;

    __m256d * out= new __m256d[length];
    int cnt=0;
    for(int i=0;i<size;i+=4)
    {
        out[cnt++] = _mm256_load_pd(&a[i]);
    }
    return out;
}

I used the function in main like this:

    double a[256]={256};
    double b[256] = {256};
    __m256d* a_vec = load_pd(a,256);
    __m256d* b_vec = load_pd(b,256);

The next function, however, is where the problem shows up:

__m256d* add_pd(__m256d* a, __m256d* b,int size)
{
    __m256d* res = new __m256d[size];
    for(int i=0;i<size;i++)
    {
        res[i] = _mm256_add_pd(a[i],b[i]);
    }
    return res;
}

and when I invoke add_pd like this:

double c[256] = {256};
__m256d* res = add_pd(a_vec,b_vec,256/4);

I get Segmentation fault: 11. What is the problem here? Aren't other pointers allowed to point to an already allocated array of __m256d?

Btw when you load "all data" like that, it doesn't go to registers. Temporarily yes, but there is nothing like an array of registers. So the entire approach has a problem. Even if you fix the crashes, it wouldn't do what you wanted it to do. — harold, Jul 16 '19 at 13:25
`++cnt` should be `cnt++`. You're running past the end of the array, off by one, and corrupting memory. You're welcome. — Sam Varshavchik, Jul 16 '19 at 13:29
Are you compiling with C++17 (e.g. `gcc -std=gnu++17`) to make `new` respect over-aligned types like `__m256d`? Until C++17, new still only returns 16-byte aligned memory even for types that require 32-byte alignment. (Yes this is really dumb, IDK why C++ took so long to fix it.) — Peter Cordes, Jul 16 '19 at 13:34
@SamVarshavchik Whew! That was a huge mistake, thanks, but I still couldn't fix the segfault. — Amirrad, Jul 16 '19 at 13:36
@harold I tried the other way around, loading chunks of 8 bytes, and as expected it worked. So basically the approach won't work since there aren't many types of registers? — Amirrad, Jul 16 '19 at 13:38
@PeterCordes This is interesting... I was kind of suspicious about the behavior of `new`, especially seeing how often malloc or aligned_alloc is used in AVX2 code. I tried with gnu++17 but it didn't work either. — Amirrad, Jul 16 '19 at 13:40
IDK, maybe `new` still doesn't work and you do need `aligned_alloc` or similar. Where exactly is it segfaulting? Use your debugger. Or like harold said, abandon this whole function, it's useless. If you want a copy, just use `memcpy`. If not, `return (__m256d*)a;` Your code already requires `a` to be correctly aligned. Casting to vector-pointer is strict-aliasing safe because that's how they're defined: like `char*` a `__m256d *` can alias anything. — Peter Cordes, Jul 16 '19 at 13:53
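The comments raise two separate points: `_mm256_load_pd` needs a 32-byte aligned address (a plain `double a[256]` is only guaranteed the alignment of `double`), and keeping an intermediate array of `__m256d` buys nothing. A minimal sketch of the direct approach follows, assuming GCC or Clang on an x86-64 CPU with AVX. The names `is_aligned32` and `add_arrays` are hypothetical helpers, not part of any library, and this is one possible way to restructure the code rather than a confirmed diagnosis of the crash.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p is on a 32-byte boundary,
// which is what _mm256_load_pd (the aligned load) requires.
inline bool is_aligned32(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 32 == 0;
}

// Hypothetical helper: out[i] = a[i] + b[i], four doubles per step.
// The target attribute is a GCC/Clang extension (an assumption here);
// without it, compile the file with -mavx instead.
__attribute__((target("avx")))
void add_arrays(const double* a, const double* b, double* out, std::size_t n) {
    assert(a && b && out);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        // Unaligned loads and stores accept any address, so no
        // alignment guarantee is needed from the caller.
        __m256d va = _mm256_loadu_pd(a + i);
        __m256d vb = _mm256_loadu_pd(b + i);
        _mm256_storeu_pd(out + i, _mm256_add_pd(va, vb));
    }
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];  // scalar tail when n is not a multiple of 4
    }
}
```

Calling `add_arrays(a, b, c, 256)` on the arrays from the question fills `c` without allocating any `__m256d` objects, so the `new` alignment question never comes up. If the aligned `_mm256_load_pd` is preferred, declaring the arrays as `alignas(32) double a[256]` and checking them with `is_aligned32` is one way to make the requirement explicit.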

