To XOR two regions of memory quickly, I wrote a function, region_xor_avx(), using AVX intrinsics. However, the program crashes with a core dump at _mm256_xor_si256(). Here is a short, self-contained example:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>   /* memcmp */
#include <immintrin.h>

int region_xor_avx(void *dst, void *src, int len){
    int k;
    int len256 = len/32;

    __m256i *_buf1 = (__m256i *)src;
    __m256i *_buf2 = (__m256i *)dst;

    for(k = 0; k < len256; ++k){
        _buf2[k] = _mm256_xor_si256(_buf1[k], _buf2[k]);
    }

    return 1;
}

int main(){
    int i;
    int arr1[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int arr2[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    int arr3[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    int *psrc;
    int *pdes1, *pdes2;

    psrc = arr1;
    pdes1 = arr2;
    pdes2 = arr3;

    for(i = 0; i < 8; ++i){
        pdes1[i] = pdes1[i]^psrc[i];
    }
    region_xor_avx(pdes2, psrc, 8*sizeof(int));

    if(memcmp(pdes1, pdes2, 8*sizeof(int)) == 0){
        printf("equal!\n");
    }else{
        printf("Not equal!\n");
    }

    return 1;
}

My CPU is an Intel(R) Core(TM) i7-4770K, which supports AVX instructions. My compiler is gcc (Ubuntu/Linaro 4.8.1-10ubuntu9) 4.8.1, and the compiler options are -g -mavx2.

foool
  • I see nothing obviously wrong with your code. Are you sure your processor supports AVX2 instructions (which GCC is free to use with ``-mavx2``)? It's only available on Haswell processors. You can check for it in ``/proc/cpuinfo``. – Christian Aichinger Apr 15 '14 at 03:02
  • Thank you for the reminder. The CPU is a Haswell processor: Intel(R) Core(TM) i7-4770K – foool Apr 15 '14 at 03:12

1 Answer

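You need to make sure your memory is correctly aligned for AVX. Dereferencing a __m256i pointer lets the compiler emit aligned 256-bit loads and stores, and those fault unless the address is 32-byte (not 32-bit) aligned. Plain int arrays on the stack are not guaranteed to meet that. Here is the function with an alignment assertion: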

// needs <assert.h>, <stddef.h> and <stdint.h>
int region_xor_avx(void *dst, void *src, int len)
{
    const int align = 32;

    // Distance from src up to the next 32-byte boundary (0 if already aligned)
    size_t src_unaligned_part =
        ((((uintptr_t)src) + align - 1) / align * align) - (uintptr_t)src;
    assert(src_unaligned_part == 0); // fails if src is not 32-byte aligned

    // same for dst

    //...
}
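
As a rough illustration of the same check, here is a minimal sketch in C11. The names is_aligned() and demo_stack_alignment() are hypothetical and not part of the code above. It assumes a compiler that supports alignas on local variables, and shows that an array declared with alignas(32) passes the 32-byte test while a pointer shifted by one int does not:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if p is a multiple of align.
   Assumes align is a power of two. */
static int is_aligned(const void *p, uintptr_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical demo: returns 1 when the alignas(32) array is 32-byte
   aligned and the same address shifted by sizeof(int) is not. */
int demo_stack_alignment(void)
{
    alignas(32) int arr[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    const char *shifted = (const char *)arr + sizeof(int);

    return is_aligned(arr, 32) && !is_aligned(shifted, 32);
}
```

Declaring arr1, arr2 and arr3 in the question with alignas(32) would be one way to satisfy the assertion without switching to heap allocation.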

Aligned memory can be allocated and freed as follows:

#include <immintrin.h> // with gcc; <intrin.h> is the MSVC header
void *ptr = _mm_malloc(size, align);
_mm_free(ptr);
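
If controlling the allocation is not an option, another approach is to use the unaligned load and store intrinsics, which accept any address. The following is only a sketch under some assumptions: it targets gcc or clang on x86, the names xor_avx2_unaligned() and region_xor_any() are hypothetical, and the target attribute and __builtin_cpu_supports() are used so that it falls back to a scalar loop on CPUs without AVX2:

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical AVX2 kernel: XORs src into dst, 32 bytes at a time,
   using loads/stores that do not require 32-byte alignment. */
__attribute__((target("avx2")))
static void xor_avx2_unaligned(uint8_t *dst, const uint8_t *src, size_t len)
{
    size_t i = 0;

    for (; i + 32 <= len; i += 32) {
        __m256i a = _mm256_loadu_si256((const __m256i *)(src + i));
        __m256i b = _mm256_loadu_si256((const __m256i *)(dst + i));
        _mm256_storeu_si256((__m256i *)(dst + i), _mm256_xor_si256(a, b));
    }
    /* Remaining bytes when len is not a multiple of 32 */
    for (; i < len; ++i)
        dst[i] ^= src[i];
}

/* Hypothetical entry point: picks the AVX2 kernel when the CPU has it,
   otherwise XORs byte by byte. */
void region_xor_any(void *dst, const void *src, size_t len)
{
    uint8_t *d = (uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;

    if (__builtin_cpu_supports("avx2")) {
        xor_avx2_unaligned(d, s, len);
    } else {
        for (size_t i = 0; i < len; ++i)
            d[i] ^= s[i];
    }
}
```

Unlike the original region_xor_avx(), this version also handles lengths that are not a multiple of 32, so it could be called as region_xor_any(pdes2, psrc, 8*sizeof(int)) on the plain stack arrays from the question.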
Anton K