This question is an extension of this one. Here I present two possible solutions and I want to know how feasible they are. I am targeting the Haswell microarchitecture with the GCC and ICC compilers. I also assume that the memory is aligned.
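The alignment assumption matters because `_mm256_load_si256` and `_mm256_store_si256` require 32-byte-aligned addresses. One way to obtain such memory is sketched here, assuming C++17's `std::aligned_alloc` (available with GCC on glibc). `alloc_aligned_ints` and `is_aligned32` are hypothetical helper names, not part of the question.

```cpp
#include <cassert>   // assert
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uintptr_t
#include <cstdlib>   // std::aligned_alloc, std::free

// Hypothetical helper: allocate n ints starting on a 32-byte boundary,
// as the AVX aligned load/store intrinsics require. Release with std::free.
int *alloc_aligned_ints(std::size_t n) {
    std::size_t bytes = n * sizeof(int);
    bytes = (bytes + 31) / 32 * 32;  // aligned_alloc expects a size that is a multiple of the alignment
    if (bytes == 0) bytes = 32;      // avoid a zero-size request
    int *p = static_cast<int *>(std::aligned_alloc(32, bytes));
    assert(p == nullptr || reinterpret_cast<std::uintptr_t>(p) % 32 == 0);
    return p;
}

// Hypothetical helper: true if p lies on a 32-byte boundary.
bool is_aligned32(const void *p) {
    return reinterpret_cast<std::uintptr_t>(p) % 32 == 0;
}
```

Both `A` and `B` would need to come from an allocator like this (or from `alignas(32)` storage) for the aligned intrinsics to be safe.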
OPTION 1 - I have an auxiliary buffer (aux) already allocated and do three memory moves. (I use memmove instead of memcpy to avoid the copy constructor.)
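#include <cstring>  // memmove

// aux: pre-allocated scratch buffer of at least TO_MOVE bytes
void swap_memory(void *A, void *B, void *aux, size_t TO_MOVE){
    memmove(aux, B, TO_MOVE);  // aux <- B
    memmove(B, A, TO_MOVE);    // B   <- A
    memmove(A, aux, TO_MOVE);  // A   <- old B
}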
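A hedged variant of OPTION 1, not taken from the question: instead of an aux buffer as large as TO_MOVE, a small fixed-size stack buffer is reused chunk by chunk, so no separate allocation is needed. The name `swap_memory_chunked` and the 4096-byte chunk size are assumptions for illustration; whether this is faster than the single-buffer version on Haswell would have to be measured. It assumes A and B do not overlap.

```cpp
#include <cassert>   // assert
#include <cstddef>   // std::size_t
#include <cstring>   // std::memmove

// Hypothetical chunked form of the three-move swap.
// Assumes A and B are non-overlapping regions of TO_MOVE bytes each.
void swap_memory_chunked(void *A, void *B, std::size_t TO_MOVE) {
    assert((A != nullptr && B != nullptr) || TO_MOVE == 0);
    constexpr std::size_t CHUNK = 4096;  // assumed chunk size; tune by benchmarking
    unsigned char aux[CHUNK];            // scratch buffer reused for every chunk
    unsigned char *a = static_cast<unsigned char *>(A);
    unsigned char *b = static_cast<unsigned char *>(B);
    while (TO_MOVE > 0) {
        std::size_t n = TO_MOVE < CHUNK ? TO_MOVE : CHUNK;
        std::memmove(aux, b, n);  // aux     <- B chunk
        std::memmove(b, a, n);    // B chunk <- A chunk
        std::memmove(a, aux, n);  // A chunk <- old B chunk
        a += n;
        b += n;
        TO_MOVE -= n;
    }
}
```

`memmove` is kept to mirror the question, although `memcpy` would also be valid here because the local `aux` never overlaps A or B.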
OPTION 2 - Use AVX or AVX2 loads and stores, taking advantage of the aligned memory. For this solution I assume that I am swapping int data.
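#include <immintrin.h>  // AVX intrinsics
#include <utility>      // std::swap

// Requires A and B to be 32-byte aligned; compile with AVX enabled (e.g. -mavx).
void swap_memory(int *A, int *B, int NUM_ELEMS){
    int i, STOP_VEC = NUM_ELEMS - NUM_ELEMS % 8;  // largest multiple of 8 <= NUM_ELEMS
    __m256i data_A, data_B;
    // Vector part: swap 8 ints (32 bytes) per iteration
    for (i = 0; i < STOP_VEC; i += 8) {
        data_A = _mm256_load_si256((__m256i*)&A[i]);
        data_B = _mm256_load_si256((__m256i*)&B[i]);
        _mm256_store_si256((__m256i*)&A[i], data_B);
        _mm256_store_si256((__m256i*)&B[i], data_A);
    }
    // Scalar tail: the remaining NUM_ELEMS % 8 elements
    for (; i < NUM_ELEMS; i++) {
        std::swap(A[i], B[i]);
    }
}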
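For comparison, a portable baseline that is not part of the question is the standard `std::swap_ranges`. Whether GCC or ICC vectorizes it depends on the optimization flags (for example `-O3 -mavx2`), so it is only a reference point to benchmark OPTION 2 against, and a convenient way to check its results. `swap_memory_baseline` is a hypothetical name.

```cpp
#include <algorithm>  // std::swap_ranges
#include <cassert>    // assert

// Hypothetical portable baseline: element-wise swap of two int arrays.
// No alignment requirement; vectorization is left to the compiler.
void swap_memory_baseline(int *A, int *B, int NUM_ELEMS) {
    assert(NUM_ELEMS >= 0);
    std::swap_ranges(A, A + NUM_ELEMS, B);
}
```

Testing with a length that is not a multiple of 8, such as 21, exercises both loops of OPTION 2: STOP_VEC = 21 - 21 % 8 = 16, so the AVX loop runs twice and the scalar loop swaps the last five elements.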
Is option 2 the fastest? Is there a faster implementation that I didn't mention?