I am very new to using inline assembly in C++ code. What I want to do is basically a kind of memcpy for buffers whose size is a multiple of 32 bytes.
In C++ the code would be something like this:
void my_memcpy(const std::uint8_t* in,std::uint8_t* out,const std::size_t& sz)
{
assert((sz%32 == 0));
// copy one 32-byte block per iteration; in and out are expected to be 32-byte aligned
for(const std::uint8_t* it = in; it != (in+sz);it+=32,out+=32)
{
__m256i tmp = _mm256_stream_load_si256(reinterpret_cast<const __m256i*>(it));
_mm256_stream_si256(reinterpret_cast<__m256i*>(out),tmp);
}
}
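For completeness, here is a self-contained sketch of the same loop with the headers it needs. The name `my_memcpy_checked` and the fallback to `std::memcpy` are only illustrative assumptions, not part of my real code. It assumes GCC or Clang (for `__attribute__((target("avx2")))` and `__builtin_cpu_supports`) and that both pointers are 32-byte aligned, which as far as I understand the streaming load and store require.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define SKETCH_X86 1
#else
#define SKETCH_X86 0
#endif

#if SKETCH_X86
// Compiled for AVX2 even when the translation unit is built without -mavx2.
__attribute__((target("avx2")))
static void copy_avx2(const std::uint8_t* in, std::uint8_t* out, std::size_t sz)
{
    for (std::size_t i = 0; i < sz; i += 32) {
        // non-temporal load of one 32-byte block, then non-temporal store
        __m256i tmp = _mm256_stream_load_si256(reinterpret_cast<const __m256i*>(in + i));
        _mm256_stream_si256(reinterpret_cast<__m256i*>(out + i), tmp);
    }
    _mm_sfence(); // order the streaming stores before later stores
}
#endif

// Hypothetical wrapper: returns true if the AVX2 path was used,
// false if it fell back to std::memcpy.
bool my_memcpy_checked(const std::uint8_t* in, std::uint8_t* out, std::size_t sz)
{
    assert(sz % 32 == 0);
    assert(reinterpret_cast<std::uintptr_t>(in) % 32 == 0);
    assert(reinterpret_cast<std::uintptr_t>(out) % 32 == 0);
#if SKETCH_X86
    if (__builtin_cpu_supports("avx2")) {
        copy_avx2(in, out, sz);
        return true;
    }
#endif
    std::memcpy(out, in, sz); // portable fallback
    return false;
}
```

I would call it on buffers declared with `alignas(32)` (or allocated with an aligned allocator), for example two `alignas(32) std::uint8_t` arrays of 64 bytes.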
I have already done a little inline assembly, but each time I knew in advance the size of both the input array and the output array.
So I tried this:
void my_memcpy(const std::uint8_t* in,std::uint8_t* out,const std::size_t& sz)
{
assert((sz%32 == 0));
__asm__ volatile(
"mov %1, %%eax \n"
"mov $0, %%ebx \n"
"L1: \n"
"vmovntdqa (%[src],%%ebx), %%ymm0 \n"
"vmovntdq %%ymm0, (%[dst],%%ebx) \n"
"add %%ebx, $32 \n"
"cmp %%eax, %%ebx \n"
"jz L1 \n"
:[dst]"=r"(out)
:[src]"r"(in),"m"(sz)
:"memory"
);
}
G++ told me :
Error: unsupported instruction `mov'
Error: `(%rdi,%ebx)' is not a valid base/index expression
Error: `(%rdi,%ebx)' is not a valid base/index expression
Error: operand type mismatch for `add'
Then I tried this:
void my_memcpy(const std::uint8_t* in,std::uint8_t* out,const std::size_t& sz)
{
assert((sz%32 == 0));
__asm__ volatile(
"mov %1, %%eax \n"
"mov $0, %%ebx \n"
"L1: \n"
"vmovntdqa %%ebx(%[src]), %%ymm0 \n"
"vmovntdq %%ymm0, (%[dst],%%ebx) \n"
"add %%ebx, $32 \n"
"cmp %%eax, %%ebx \n"
"jz L1 \n"
:[dst]"=r"(out)
:[src]"r"(in),"m"(sz)
:"memory"
);
}
G++ gave me:
Error: unsupported instruction `mov'
Error: junk `(%rdi)' after register
Error: `(%rdi,%ebx)' is not a valid base/index expression
Error: operand type mismatch for `add'
In each case I searched for a solution without success. I also experimented with this version, using Intel syntax:
void my_memcpy(const std::uint8_t* in,std::uint8_t* out,const std::size_t& sz)
{
__asm__ volatile (
".intel_syntax noprefix;"
"mov eax, [SZ];"
"mov ebx, 0;"
"L1 : "
"vmovntdqa ymm0, [src+ebx];"
"vmovntdq [dst+ebx], ymm0;"
"add ebx, 32 \n"
"cmp ebx, eax \n"
"jz L1 \n"
".att_syntax;"
: [dst]"=r"(out)
: [SZ]"m"(sz),[src]"r"(in)
: "memory");
}
G++ reported:
undefined reference to `SZ'
undefined reference to `src'
undefined reference to `dst'
That message looks very common, but I have no idea how to fix it in this case.
I also know that my attempts do not strictly reproduce the code I wrote in C++.
I would like to understand what is wrong with my attempts, and also how to translate my C++ function as closely as possible.
Thanks in advance.