0

I am very new concerning the usage of inline assembly in C++ codes. What I want to do is basicly a kind of memcopy for pointer with a size modulo 32.

In C++ the code use to be something like this :

void my_memcpy(const std::uint8_t* in,std::uint8_t* out,const std::size_t& sz)
{

       assert((sz%32 == 0));

    for(const std::uint8_t* it = beg; it != (beg+sz);it+=32,out+=32)
    {
      __m256i = _mm256_stream_load_si256(reinterpret_cast<__m256i*>(it));
      _mm256_stream_si256(reinterpret_cast<__m256i*>(out),tmp);

    }            
}

I already did a little bit of inline assembly, but each time I knew in advance both the size of the input tab, and the output tab.

So I tried this :

void my_memcpy(const std::uint8_t* in,std::uint8_t* out,const std::size_t& sz)
{

     assert((sz%32 == 0));

    __asm__ volatile(

                "mov %1, %%eax \n"
                "mov $0, %%ebx \n"

                "L1: \n"

                "vmovntdqa (%[src],%%ebx), %%ymm0 \n"
                "vmovntdq  %%ymm0, (%[dst],%%ebx) \n"

                "add %%ebx, $32 \n"

                "cmp %%eax, %%ebx \n"
                "jz L1 \n"

                :[dst]"=r"(out)
                :[src]"r"(in),"m"(sz)
                :"memory"
                );

}

G++ told me :

Error: unsupported instruction `mov'
Error: `(%rdi,%ebx)' is not a valid base/index expression
Error: `(%rdi,%ebx)' is not a valid base/index expression
Error: operand type mismatch for `add'

So I tried this :

void my_memcpy(const std::uint8_t* in,std::uint8_t* out,const std::size_t& sz)
{

     assert((sz%32 == 0));
__asm__ volatile(

            "mov %1, %%eax \n"
            "mov $0, %%ebx \n"

            "L1: \n"

            "vmovntdqa %%ebx(%[src]), %%ymm0 \n"
            "vmovntdq  %%ymm0, (%[dst],%%ebx) \n"

            "add %%ebx, $32 \n"

            "cmp %%eax, %%ebx \n"
            "jz L1 \n"

            :[dst]"=r"(out)
            :[src]"r"(in),"m"(sz)
            :"memory"
                );

}

I obtain from G++ :

Error: unsupported instruction `mov'
Error: junk `(%rdi)' after register
Error: `(%rdi,%ebx)' is not a valid base/index expression
Error: operand type mismatch for `add'

In every case I tried to find without succes a solution. I experience also this solution :

void my_memcpy(const std::uint8_t* in,std::uint8_t* out,const std::size_t& sz)
{

    __asm__ volatile (
          ".intel_syntax noprefix;"

          "mov eax, [SZ];"
          "mov ebx, 0;"

          "L1 : "

          "vmovntdqa ymm0, [src+ebx];"
          "vmovntdq [dst+ebx], ymm0;"

          "add ebx, 32 \n"

          "cmp ebx, eax \n"
          "jz L1 \n"
                ".att_syntax;"
          : [dst]"=r"(out)
          : [SZ]"m"(sz),[src]"r"(in)
          : "memory");



}

G++ :

undefined reference to `SZ'
undefined reference to `src'
undefined reference to `dst'

The message in that look like very common, but I have no idea how to fix it in that case.

I know also my tried do not strictly represent the code I wrote in C++.

I would like to understand what's wrong with my tried, and also how to translate as close as possible my C++ function.

Thank's in advance.

Cory Kramer
  • 114,268
  • 16
  • 167
  • 218
John_Sharp1318
  • 939
  • 8
  • 19
  • 1
    C does not have `std::uint8_t`, nor `std::size_t`, nor anything else in the C++ standard library. So do not tag C, because your question has nothing to do with C. – Cory Kramer Jul 31 '15 at 18:30
  • 1
    You can't have both 64bit and 32bit registers in a memory address – harold Jul 31 '15 at 18:35
  • The error about `%rdi,%ebx` is clearly that you are mixing a 32-bit and a 64-bit address form - that is not valid. Change it to `%rbx` instead of `%ebx`, and it will work. – Mats Petersson Jul 31 '15 at 18:37
  • [Must learn to type more minimalist so I don't end up "slower" than for example harold above] – Mats Petersson Jul 31 '15 at 18:37
  • @CoryKramer ANSI C11 define a header stdint.h where int8_t, uint8_t, ... is are defined. Those types are reussed by ANSI C++ 11 under the namespace std in the header cstdint (http://www.cplusplus.com/reference/cstdint/). But I agree with you the concept of namespace do not exist in C language. @MatPetersson I apply the corrections but nothing change. I mean still have these errors : Error: unsupported instruction `mov' Error: junk `(%rdi)' after register Error: `(%ebx,%rdi)' is not a valid base/index expression Error: operand type mismatch for `add' – John_Sharp1318 Jul 31 '15 at 20:22
  • Since the error message still uses both ebx and rdi, it does not appear that your change has taken. Did you modify every place where you used ebx? Also, you MUST NOT modify registers in inline asm without letting the compiler know (like you are doing with eax and ebx). There are a number of ways to let the compiler know (like clobbers), but you must use one of them, or risk weird and difficult to find errors. – David Wohlferd Jul 31 '15 at 20:39

1 Answers1

2

Your first example is the most correct and has following errors:

  • It uses 32 bit registers instead of 64 bit.
  • 3 registers are changed which are not specified as outputs or clobbers.
  • EAX is loaded with source address, not the size.
  • dst is declared to be an output, when it should be an input.
  • The arguments for the add instruction are the wrong way round, in AT&T syntax the destination register is last.
  • A non-local label is used, which will fail if the asm statement gets duplicated, for example by inlining.

And the following performance issues:

  • The sz parameter is passed by reference. (May also impair optimisations in calling functions)
  • It is then passed into the asm as a memory argument, which requires it is written to memory.
  • Then it is copied to another register.
  • Fixed registers are used instead of letting the compiler choose.

Here is a fixed version, which is no faster than the equivalent C++ with intrinsics:

void my_memcpy(const std::uint8_t* in,std::uint8_t* out,const std::size_t sz)
{
     std::size_t count = 0;
     __m256i temp;

     assert((sz%32 == 0));

    __asm__ volatile(

                "1: \n"

                "vmovntdqa (%[src],%[count]), %[temp] \n"
                "vmovntdq  %[temp], (%[dst],%[count]) \n"

                "add $32, %[count] \n"

                "cmp %[sz], %[count] \n"
                "jz 1b \n"

                :[count]"+r"(count), [temp]"=x"(temp)
                :[dst]"r"(out), [src]"r"(in), [sz]"r"(sz)
                :"memory", "cc"
                );

}

The source and destination parameters are the other way round as memcpy which is potentially confusing.

Your Intel syntax version addition also fails to use the correct syntax to refer to arguments (eg %[dst]).

Timothy Baldwin
  • 3,551
  • 1
  • 14
  • 23