First, here are the existing questions about MOVDQA and MOVDQU in this community:

And here are links that describe these two instructions:

Now to my problem. I am on 64-bit Linux, and I have created a C++ test project that uses several assembly implementations. I want a better understanding of MOVDQA and MOVDQU, which load data into XMM registers. Here are some of my experiments:


    // Initialization 1
    std::string_view lhs{"Once upon a time in Germany"}; // length = 27
    std::string_view rhs = lhs.substr(20, 7);            // rhs points to "Germany"
    # Experiment 1.1
    # Here: %rax = lhs, %rsi = rhs
    movdqa (%rax), %xmm11           // SIGSEGV on this line, although enough memory is allocated
    movdqa (%rsi), %xmm12
    # Experiment 1.2
    # Here: %rax = lhs, %rsi = rhs
    movdqu (%rax), %xmm11           // data successfully loaded into register
    movdqu (%rsi), %xmm12           // data successfully loaded into register with some garbage

    // Initialization 2
    void *lhs = mmap(NULL, 27 * sizeof(unsigned char), PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    void *rhs = static_cast<unsigned char *>(lhs) + 20;   // arithmetic on void* is not standard C++
    # Experiment 2.1
    # Here: %rax = lhs, %rsi = rhs
    movdqa (%rax), %xmm11           // data successfully loaded into register
    movdqa (%rsi), %xmm12           // SIGSEGV in this line... Expected!
    # Experiment 2.2
    # Here: %rax = lhs, %rsi = rhs
    movdqu (%rax), %xmm11           // data successfully loaded into register
    movdqu (%rsi), %xmm12           // data successfully loaded into register with some garbage

    // Initialization 3
    unsigned char *lhs = new unsigned char[27];
    unsigned char *rhs = lhs + 20;
    # Experiment 3.1
    # Here: %rax = lhs, %rsi = rhs
    movdqa (%rax), %xmm11           // data successfully loaded into register
    movdqa (%rsi), %xmm12           // SIGSEGV in this line
    # Experiment 3.2
    # Here: %rax = lhs, %rsi = rhs
    movdqu (%rax), %xmm11           // data successfully loaded into register
    movdqu (%rsi), %xmm12           // data successfully loaded into register with some garbage

Questions:

  • Can anyone explain experiment 1.1?
  • It seems to me that using MOVDQU is always safe. If it is not, in which scenarios does MOVDQU throw SIGSEGV?
  • Also, in which scenarios does MOVDQA throw SIGSEGV?
  • Can anyone share a reference comparing the performance of MOVDQA and MOVDQU (i.e., which is faster)?
RajibTheKing
  • `movdqa` requires the data to be **16-byte aligned** in memory; that's what the A vs. U stands for. Address % 16 == 0, i.e. the low 4 bits (low hex digit) must be zero. If you violate this, it raises a #GP hardware exception, as it says in the manual (https://www.felixcloutier.com/x86/movdqa:vmovdqa32:vmovdqa64), so the Linux kernel delivers a SIGSEGV. They're equal speed if the memory is aligned on modern CPUs (Nehalem and later). – Peter Cordes Nov 08 '22 at 12:34
  • Would you please look into experiment 1.1? I initialized the string_view with 27 bytes, yet my program crashed when MOVDQA accessed the first 16 bytes. Any explanation? – RajibTheKing Nov 08 '22 at 12:48
  • @RajibTheKing: Have you tried printing out the hex memory address that `rax` is pointing to? There's no guarantee that the literal is 16-byte aligned. – ShadowRanger Nov 08 '22 at 12:50
  • "`MOVDQU` throws `SIGSEGV`" - you're confusing two concepts here. The CPU doesn't throw OS-level exceptions. For instance, Windows on x64 can also execute `MOVDQU` but it doesn't even have a `SIGSEGV`. – MSalters Nov 08 '22 at 12:51
  • 1.1 - `alignof(std::string_view)` is very likely less than 16. It's probably a struct of a couple 8-byte pointers. Or if you're actually dereferencing it to get a `char*` to the pointed-to string literal, no reason to expect it to be 16-byte aligned either. Use a debugger to look at the address in RAX when it faults. – Peter Cordes Nov 08 '22 at 12:51
  • As for when MOVDQU can fault (which you mentioned in your last edit as a reopen reason, despite that being mostly a footnote to the main question about movdqa requiring alignment): no special cases different from an integer `mov`, just if accessing an unmapped page, or trying to store to a read-only page. The asm manual documents possible exception reasons for each instruction. [Is it safe to read past the end of a buffer within the same page on x86 and x64?](//stackoverflow.com/q/37800739) discusses why you can't safely use unaligned loads like `movdqu` in a naive way to implement `strlen`. – Peter Cordes Nov 08 '22 at 12:53
  • @PeterCordes, thanks a lot for your time. I read through the link you provided, but I still can't figure out the exact scenario that causes a SIGSEGV when using `movdqu`. It would be great if you could give me an example that I can reproduce in my own program to clarify my understanding. – RajibTheKing Nov 08 '22 at 15:04
  • @PeterCordes Would you please elaborate a bit more on "if accessing an unmapped page, or trying to store to a read-only page"? Or share a good link where I can read up on it. – RajibTheKing Nov 08 '22 at 15:11
  • [What causes a SIGSEGV](https://stackoverflow.com/q/1564372) - any of those things, done with `movdqu` or `mov` or any other instruction that reads/writes memory, will cause a #PF page fault that the OS determines is invalid, and thus delivers a SIGSEGV. Another case that can happen with SIMD but wouldn't when accessing one element at a time is if you had a short string, say 3 bytes, and its terminator is the last byte of a page. The next page is unmapped. A 16-byte load starting at the beginning of that string would segfault, just like a 4-byte load, since it would try to load the first byte of an unmapped page. – Peter Cordes Nov 08 '22 at 15:23
  • See also [segmentation fault vs page fault](https://stackoverflow.com/q/6950549) – Peter Cordes Nov 08 '22 at 15:24
  • `movdqu` will segfault in exactly the same cases when 16 separate `movb` loads or stores would have segfaulted. – Peter Cordes Nov 08 '22 at 15:24
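
To make the alignment rule from the comments concrete, here is a minimal C++ sketch, assuming GCC or Clang on x86-64 and the SSE2 intrinsics from `<emmintrin.h>`. The helper names `is_aligned16` and `load16` are hypothetical and do not come from the question. The sketch checks the low 4 bits of an address and picks the aligned or unaligned load intrinsic accordingly. Note that in Initializations 2 and 3 `rhs = lhs + 20`, and 20 % 16 = 4, so `rhs` cannot be 16-byte aligned whenever `lhs` is.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics: _mm_load_si128, _mm_loadu_si128

// Hypothetical helper: true when p meets the 16-byte alignment
// that movdqa requires (low 4 address bits are zero).
inline bool is_aligned16(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 0xF) == 0;
}

// Hypothetical helper: loads 16 bytes from p, using the aligned form
// only when that is legal. The caller must still guarantee that all
// 16 bytes starting at p are readable.
inline __m128i load16(const void* p) {
    assert(p != nullptr);
    const __m128i* v = static_cast<const __m128i*>(p);
    return is_aligned16(p) ? _mm_load_si128(v)    // aligned load (movdqa semantics)
                           : _mm_loadu_si128(v);  // unaligned load (movdqu semantics)
}
```

For instance, `load16(lhs.data())` would be usable with the `string_view` from Initialization 1 wherever the literal happens to be placed, since the view covers at least 16 readable bytes. An optimizing compiler is free to emit a single unaligned load for both branches.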

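The page-boundary case described in the comments can be reproduced on Linux with a sketch along these lines. It is an illustration under stated assumptions, not a definitive test: it assumes GCC or Clang on x86-64 (AT&T-syntax inline asm), and the function name `signal_from_movdqu` is hypothetical. It maps two pages, makes the second one inaccessible with `mprotect`, and executes `movdqu` in a forked child at a chosen distance before the end of the first page, so the parent can report the signal instead of crashing itself.

```cpp
#include <cassert>
#include <csignal>
#include <cstddef>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs `movdqu` on an address that lies
// bytes_before_page_end bytes before an inaccessible page.
// Returns the signal that killed the child, 0 if the load succeeded,
// or -1 if the setup itself failed.
int signal_from_movdqu(std::size_t bytes_before_page_end) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    assert(bytes_before_page_end >= 1 && bytes_before_page_end <= page);

    // Map two readable pages, then revoke all access to the second one.
    void* m = mmap(nullptr, 2 * page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED) return -1;
    unsigned char* base = static_cast<unsigned char*>(m);
    if (mprotect(base + page, page, PROT_NONE) != 0) {
        munmap(base, 2 * page);
        return -1;
    }

    const unsigned char* p = base + page - bytes_before_page_end;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(base, 2 * page);
        return -1;
    }
    if (pid == 0) {
        // Child: a 16-byte unaligned load starting at p.
        __asm__ volatile("movdqu (%0), %%xmm0" : : "r"(p) : "xmm0", "memory");
        _exit(0);  // reached only if the load did not fault
    }

    // Parent: wait for the child and report how it ended.
    int status = 0;
    waitpid(pid, &status, 0);
    munmap(base, 2 * page);
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

With this setup, `signal_from_movdqu(16)` should return 0 because all 16 bytes lie in the readable page, while `signal_from_movdqu(3)` should return `SIGSEGV` because the load reaches into the protected page, even though the starting address is valid and alignment does not matter to `movdqu`.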
0 Answers