I'm trying to add four numbers to four other numbers in assembly language using SSE2 instructions and the XMM registers. I succeeded, but I came across something I didn't understand. If I do the addition this way:

movdqu xmm0, oword [var1]
movdqu xmm1, oword [var2]
paddd xmm0, xmm1
movdqu oword [var1], xmm0 

It works perfectly fine.

But if I try it this way:

movdqu xmm0, oword [var1]
paddd xmm0, oword [var2]
movdqu oword [var1], xmm0 

It gives me a segmentation fault.

What is wrong with the second way of doing it? I'm using NASM on an Intel Atom N270, running Linux Mint 12 (32-bit).

Peter Cordes
Catalin Vasile

1 Answer

In the second example, var2 needs to be 16-byte aligned, which I suspect is not the case.

In the first example you are using unaligned loads/stores (movdqu), so you don't see the problem there, but the paddd instruction in the second example requires its memory operand to be 16-byte aligned. When it is not, the CPU raises a general-protection fault, which Linux reports as a segmentation fault.
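
One way to satisfy that requirement is to align the data yourself. The following is only a minimal sketch, assuming a 32-bit Linux ELF build with NASM and ld; the data values, the `_start` entry point, and the build command in the comment are illustrative assumptions and not taken from the question. Only var1 and var2 are the asker's names.

```nasm
; Assumed build: nasm -f elf32 add.asm && ld -m elf_i386 -o add add.o
section .data align=16          ; ask for a 16-byte aligned data section
var1:   dd 1, 2, 3, 4           ; 16 bytes, starts at the aligned section base
var2:   dd 10, 20, 30, 40       ; directly after var1, so also 16-byte aligned

section .text
global _start
_start:
    movdqa  xmm0, oword [var1]  ; aligned load is safe now that var1 is aligned
    paddd   xmm0, oword [var2]  ; memory operand is 16-byte aligned, no fault
    movdqa  oword [var1], xmm0  ; store the four sums back into var1

    mov     eax, 1              ; sys_exit (32-bit Linux int 0x80 ABI)
    xor     ebx, ebx            ; exit status 0
    int     0x80
```

If the data cannot be aligned (for example, it arrives through an arbitrary pointer), keeping the separate movdqu load from the first version is the safe choice.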

Paul R