I've got 64 bytes of contiguous memory organized as big-endian DWORDs. I'd like to copy this data elsewhere and store it as little-endian DWORDs. I could do that one DWORD at a time, e.g.:
mov eax, [rsi + rcx]
bswap eax ; setting proper endianness
mov [rdi + rcx], eax
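Wrapped in a loop over all 64 bytes, that scalar approach might look roughly like the sketch below. The routine name `copy_bswap_64`, the label name, and the use of `rcx` as a byte offset are just illustrative assumptions; `rsi` and `rdi` are assumed to already point at the source and destination, as in the snippet above.

```nasm
; Scalar baseline (sketch): copy 16 big-endian DWORDs from [rsi] to [rdi],
; byte-swapping each one on the way.
; Assumed inputs: rsi = source, rdi = destination. Clobbers eax and rcx.
copy_bswap_64:
    xor     ecx, ecx            ; rcx = byte offset, starting at 0
.next_dword:
    mov     eax, [rsi + rcx]    ; load one big-endian DWORD
    bswap   eax                 ; reverse its four bytes
    mov     [rdi + rcx], eax    ; store it as little-endian
    add     rcx, 4              ; advance to the next DWORD
    cmp     rcx, 64             ; 16 DWORDs * 4 bytes = 64 bytes
    jb      .next_dword         ; loop while offset < 64
    ret
```

That is 16 load/swap/store iterations for a single 64-byte block.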
But I feel there's a more efficient way, using xmm/ymm/zmm registers and instructions such as vmovdqa/vmovdqu. The problem is, those only cover the copying part, without setting the endianness. So my question is: is there any smart way of converting the endianness of such a large chunk of DWORDs at once, or do I have to do it manually, as shown in the snippet? Or perhaps there's another efficient way that doesn't involve any of the above?
Edit: After @Erik Eidt's comment, I altered the question, because the original wording was confusing and didn't exactly describe the problem at hand properly.