I am writing a program that writes to a device's range of HW registers. I am using mmap to map the HW addresses to virtual address (user space). I tested the result from the mmap and it is OK. I implemented a copy of a buffer into the device:
void bufferCopy(void *dest, void *src, const size_t size) {
uint8_t *pdest = static_cast<uint8_t *>(dest);
uint8_t *psrc = static_cast<uint8_t *>(src);
size_t iters = 0, tailBytes = 0;
/* iterate 64bit */
iters = (size / sizeof(uint64_t));
for (size_t index = 0; index < iters; ++index) {
*(reinterpret_cast<uint64_t *>(pdest)) =
*(reinterpret_cast<uint64_t *>(psrc));
pdest += sizeof(uint64_t);
psrc += sizeof(uint64_t);
}
.
.
.
but when running it on QEMU I get illegal instruction exception. When I debugged got it crashes on the next instruction (below is the asm of the main loop):
movdqu (%rsi,%rax,1),%xmm0
movups %xmm0,(%rdi,%rax,1) <----- this instruction crashes ...
add $0x10,%rax
cmp %rax,%r9
jne 0x7ffff7eca1e0 <_ZN12_GLOBAL__N_110bufferCopyEPvS0_m+64>
any ideas why ? my guess that you can write to PCI only 32/64 bit. The compile doesn’t know my limitations, so it optimize my code and create vectorized loop (each iteration loads 128 bit and saves 128 bit). Is is making sense ?? can I write to PCI with vectorized instructions ?
Also, whether it is a missing feature in QEMU or a bug or just a recommendation, how can I prevent from the compiler to generate those vector instructions ?