I am considering vectorizing some floor() calls using sse2 intrinsics, then measuring the performance gain. But ultimately the binary is going to be run on a virtual machine which I have no access to.
I don't really know how a VM works. Is a binary entirely executed on a software-emulated virtual cpu ?
If not, supposing the VM is run on a cpu with SSE2, could the VM use his cpu SSE2 instruction when executing a SSE2 instruction from my binary ?
Could my vectorization be beneficial on the VM ?