I'm writing fairly low-level code that must be highly optimized for speed. Every CPU cycle counts. Since the code is in Java I can't go as low-level as in C, for example, but I want to get everything out of the VM that I can.
I'm processing an array of bytes. There are two parts of my code that I'm primarily interested in at the moment. The first one is:
int key = (data[i] & 0xff)
| ((data[i + 1] & 0xff) << 8)
| ((data[i + 2] & 0xff) << 16)
| ((data[i + 3] & 0xff) << 24);
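(Just to make the intent explicit: this assembles a little-endian 32-bit value. A sketch of the same read via java.nio, which is not what my hot loop uses and which I haven't timed here, would be:)

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// same value as the masked-shift expression above: the little-endian int at offset i
int key = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(i);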
and the second one is:
key = (key << 15) | (key >>> 17);
Judging from the performance, I'm guessing that these statements aren't being optimized the way I expect. The second statement is basically a ROTL 15, key (a rotate left by 15 bits). The first statement loads 4 bytes into an int. The 0xff masks are there only to compensate for the sign extension that happens when the accessed byte is implicitly widened to int and happens to be negative. This should be easy to translate into efficient machine code, but to my surprise performance goes up if I remove the masks. (Which of course breaks my code, but I was interested to see what happens.)
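To illustrate both points (the byte value in the comments is just an assumed example of a negative input, and key is a placeholder value; Integer.rotateLeft is the JDK helper for the same rotation, which as far as I know HotSpot can turn into a single rol instruction):

byte b = (byte) 0x90;        // an example "negative" byte from the array
int widened = b;             // widens with sign extension to 0xFFFFFF90
int masked  = b & 0xff;      // 0x00000090 -- the value I actually want in the int

int key = 0x12345678;        // placeholder value, only for illustration
int viaShifts = (key << 15) | (key >>> 17);   // my current rotate idiom
int viaRotate = Integer.rotateLeft(key, 15);  // same result, expressed with the JDK helper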
What's going on here? Do the most common Java VMs optimize this code during JIT compilation in the way one would expect a good C++ compiler to optimize the equivalent C++ code? Can I influence this process? Setting -XX:+AggressiveOpts seems to make no difference.
(CPU: x64, Platform: Linux/HotSpot)
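For what it's worth, a way to check what the JIT actually emits (in case that helps with an answer) would be something like the following; MyBenchmark is a placeholder for my driver class, and PrintAssembly additionally needs the hsdis disassembler library to be installed:

# show which methods get JIT-compiled, and when
java -XX:+PrintCompilation MyBenchmark

# dump the generated machine code for compiled methods (requires hsdis)
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly MyBenchmark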