Does shr(7,dest) take more time than shr(1,dest)?

Question

I am in the process of learning HLA Assembly from the book, Art of Assembly Language, 2nd Edition. I just started learning about the shr and shl instructions, and I would like to know whether shifting by a larger amount takes more time than shifting by a smaller amount: shr(1,dest) vs. shr(7,dest).

I'm sorry if the syntax for the instructions is wrong.

It depends on the processor. You need to read the data sheet for the performance characteristics for the processor you are targeting. Modern implementations use a barrel shifter which doesn't care how much you are shifting by. — Raymond Chen, Sep 06 '15 at 22:43

Peter Cordes · Accepted Answer · 2015-09-07T06:35:34.570

http://agner.org/optimize/ has instruction timings for x86 CPUs, and microarch guides.

Shift and rotate with an immediate (compile-time-constant) count are single cycle latency on recent AMD and Intel.
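To see this from C, here is a minimal sketch (the function names are made up for illustration). With optimization enabled, a compiler targeting x86 would typically turn each function into a single SHR with an immediate count, so the only difference between them is the imm8 value, not the latency. That is an expectation about typical compiler output, not a guarantee; check the generated asm for your compiler and flags.

```c
#include <assert.h>
#include <stdint.h>

/* Logical right shift by a compile-time-constant count of 1. */
static inline uint32_t shr_by_1(uint32_t x) { return x >> 1; }

/* Logical right shift by a compile-time-constant count of 7. */
static inline uint32_t shr_by_7(uint32_t x) { return x >> 7; }

/* Hypothetical self-check: unsigned shifts fill the vacated high bits with zeros. */
static void shift_demo(void) {
    assert(shr_by_1(0x80u) == 0x40u);
    assert(shr_by_7(0x80u) == 0x01u);
}
```

Compiling with something like `gcc -O2 -S` and reading the output is the easiest way to confirm what instruction each function becomes.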

Rotate-through-carry (RCL / RCR) by any count other than 1 is slow, but probably constant-time. (Data-dependent timing would make out-of-order execution dependency tracking even trickier, so I think they just take the maximum.)

Another strange thing: apparently IvyBridge / Haswell take an extra uop for the short-form ROL / ROR rotate-by-1 opcode, so throughput is half compared to the normal opcode with an imm8 count of 1.

Re: HLA: C and C++ compilers have good support for intrinsics now (functions that compile to inline instructions). I remember reading somewhere (I can't recall the source, sorry >.<) that there isn't much of a use-case for HLA anymore, and that these days you might as well just learn normal asm. A lot of the time, you can get speedups from using vector instructions (or bit-manipulation, like popcount) through intrinsics in C/C++.
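As a rough illustration of the popcount case, here is a sketch that assumes GCC or Clang, since `__builtin_popcount` is a compiler builtin rather than standard C. The wrapper names are hypothetical. When the target supports it (e.g. `-mpopcnt` or `-march=native`), these compilers typically emit a single POPCNT instruction for the builtin; otherwise they fall back to a software routine.

```c
#include <assert.h>
#include <stdint.h>

/* Portable fallback: clear the lowest set bit until none remain. */
static unsigned popcount_loop(uint32_t x) {
    unsigned n = 0;
    while (x) {
        x &= x - 1;  /* drops the lowest set bit */
        n++;
    }
    return n;
}

/* GCC/Clang builtin (an assumption about the toolchain, not standard C). */
static unsigned popcount_builtin(uint32_t x) {
    return (unsigned)__builtin_popcount(x);
}
```

Both functions return the same count; the builtin version just gives the compiler the chance to use the hardware instruction.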

If you're having fun learning HLA, and think it's useful, then best of luck to you, though.
