
Is it possible to capture the carry flag from an x86 SHL/SHR/SAR instruction in C, without inline assembly?

I couldn't find an intrinsic for shifts, and compilers don't seem to recognize patterns where the shift's carry-out could be used.

Suppose I want the following:

shr eax, cl   ; eax >>= cl
adc ebx, edx  ; ebx += edx + shifted_out_bit

What C code could I write that would likely get a compiler to generate that sort of output?
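For reference, here is a minimal sketch of what the two instructions compute, written in portable C. The helper name `shr_adc` is hypothetical. It assumes a shift count of 1 to 31, because x86 masks the count to 5 bits and leaves CF unmodified when the masked count is 0, and this sketch does not model that case. It only expresses the semantics. It is not guaranteed to compile to `shr` + `adc`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: emulates
 *     shr eax, cl   ; eax >>= cl, CF = last bit shifted out
 *     adc ebx, edx  ; ebx += edx + CF
 * Assumes 1 <= cl <= 31 (the count-0 "CF unchanged" case is not modelled). */
static void shr_adc(uint32_t *eax, uint32_t *ebx, uint32_t edx, unsigned cl)
{
    assert(cl >= 1 && cl <= 31);
    uint32_t cf = (*eax >> (cl - 1)) & 1u; /* the last bit SHR shifts out */
    *eax >>= cl;
    *ebx += edx + cf;                      /* wraps mod 2^32, like ADC */
}
```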

My failed attempt

  • Some compilers might be avoiding it on purpose because reading the flag result from a shift causes a stall on P6-family CPUs. (Except for the shift-by-implicit-1 uop that knows it never needs to leave FLAGS unmodified.) [INC instruction vs ADD 1: Does it matter?](//stackoverflow.com/q/36510095). I think I saw discussion about this on an LLVM bug report or code review a while ago... Compilers tend to avoid asm sequences that are terrible on one uarch even if they're slightly better on another. (I think this was about using the ZF result, where it just costs an extra TEST to avoid) – Peter Cordes Sep 26 '19 at 01:01
  • In theory `-march=skylake` should tell it not to care about P6-family. But yeah, `-march=skylake -mno-bmi2` still makes the same asm with gcc and clang https://godbolt.org/z/JnMHkw. This is a missed optimization; you can report it at https://gcc.gnu.org/bugzilla/enter_bug.cgi?product=gcc and https://bugs.llvm.org. (I'd suggest a [mcve] of that C source, and a link to this SO question, in your missed-optimization bug reports.) – Peter Cordes Sep 26 '19 at 01:04
  • Thanks for the info @PeterCordes. I haven't read your link fully, but the stall from reading shift flags does look surprisingly large! Anyway, I was originally looking for an intrinsic like `_addcarry_u32`, but for shifts, to avoid depending on unpredictable compiler optimizations; perhaps no such solution exists... – zinga Sep 26 '19 at 14:00
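Absent a real shift intrinsic, one option is to write portable helpers shaped like `_addcarry_u32` (carry returned, result written through a pointer). The sketch below is only that. `shr_carry_u32`, `shl_carry_u32` and `adc_u32` are hypothetical names, not a compiler API, and the counts are restricted to 1 to 31 to sidestep x86 count masking and the count-0 case. SAR is left out because right-shifting a negative signed value is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical intrinsic-style helpers (NOT a real compiler API). */

/* SHR: CF is bit (n - 1) of the input. Assumes 1 <= n <= 31. */
static unsigned char shr_carry_u32(uint32_t x, unsigned n, uint32_t *out)
{
    assert(n >= 1 && n <= 31);
    *out = x >> n;
    return (unsigned char)((x >> (n - 1)) & 1u);
}

/* SHL: CF is bit (32 - n) of the input. Assumes 1 <= n <= 31. */
static unsigned char shl_carry_u32(uint32_t x, unsigned n, uint32_t *out)
{
    assert(n >= 1 && n <= 31);
    *out = x << n;
    return (unsigned char)((x >> (32 - n)) & 1u);
}

/* Portable add-with-carry, same argument order as _addcarry_u32. */
static unsigned char adc_u32(unsigned char c_in, uint32_t a, uint32_t b,
                             uint32_t *out)
{
    uint64_t sum = (uint64_t)a + b + (c_in != 0); /* at most 2^33 - 1 */
    *out = (uint32_t)sum;                         /* low 32 bits */
    return (unsigned char)(sum >> 32);            /* carry-out */
}
```

Chaining them reproduces the question's pair, e.g. `c = shr_carry_u32(x, n, &x); adc_u32(c, y, z, &y);`. Returning the carry as a value keeps the data dependency explicit, but whether the compiler then reuses the real CF from `shr` is still up to its optimizer.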

0 Answers