
I want to get the value of the carry flag after performing a bit-shift.

Rust has stable functions that report overflow, such as `overflowing_add`:

pub fn add_with_carry(foo: &mut u8, bar: u8, cf: &mut u8) {
    let (result, overflow) = foo.overflowing_add(bar);
    *foo = result;
    *cf = overflow as u8;
}

which generates assembly like this:

example::add_with_carry:
  add byte ptr [rdi], sil
  setb byte ptr [rdx]
  ret
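To illustrate why the carry out of `overflowing_add` is useful, here is a minimal sketch that chains it across bytes to add two 16-bit numbers stored as little-endian byte pairs. The function name `add_bytes_le` is hypothetical and not from the question.

```rust
/// Hypothetical helper (not from the question): adds `b` into `a`, both
/// little-endian byte arrays, feeding each byte's carry into the next byte.
/// Returns the final carry out of the most significant byte.
pub fn add_bytes_le(a: &mut [u8; 2], b: [u8; 2]) -> bool {
    let mut carry = false;
    for i in 0..a.len() {
        // Add the two bytes, then the incoming carry.
        // At most one of these two additions can overflow.
        let (partial, c1) = a[i].overflowing_add(b[i]);
        let (sum, c2) = partial.overflowing_add(carry as u8);
        a[i] = sum;
        carry = c1 || c2;
    }
    carry
}
```

For example, adding `[0xFF, 0xFF]` and `[0x01, 0x00]` wraps to `[0x00, 0x00]` and returns `true`, the same carry-out that `setb` captures in the assembly above.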

However, `overflowing_shr` only returns `true` when the shift amount is greater than or equal to the bit width of the type. I only care about the least significant bit that gets shifted out, which x86 stores in CF. Currently the best I've come up with is:

pub fn shr(foo: &mut u8, cf: &mut u8) {
    *cf = *foo & 1;
    *foo >>= 1;
}

However, that is somewhat less efficient:

example::shr:
  mov al, byte ptr [rdi]
  and al, 1
  mov byte ptr [rsi], al
  shr byte ptr [rdi]
  ret
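For comparison, here is a sketch of a by-value helper shaped like the `overflowing_*` family but returning the bit shifted out, next to what `overflowing_shr` actually reports. The names `shr1_with_carry` and `overflowing_shr_flag` are hypothetical; this only pins down the semantics and makes no claim about the generated assembly.

```rust
/// Hypothetical helper: logical shift right by one, also returning the bit
/// that was shifted out (the value x86 `shr` leaves in CF).
pub fn shr1_with_carry(x: u8) -> (u8, bool) {
    (x >> 1, (x & 1) != 0)
}

/// By contrast, the flag from `overflowing_shr` only reports whether the
/// shift amount was >= 8 (the amount is then masked to its low 3 bits);
/// it never reports bits lost off the end.
pub fn overflowing_shr_flag(x: u8, amount: u32) -> bool {
    x.overflowing_shr(amount).1
}
```

So `shr1_with_carry(0b101)` yields `(0b10, true)`, while `0b101u8.overflowing_shr(1)` yields `(0b10, false)` even though a set bit was discarded.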

I would rather have something like:

example::shr:
  shr byte ptr [rdi]
  setc byte ptr [rsi]
  ret

There doesn't seem to be an intrinsic to get CF, and inline assembly is not stable at the moment.
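Inline assembly has since been stabilized (the `asm!` macro, Rust 1.59). Assuming an x86_64 target, a minimal sketch that performs the shift and captures CF with `setc` could look like the following. The name `shr_with_cf` is hypothetical, and a plain-Rust fallback is used on other targets. Note that an `asm!` block is opaque to the optimizer, so this is not guaranteed to produce better code than the compiler's own output.

```rust
/// Hypothetical x86_64 version: shift `*foo` right by one and store the
/// bit shifted out (CF) into `*cf`, using inline assembly.
#[cfg(target_arch = "x86_64")]
pub fn shr_with_cf(foo: &mut u8, cf: &mut u8) {
    use std::arch::asm;
    let mut value = *foo;
    let carry: u8;
    // SAFETY: only byte registers are touched; no memory or stack access.
    // Flags are clobbered, which asm! assumes by default.
    unsafe {
        asm!(
            "shr {v}, 1", // logical shift right; shifted-out bit goes to CF
            "setc {c}",   // copy CF into a byte register (0 or 1)
            v = inout(reg_byte) value,
            c = out(reg_byte) carry,
            options(pure, nomem, nostack),
        );
    }
    *foo = value;
    *cf = carry;
}

/// Fallback for other targets: same semantics in plain Rust.
#[cfg(not(target_arch = "x86_64"))]
pub fn shr_with_cf(foo: &mut u8, cf: &mut u8) {
    *cf = *foo & 1;
    *foo >>= 1;
}
```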

BreadFish64
  • I guess alias analysis can't prove that `cf` and `foo` aren't both pointing to the same byte. If you save `*foo & 1` in a local var and assign to `*cf` 2nd, that would allow the compiler to load, shift, store the shift result, then store the AND or SETC result. *Then* you'd have a real MCVE (minimal, complete, verifiable example) for a missed optimization that you could report as an LLVM bug. Whether or not the language needs features to expose the CF result of right-shifting is not clear; not all ISAs have flags at all (e.g. MIPS) so you don't get it for free on all CPUs. But that could maybe help compilers some. – Peter Cordes May 02 '19 at 04:08
  • Using `shr [mem]`/`setc [mem]` would be a slight optimization, but memory-dest `shr` is a 3-uop instruction on Intel (e.g. Skylake), and memory destination `setcc` is 2 uops, so that's a total of 5 uops, the same as you can do with `movzx eax, byte [rdi]` / `rorx edx, eax, 1` to copy-and-shift / `and al,1` / 2x stores. But without BMI2 rorx (or if the compiler fails to use it for a right shift when the upper bits will be ignored) your way is 1 uop better, and definitely smaller code size either way. – Peter Cordes May 02 '19 at 04:12
  • I don't think you can have better assembly than you already have. – Stargateur May 02 '19 at 04:52
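The reordering suggested in the first comment can be sketched as follows: load `*foo` once into a local, store the shifted value, then store the extracted bit. The name `shr_local` is hypothetical, and whether a given compiler turns this into `shr`/`setc` depends on the compiler version and target, which this sketch does not verify.

```rust
/// Hypothetical variant of the question's `shr`: reads `*foo` once into a
/// local so both stores are computed from that single load.
pub fn shr_local(foo: &mut u8, cf: &mut u8) {
    let x = *foo;  // single load
    *foo = x >> 1; // store the shifted value first
    *cf = x & 1;   // then store the bit that was shifted out
}
```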
