Suppose we are trying to remove the trailing zeroes from some unsigned variable.

uint64_t a = ...
uint64_t last_bit = a & -a; // Two's complement trick: last_bit holds the lowest set bit of a
a /= last_bit; // Removing all trailing zeroes from a (a must be nonzero, or this divides by zero).

I noticed that it's faster to manually count the bits and shift. (MSVC compiler with optimizations on)

uint64_t a = ...
uint64_t last_bit = a & -a;
unsigned long last_bit_index;
_BitScanForward64( &last_bit_index, last_bit ); // Writes the index of the lowest set bit; the mask must be nonzero
a >>= last_bit_index;

Are there any further quick tricks that would make this even faster, assuming that the compiler intrinsic _BitScanForward64 is faster than any of the alternatives?

John Gowers

2 Answers

On x86, _tzcnt_u64 is a faster alternative to _BitScanForward64 if it is available (it is part of the BMI1 instruction set).

Also, you can apply it directly to the input; you don't need to isolate the lowest set bit first, as pointed out by @AlanBirtles in a comment.
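
As a rough sketch of what this looks like in code: the function below applies _tzcnt_u64 directly to the input and shifts by the result. The function name strip_trailing_zeros_u64, the HAVE_TZCNT_U64 macro, the zero check, and the loop fallback for targets without the intrinsic are all assumptions added for illustration, not part of the original suggestion.

```cpp
#include <cstdint>

// _tzcnt_u64 is only declared for 64-bit x86 targets. Elsewhere this sketch
// falls back to a plain loop (the fallback is an addition for portability).
#if (defined(_MSC_VER) && defined(_M_X64)) || (defined(__BMI__) && defined(__x86_64__))
#include <immintrin.h>
#define HAVE_TZCNT_U64 1  // hypothetical macro name, used only in this sketch
#else
#define HAVE_TZCNT_U64 0
#endif

// Hypothetical helper: removes all trailing zero bits from a.
inline std::uint64_t strip_trailing_zeros_u64(std::uint64_t a) {
    // tzcnt reports 64 for a zero input, and shifting a 64-bit value
    // by 64 is undefined behaviour, so zero is handled separately.
    if (a == 0) {
        return 0;
    }
#if HAVE_TZCNT_U64
    // No need to compute a & -a first: tzcnt counts the trailing zeros of a itself.
    return a >> _tzcnt_u64(a);
#else
    // Portable fallback: shift one bit at a time until the lowest bit is set.
    while ((a & 1u) == 0) {
        a >>= 1;
    }
    return a;
#endif
}
```

With GCC or Clang the intrinsic path is only taken when BMI is enabled (for example via -mbmi); whether it actually beats _BitScanForward64 on a given CPU is worth measuring.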

Other than that, nothing more can be done for a single variable. For an array of them, there may be a SIMD solution.

Jérôme Richard
Alex Guteniev

You can use std::countr_zero (C++20, from the <bit> header) and rely on the compiler to optimize it.

a >>= std::countr_zero(a);

(bonus: you don't need to specify the width and it works with any unsigned integer type)
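
To make the width-independence concrete, here is a minimal sketch that wraps the one-liner in a template. The name strip_trailing_zeros, the zero check, and the pre-C++20 loop fallback are assumptions added for illustration; only the std::countr_zero call comes from the suggestion above.

```cpp
#include <cstdint>
#include <type_traits>
#if __has_include(<bit>)
#include <bit>
#endif

// Hypothetical helper: removes all trailing zero bits from any unsigned integer.
template <typename T>
constexpr T strip_trailing_zeros(T value) {
    static_assert(std::is_unsigned_v<T>, "only unsigned integer types are supported");
    // countr_zero(0) equals the bit width, and shifting by the full width
    // is undefined behaviour, so zero is handled separately.
    if (value == 0) {
        return 0;
    }
#if defined(__cpp_lib_bitops)
    // C++20 path: the compiler is free to lower this to tzcnt/bsf or similar.
    return static_cast<T>(value >> std::countr_zero(value));
#else
    // Fallback for pre-C++20 builds (an addition, not part of the answer).
    while ((value & 1u) == 0) {
        value >>= 1;
    }
    return value;
#endif
}
```

The same call then works for different widths, for example strip_trailing_zeros(std::uint8_t{40}) and strip_trailing_zeros(std::uint64_t{40}) both yield 5.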

apple apple