
I'm looking at this code:

http://lxr.free-electrons.com/source/arch/x86/include/asm/bitops.h

    static inline unsigned long __ffs(unsigned long word)
    {
        asm("rep; bsf %1,%0"
            : "=r" (word)
            : "rm" (word));
        return word;
    }

Why is there a "rep" in front of the bsf instruction? And why is this not the case for __fls?

w00d

1 Answer


That's a hack to turn the bsf into tzcnt on processors that support it. It sure would have warranted a comment in the code, though. To quote the instruction set reference:

0F BC /r BSF r32, r/m32

F3 0F BC /r TZCNT r32, r/m32

TZCNT counts the number of trailing least significant zero bits in source operand (second operand) and returns the result in destination operand (first operand). TZCNT is an extension of the BSF instruction. The key difference between TZCNT and BSF instruction is that TZCNT provides operand size as output when source operand is zero while in the case of BSF instruction, if source operand is zero, the content of destination operand are undefined. On processors that do not support TZCNT, the instruction byte encoding is executed as BSF.

(The REP prefix is F3 of course.)
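To make the difference between the two encodings concrete, here is a small portable C model of what each instruction computes on a 64-bit operand. The names `tzcnt64_model` and `bsf64_model` are hypothetical; since the reference leaves BSF's destination undefined for a zero source, the model reports "no result" in that case instead of inventing a value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of TZCNT on a 64-bit operand: the number of trailing zero bits,
 * or the operand size (64) when the source is zero. */
static unsigned tzcnt64_model(uint64_t src)
{
    unsigned n = 0;

    if (src == 0)
        return 64;              /* TZCNT: operand size for a zero source */
    while ((src & 1u) == 0) {   /* shift until the lowest set bit reaches bit 0 */
        src >>= 1;
        n++;
    }
    return n;
}

/* Model of BSF: index of the lowest set bit. For a zero source the
 * destination is undefined, so this model leaves *dest untouched and
 * returns false rather than picking an arbitrary value. */
static bool bsf64_model(uint64_t src, unsigned *dest)
{
    if (src == 0)
        return false;           /* BSF: destination undefined */
    *dest = tzcnt64_model(src); /* same index TZCNT yields for nonzero input */
    return true;
}
```

For every nonzero input the two models agree, which is why emitting F3 0F BC unconditionally is safe here: __ffs only gives a defined result for a nonzero word in the first place, so it does not matter whether the CPU executes the bytes as TZCNT or as BSF.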

Jester
  • Is there any performance benefit? As the comment says, the return value is still undefined if we consider compiling for different platforms. – w00d Oct 16 '15 at 16:58
  • Yes, for certain CPUs there may be a performance benefit. According to Agner Fog's documentation, most AMD CPUs take 3 or 4 clocks for `BSF` but only 2 for `TZCNT`. On Intel Haswell, they take the same time. – Jester Oct 16 '15 at 17:13
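A minimal sketch of the same trick outside the kernel, assuming GCC or Clang with GNU inline assembly. `sketch_ffs` and `lowest_set_bit` are hypothetical names, not kernel functions. On x86 the sketch emits the same `rep; bsf` bytes as __ffs; on other targets it falls back to `__builtin_ctzl` or a plain loop. All three paths share the "undefined for zero" contract, so the zero check is done by the caller-side wrapper.

```c
/* Hypothetical helper modeled on the kernel's __ffs(); not the kernel's code.
 * Precondition: word != 0 (the result is undefined otherwise). */
static inline unsigned long sketch_ffs(unsigned long word)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned long idx;
    /* F3 0F BC: decoded as TZCNT on CPUs that support it, as BSF on older
     * ones. Both yield the index of the lowest set bit for nonzero input. */
    __asm__("rep; bsf %1,%0" : "=r"(idx) : "rm"(word));
    return idx;
#elif defined(__GNUC__)
    /* Compiler builtin: also undefined for a zero argument. */
    return (unsigned long)__builtin_ctzl(word);
#else
    unsigned long idx = 0;
    /* Portable fallback: never terminates for 0, so callers must check. */
    while ((word & 1UL) == 0) {
        word >>= 1;
        idx++;
    }
    return idx;
#endif
}

/* Hypothetical caller-side guard: returns -1 when no bit is set,
 * otherwise the index of the lowest set bit. */
static inline long lowest_set_bit(unsigned long word)
{
    if (word == 0)
        return -1;
    return (long)sketch_ffs(word);
}
```

Keeping the zero check in the wrapper, rather than relying on TZCNT returning the operand size, is what makes the result well defined on every CPU, including those that execute the bytes as plain BSF.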