3

Long time MSVC user, new to gcc (so bear with me).

I am using the rubenvb version of c++ (see version in subject, yes I'm building for 64bit) on Windows 7 and I'm having a problem using _BitScanForward64. Some sample code looks like this:

int __cdecl main(int argc, char* argv[])
{
    DWORD d = (DWORD)atoi(argv[1]);

    DWORD ix, ix2;
    ix2 = _BitScanForward64(&ix, d);
    printf("bsf %u %u\n", ix, ix2);
}

I am compiling with:

"C:\Program Files\gcc2\mingw64\bin\c++.exe" -o iTot.exe -mno-ms-bitfields -march=native -momit-leaf-frame-pointer -mwin32 -Os -fomit-frame-pointer -m64 -msse4 -mpopcnt -D WINDOWS main.cpp

When I run iTot.exe using the parameter 8, I expected that _BitScanForward64 would set ix to 3. That's what MSVC does. However, ix is 0 and ix2 is 1.

Also, looking at the assembler, I see:

bsfq QWORD PTR 44[rsp],rax   # MEM[(volatile LONG64 *)&ix], Mask

Under the circumstances, why does gcc force a memory write+read here?

So, a few questions:

  1. Is _BitScanForward64 somehow supposed to be called differently under gcc? If I'm just calling it wrong, that would be good to know (although the incompatibility with MSVC would be a pain).
  2. Why does the _BitScanForward64 intrinsic force a memory write?
  3. Staring at the assembler output from -S, I couldn't see anything wrong with the code being generated. However, using objdump.exe -d -Mintel, I see that rather than using the asm code above (which seems like it would work), it actually produced the reverse:

    bsf rax,QWORD PTR [rsp+0x2c]

WTF? Why is -S lying to me?

Like I said, I'm new to gcc, so if I'm just doing something dumb, be gentle with me. Thanks.

David Wohlferd
  • 7,110
  • 2
  • 29
  • 56
  • 1
    0x2c = 44, isn't it? Also, `-S` produces output in AT&T syntax by default and the output from objdump produced the more typical syntax for x86 CPUs, intel. The two differ in the order of operands. – Alexey Frunze Mar 20 '13 at 07:08
  • 1
    Do you include the correct header that declares the `_BitScanForward64` function? Otherwise the compiler will not know what the function is, what arguments it takes and what it returns, and will most likely generate code that won't work properly. – Some programmer dude Mar 20 '13 at 07:10
  • @alexey: I specifically configured objdump to output intel since that's what I read. I also configured c++ -S to output intel. Since they are both outputting intel, I expected the results to be the same. When I open the .exe file in a debugger (VS), I see the same (wrong) code, which undoubtedly explains why printf shows that BitScanForward is returning the wrong answer. – David Wohlferd Mar 20 '13 at 07:35
  • @Joachim: If I didn't have the right headers, I wouldn't expect the code to compile, let alone produce executable output that contains (nearly) the right asm code. FWIW, I am including intrin.h, which is (I believe) where this is defined. – David Wohlferd Mar 20 '13 at 07:38
  • It makes no sense. `bsf` takes two operands, of which one and only one can be a memory operand. The memory operand cannot somehow magically appear in the wrong place in compiled code simply because there's no `bsf` instruction with swapped operands. Look at the bytes comprising the instruction, they must be identical, whether you seem them interpreted as `bsfq QWORD PTR 44[rsp],rax` or as `bsf rax,QWORD PTR [rsp+0x2c]`. – Alexey Frunze Mar 20 '13 at 07:42
  • How about single-stepping the code in the debugger, that is, noting the state of the registers and of your variables immediately before this instruction and immediately after? Can you spot a problem there? – Alexey Frunze Mar 20 '13 at 07:45
  • Ok, I think I'm making progress here. While I do include intrin.h (as I said), that's not where the compiler is finding the definition. Looking at the output again, I see something important: /APP # 1252 "blah blah blah/include/winnt.h" 1 bsfq QWORD PTR 44[rsp],rax # 0 "" 2 /NO_APP Looking at winnt.h I see: __CRT_INLINE BOOLEAN _BitScanReverse64(DWORD *Index,DWORD64 Mask) { __asm__ __volatile__("bsrq %1,%0" : "=r" (Mask),"=m" ((*(volatile LONG64 *)Index))); return Mask!=0; } I don't read gcc's inline assembler well. But could this be backward? – David Wohlferd Mar 20 '13 at 08:03
  • Ok, I've got the solution, but the forum won't let me post it as an answer until later. In case I get hit by a bus between now and then, the problem is that the definition for BitScanForward64 in winnt.h is just plain wrong. And the output from -S really is backwards. Details soon. – David Wohlferd Mar 20 '13 at 09:12

1 Answers1

1

Ok, I think I've answered my own questions. Thanks to Joachim PileBorg who made me look at where the definition was, and Alexey Frunze who pointed out that the params can't be backward.

While I'm too new to gcc to say this authoritatively, I believe the definition for _BitScanForward64 in winnt.h is very wrong.

The current definition:

__CRT_INLINE BOOLEAN _BitScanForward64(DWORD *Index,DWORD64 Mask) {
  __asm__ __volatile__("bsfq %1,%0" : "=r" (Mask),"=m" ((*(volatile LONG64 *)Index)));
  return Mask!=0;
}

My definition:

__CRT_INLINE BOOLEAN BSF(DWORD *Index,DWORD64 Mask) {
  LONG64 t;
  __asm__ ("bsfq %0,%1" : "=r" (Mask),"=r" (t));
  *Index = t;
  return Mask!=0;
}

Note the removal of the (unneeded) volatile, the reversal of the parameters to bsfq, the change from =m to =r, etc. Basically, it appears this definition is as wrong as it could be and still compile.

I'm guessing the person who wrote this looked at the prototype for BitScanForward64 and "knew" that one of the parameters had to be memory, and since the only one that can be memory for BSF is p2, that's what they did. As written, the code will read the unwritten contents of p2 and scan it for bits. It compiles, but produces the wrong answer.

So, to take my questions in order:

  1. No, I wasn't calling it incorrectly. The definition in winnt.h is just wrong. In fact, there's probably a bunch in that file that have a similar problem (_BitScanForward, _BitScanForward64, _BitScanReverse, _BitScanReverse64, etc).
  2. It forces a memory write because the code in winnt.h was wrong. My proposed change does not force any memory accesses.
  3. -S is writing the output file incorrectly (objdump has it right). Using my definition above produces:

    call    atoi
    lea rcx, .LC0[rip]
    /APP
    # 7 "m.cpp" 1
    bsfq rax,rdx
    /NO_APP
    call    printf
    

And this isn't what is actually in the executable file. The actual executable file contains the (correct) definition:

bsfq rdx,rax

While I'm not excited about modifying system header files, it appears that's going to be my answer here. If anyone knows how/where to report this problem so it gets fixed (as I mentioned, I'm using reubenvb), I could report these 2 issues so (hopefully) this gets fixed for everyone.

David Wohlferd
  • 7,110
  • 2
  • 29
  • 56