
I've been playing around with x87 FPU programming, and I just came across the following magic spell for converting from float (32-bit) to int (32-bit):

    flds    from             # number to convert from
    fadds   magic1           # add magic number 1 (???)
    fstps   to               # store in destination as single-precision binary32
    movl    to,%eax          # load float result as an int
    subl    $magic2,%eax     # subtract magic number 2 (???)
    sarl    $1,%eax          # divide by 2
    movl    %eax,to          # and look, here's the result!

    .section .rodata
    magic1:  .float 6291456.0    # 3 * 2^21
    .equ     magic2, 0x4ac00000  # ???

    .data
    from:    .float 31415.9265   # pick a number, any number...
    to:      .long  0            # result ends up here

(AT&T syntax with GAS directives)

I have tried this out, and it seems to work (rounding towards -infinity), but I have absolutely no idea why! Can anyone explain how it works?
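One thing that can be checked directly is the `# ???` on `magic2`: it appears to be the IEEE-754 binary32 bit pattern of `magic1` itself. A small C sketch (C rather than assembly so it runs anywhere; `float_bits` and the field accessors are hypothetical helper names, and `float` is assumed to be binary32) that reinterprets a float as an integer and splits out its fields:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a float's binary32 encoding as an integer.
   memcpy is the well-defined way to do this type pun in C. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Field accessors for a binary32 encoding: 1 sign bit, 8 exponent bits
   (bias 127), 23 stored mantissa bits. */
unsigned sign_field(uint32_t u)     { return u >> 31; }
unsigned exponent_field(uint32_t u) { return (u >> 23) & 0xFFu; }
uint32_t mantissa_field(uint32_t u) { return u & 0x7FFFFFu; }
```

With these, `float_bits(6291456.0f)` is `0x4ac00000`: sign 0, exponent field 149 (127 + 22), mantissa field `0x400000`, i.e. 1.5·2^22 = 3·2^21.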

Peter Cordes
user1636349
  • Fascinating! Quite an ingenious design. Will write an answer shortly. – fuz Feb 25 '22 at 13:21
  • Fast because it does truncation (for non-negative numbers) without having to change the rounding mode? Just for the record, [SSE3 `fisttp`](https://www.felixcloutier.com/x86/FISTTP.html) makes that obsolete, as does using SSE for scalar math ([`cvttsd2si`](https://www.felixcloutier.com/x86/CVTSD2SI.html)). But still interesting if you actually want rounding towards -Inf specifically, instead of truncation or the default rounding mode (`fist` or `cvtsd2si`). SSE4 provides `roundss` / `roundsd` which take a rounding mode as an immediate, so you'd use that before cvt. – Peter Cordes Feb 25 '22 at 13:28
  • `subl $magic2,%eax` is almost certainly wrong; that subtracts the address, not the `0x4ac00000` value there. You should do `.equ magic2, 0x4ac00000` to use that value as the immediate. Also, the destination is a `.quad` but it looks like we only ever store / reload a dword there. But that's just wasted space, not correctness bugs. (Same for `magic2`, although as I said it shouldn't be in data memory at all.) – Peter Cordes Feb 25 '22 at 13:34
  • The SSE2 equivalent would be `addss xmm0, [magic1]` / `psubd xmm0, [magic2]` (16-byte load so align this) / `psrad xmm0, 1`. If you have the constants loaded, that's probably faster than `roundss` + `cvtss2si` even if SSE4 is available, but only if you're starting with float, not double. – Peter Cordes Feb 25 '22 at 22:22
  • Yes, it was a typo for "subl magic2,%eax" -- I wanted to make the magic numbers stand out by defining them separately. I could have used $magic1 and $magic2 with .equ for the same effect, though. – user1636349 Feb 26 '22 at 14:33
  • Note that .long = .int, which is 32 bits (at least on a 32-bit machine, which is where I tried this code). – user1636349 Feb 26 '22 at 14:37
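To make the distinction in these comments concrete (truncation, which `fisttp` and `cvttss2si`/`cvttsd2si` provide, versus rounding toward -infinity), here is a portable C sketch. The function names are hypothetical, and it assumes the value fits in `int32_t`; C's float-to-integer conversion truncates toward zero, and x86-64 compilers typically emit `cvttss2si` for it.

```c
#include <stdint.h>

/* Truncation toward zero: the C conversion rule for float -> integer. */
int32_t trunc_to_int(float x)
{
    return (int32_t)x;
}

/* Rounding toward -infinity built from truncation: if truncation moved a
   negative, non-integral value upward, step down by one.
   Assumes x is within int32_t range. */
int32_t floor_to_int(float x)
{
    int32_t t = (int32_t)x;
    return ((float)t > x) ? t - 1 : t;
}
```

The two only differ for negative non-integral inputs: `-2.3f` truncates to -2 but rounds toward -infinity to -3.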

1 Answer


Quick answer:

The first magic number forces a rescaling of the argument, so that the least significant bit of the integer part lands in the second-lowest bit of the fraction (mantissa) field.
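The rescaling can be seen numerically. 3·2^21 = 1.5·2^22 lies in the range [2^22, 2^23), where consecutive binary32 floats are 2^(22-23) = 0.5 apart, so after the add the stored sum can only hold x rounded to a multiple of 0.5 (assuming the default round-to-nearest mode). A C sketch under those assumptions; `ulp_at_magic1` and `snapped` are hypothetical names, and `volatile` is there to force rounding to a 32-bit float even on compilers that keep excess precision (as x87 code generation may):

```c
#include <stdint.h>
#include <string.h>

#define MAGIC1 6291456.0f /* 3 * 2^21 = 1.5 * 2^22 */

/* Spacing between MAGIC1 and the next larger float, found by
   incrementing its bit pattern by one. */
float ulp_at_magic1(void)
{
    float m = MAGIC1, next;
    uint32_t u;
    memcpy(&u, &m, sizeof u);
    u += 1;                          /* next representable binary32 value */
    memcpy(&next, &u, sizeof next);
    return next - m;
}

/* What survives of x after fadds + fstps: the sum is rounded to binary32,
   so subtracting MAGIC1 again leaves x snapped to a multiple of 0.5. */
float snapped(float x)
{
    volatile float sum = x + MAGIC1;
    return sum - MAGIC1;
}
```

Here `ulp_at_magic1()` returns 0.5, and `snapped(2.3f)` returns 2.5: only one bit below the integer LSB survives.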

Then subtracting the second magic number erases the exponent bits.
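Reading the stored sum as an integer and subtracting `magic2` (the bit pattern of `magic1`) cancels the exponent field and the leading mantissa bit, leaving the number of 0.5-steps between the sum and `magic1`, i.e. 2x rounded to an integer. Because `magic1` is 1.5·2^22 rather than 2^22, sums somewhat below it stay in the same exponent range, so negative x (down to -2^21) just give a negative difference. A C sketch assuming binary32 floats, round-to-nearest and -2^21 <= x < 2^21; `twice_rounded` is a hypothetical name:

```c
#include <stdint.h>
#include <string.h>

#define MAGIC1 6291456.0f   /* 3 * 2^21 */
#define MAGIC2 0x4ac00000   /* binary32 encoding of MAGIC1 */

/* Mirrors fadds / fstps / movl / subl: returns 2*x rounded to nearest.
   Assumes -2^21 <= x < 2^21 so the sum keeps MAGIC1's exponent. */
int32_t twice_rounded(float x)
{
    volatile float sum = x + MAGIC1;        /* fadds + fstps: round to binary32 */
    float stored = sum;
    uint32_t bits;
    memcpy(&bits, &stored, sizeof bits);    /* movl to,%eax */
    /* subl $magic2,%eax, done in 64 bits so negative results are well-defined */
    return (int32_t)((int64_t)bits - MAGIC2);
}
```

For example, `twice_rounded(2.3f)` is 5 and `twice_rounded(-2.3f)` is -5.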

As for why the final division by 2 is needed, I'm not sure; there must be an extra technicality (probably related to adding 3·2^21 rather than 1·2^21).
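One way to see the division: since the spacing of floats near 3·2^21 is 0.5, the integer left in `%eax` counts half-units, i.e. it is 2x (rounded), and `sarl $1` halves it with an arithmetic shift, which rounds toward -infinity. A C emulation of the whole sequence, assuming binary32 floats, the default round-to-nearest mode and -2^21 <= x < 2^21 (`magic_to_int` and `floor_half` are hypothetical names). In this emulation the sample input 31415.9265 comes out as 31416 and `2.8f` as 3, so comparing against `floorf` on the inputs you care about seems worthwhile.

```c
#include <stdint.h>
#include <string.h>

#define MAGIC1 6291456.0f   /* 3 * 2^21; float spacing at this magnitude is 0.5 */
#define MAGIC2 0x4ac00000   /* binary32 encoding of MAGIC1 */

/* Floor division by 2, which is what sarl $1 does to a two's-complement value;
   written without >> on a negative int (implementation-defined in C). */
static int32_t floor_half(int32_t v)
{
    return (v >= 0) ? v / 2 : -((-v + 1) / 2);
}

/* Emulates the question's sequence. Assumes -2^21 <= x < 2^21. */
int32_t magic_to_int(float x)
{
    volatile float sum = x + MAGIC1;                    /* flds / fadds / fstps */
    float stored = sum;
    uint32_t bits;
    memcpy(&bits, &stored, sizeof bits);                /* movl to,%eax */
    int32_t twice = (int32_t)((int64_t)bits - MAGIC2);  /* subl: 2x rounded to nearest */
    return floor_half(twice);                           /* sarl $1,%eax */
}
```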