
For optimization reasons, I thought about calling the functions "fldl" and "fist" directly with inline assembler. Sadly, I can't work out how to run them, since I'm not that good at assembler.

I didn't get further than this:

double* input;         
long long output;

__asm fldl input;      
__asm fist output;
Jens Björnhager
okaerin
  • 1
    If you're not fluent enough in assembler to call a function, then probably you should let the compiler do its optimizations. You have to be **very** good and experienced in assembly to outperform a modern optimizing compiler by hand-tuning. –  Nov 30 '12 at 16:59
  • 3
    I really doubt that manually converting a double to a long int results in any optimization compared to the compiler. This is such a simple operation that doing it by hand is more likely to lead to worse performance -- compilers can generally "look" outside each instruction (is the result needed, in which register to store it, would it be better placed in an xmm register, etc.). – Aki Suihkonen Nov 30 '12 at 16:59
  • @H2CO3: I'd say, probably even that god. – Aki Suihkonen Nov 30 '12 at 17:01
  • 1
    @AkiSuihkonen, conversions from double to integer are known to be suboptimal when left to the compiler because it can't make any simplifying assumptions. See http://stackoverflow.com/questions/78619/what-is-the-fastest-way-to-convert-float-to-int-on-x86 although that might be outdated: http://msdn.microsoft.com/en-us/library/z8dh4h17.aspx – Mark Ransom Nov 30 '12 at 17:11
  • It may be true. Still, gcc 4.6.3, for example, allows itself to make some assumptions and simply writes `cvttsd2siq %xmm0, %rax` (and yes, unfortunately I haven't got an x86 system anywhere nearby, just x64) – Aki Suihkonen Nov 30 '12 at 17:20

2 Answers

1

__asm fld input will actually attempt to read your pointer value as if it were a floating-point value. If you want to read the floating-point value that a pointer points to, you have to go through a two-step process: read the address into a register, and then read the data using the address in the register. On a 32-bit platform it will be something along the lines of

__asm {
  mov eax, input
  fld qword ptr [eax]
  fistp output
}

I just tried it in VS2005 and it works. (Note that as other people stated in the comments, fist does not support storing to 64-bit long long, while fistp does. But you probably need fistp anyway, i.e. a popping store.)
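
For comparison, the same two steps (dereference the pointer, then convert) can be written in plain C and left to the compiler. This is only a minimal sketch to use as a baseline when checking the inline-assembler result; the helper name double_to_ll is made up for illustration and does not come from the question. Note that the cast truncates toward zero, whereas fistp uses the current rounding mode, so the two can differ for non-integral inputs.

```c
/* Hypothetical helper (name invented for this sketch): load the double
   that `input` points to, then convert it to long long.
   A C cast truncates toward zero, e.g. 2.9 -> 2 and -2.9 -> -2.
   The value must fit in a long long, otherwise behaviour is undefined. */
long long double_to_ll(const double *input)
{
    double value = *input;      /* step 1: read through the pointer */
    return (long long)value;    /* step 2: convert, truncating      */
}
```

Calling double_to_ll(&x) with x = 2.9 yields 2, which is a quick way to sanity-check whatever the assembler version produces for the same input.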

AnT stands with Russia
  • Unless you're going to keep using the value, you probably want `fistp` instead of `fist`. – Jerry Coffin Nov 30 '12 at 16:56
  • There is the slight problem that it's an x86 architecture, so there is no rax register – okaerin Nov 30 '12 at 16:58
  • 1
    Also `fist` converts to 32-bit integer, not to `long long`, unlike `fistp` (opcode DF /7). – chill Nov 30 '12 at 17:12
  • @chill: MSVC++ inline assembler automatically selects the proper opcode depending on the size of the recipient variable. So the choice between `fist` and `fistp` here depends only on whether the OP wants to pop the result from FPU stack – AnT stands with Russia Nov 30 '12 at 17:15
  • @chill: However, you are right: it works with `fistp` and doesn't work with `fist` (???) :) – AnT stands with Russia Nov 30 '12 at 17:21
  • 2
    @AndreyT: x86 instruction encoding is inconsistent -- `fist` only supports 16 and 32 bit operands, `fistp` and `fisttp` support 16, 32, and 64 – Chris Dodd Nov 30 '12 at 17:44
1

The easiest is probably:

double input;
long long output;

__asm fld input
__asm fisttp output

This does a 'normal' double to long long conversion, truncating towards zero, just like a C cast. However, fisttp is an SSE3 instruction, so older CPUs (those before the Prescott Pentium 4) don't support it; on such machines you need to use fistp instead, which uses the current rounding mode (usually round to nearest). So if you want to, for example, round towards -infinity, you need to save the current rounding mode, set it to what you want, do the fistp and restore the rounding mode:

double input;
long long output;
unsigned short oldcw, cw;

__asm fld input
__asm fstcw oldcw
cw = (oldcw & ~0xc00) | 0x400; // round towards -infinity
__asm fldcw cw
__asm fistp output
__asm fldcw oldcw
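
The bit arithmetic on the control word above can be spelled out in plain C. The sketch below is an illustration only: the names x87_rounding, X87_RC_MASK, x87_with_rounding and floor_to_ll are invented here and are not part of any compiler or library. It assumes the standard x87 layout, where bits 10-11 of the control word hold the rounding-control field, and adds a portable reference for "round towards -infinity" that can be used to check the assembler result.

```c
/* Rounding-control (RC) field of the x87 control word, bits 10-11.
   All names in this block are hypothetical, chosen for illustration. */
enum x87_rounding {
    X87_RC_NEAREST  = 0x000,  /* round to nearest (the usual default) */
    X87_RC_DOWN     = 0x400,  /* round towards -infinity              */
    X87_RC_UP       = 0x800,  /* round towards +infinity              */
    X87_RC_TRUNCATE = 0xC00   /* round towards zero                   */
};

#define X87_RC_MASK 0xC00u

/* Return a copy of control word `cw` with only the RC field replaced.
   This is the same computation as (oldcw & ~0xc00) | 0x400 above,
   generalised to any of the four modes. */
unsigned short x87_with_rounding(unsigned short cw, enum x87_rounding mode)
{
    return (unsigned short)((cw & ~X87_RC_MASK) | (unsigned)mode);
}

/* Portable reference for rounding towards -infinity without touching
   the FPU state. Only valid when x fits in a long long. */
long long floor_to_ll(double x)
{
    long long t = (long long)x;          /* truncates towards zero      */
    return ((double)t > x) ? t - 1 : t;  /* step down for negative
                                            non-integral inputs         */
}
```

For a control word of 0x027F, x87_with_rounding(0x027F, X87_RC_DOWN) gives 0x067F, and floor_to_ll(-2.5) gives -3, which is what the fistp sequence should store for the same input.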
Chris Dodd