The following functions do not compile with the 64-bit Delphi XE2 compiler. (The errors all relate to the fld instructions.)

[dcc64 Error] Project1.dpr(12): E2116 Invalid combination of opcode and operands 
[dcc64 Error] Project1.dpr(13): E2116 Invalid combination of opcode and operands
[dcc64 Error] Project1.dpr(20): E2116 Invalid combination of opcode and operands

Lines 12 and 13:

fld Y
fld X

Line 20:

fld X

Unfortunately I have no assembly skills, and I am using this third-party code, which I need to port to 64 bits. Can you help me in making it work on both 32 bits and 64 bits?

function PartArcTan(Y, X: Extended): Extended;
asm
  fld Y              // st(0) = Y
  fld X              // st(0) = X
  fpatan             // st(0) = ArcTan(Y, X)
  fwait
end;

function ArcSin(X: Extended): Extended; // -1 <= X <= 1
asm
  fld X               // st(0) = X
  fld st(0)           // st(1) = X
  fmul st(0), st(0)   // st(0) = Sqr(X)
  fld1                // st(0) = 1
  fsubrp st(1), st(0) // st(0) = 1 - Sqr(X)
  fsqrt               // st(0) = Sqrt(1 - Sqr(X))
  fpatan              // st(0) = ArcTan(X, Sqrt(1 - X*X))
  fwait
end;
Arioch 'The
user1238784
  • possible duplicate of [FLD instruction x64 bit](http://stackoverflow.com/questions/15786404/fld-instruction-x64-bit) – Remy Lebeau Dec 24 '13 at 16:59
  • Use [`Math.ArcSin`](http://docwiki.embarcadero.com/Libraries/XE2/en/System.Math.ArcSin) and [`Math.ArcTan`](http://docwiki.embarcadero.com/Libraries/XE2/en/System.ArcTan). – LU RD Dec 24 '13 at 17:03
  • Correction, as David mentions in his answer, the [`Math.ArcTan2`](http://docwiki.embarcadero.com/Libraries/XE2/en/System.Math.ArcTan2) it should be. Don't use asm at all. – LU RD Dec 24 '13 at 19:57
  • There was a third-party unit for using the x87 FPU in 64-bit Delphi XE2. It was written to re-enable the 10-byte (extended) data type for calculations where the 8-byte (double) float type was not enough. OTOH it worked even slower than x86 DCC native x87, because an out-of-compiler unit could not analyze code flow for optimizations and had to insert an `FWAIT` opcode after every statement. – Arioch 'The Dec 25 '13 at 07:31
  • @Arioch The Delphi compiler does no optimizations in floating point code gen, and does insert FWAIT after every statement. There are other reasons why x87 code is slow under x64. But 80 bit precision is the only reason to contemplate x87 opcodes under x64. – David Heffernan Dec 25 '13 at 08:28
  • @DavidHeffernan dcc32 can put a single FWAIT after the whole expression has been evaluated, for example. The external library has to insert it after every operation. – Arioch 'The Dec 25 '13 at 08:50
  • @Arioch That is true. Even so, that's not the main reason x87 is slow under x64. The main reason is that the transistor budget was spent making SSE fast. – David Heffernan Dec 25 '13 at 09:00
  • @DavidHeffernan I said that the 3rd-party x64-x87 was slowER than native x86-x87 on *the same hardware* – Arioch 'The Dec 25 '13 at 09:28
  • @Arioch'The I'd expect x87 to be slower under x64 than under x86 – David Heffernan Dec 25 '13 at 09:30
  • @DavidHeffernan if mixed with SSE or MMX code - then sure. But FPU + integer/branching alone should have the same "transistor budget". Of course *other processes* may add SSE and MMX operations - but that holds true for x86+x87 app running inside WOW64 as well – Arioch 'The Dec 25 '13 at 09:36

2 Answers

The main problem with this code, for porting to x64, is that it uses the wrong floating point unit. On x64 floating point is done on the SSE unit.

Yes, the x87 unit is still there, but it is slow in comparison. Another problem is that the x64 ABI assumes that you will use the SSE unit. Parameters arrive in SSE registers. Floating point values are returned in an SSE register. It's pointless (not to mention hard work and time consuming) to transfer values between the SSE and x87 units. What's more, the floating point control settings, such as exception masks, are initialised for the SSE unit, but can you be sure that they will be correctly set for the x87 unit?

So, in view of all this, I strongly advise you to make sure that all your floating point code is executed on the SSE unit under x64. I think that the only time that a case could be made for using the x87 unit is for an algorithm that requires the 10 byte extended type that is supported on x87 but not SSE. That is not the case here.

Now, porting to the SSE unit is not as simple as translating the opcodes to SSE equivalents. That's because the SSE floating point unit has much less capability built in. For instance, there are no trigonometric functions included in the SSE opcodes.

So, the right way to deal with this is to switch to using Pascal code. These functions can be replaced by Math.ArcTan2 and Math.ArcSin respectively.
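
As a rough illustration of that replacement, here is a minimal sketch that keeps the original function names and signatures and simply delegates to the RTL. It assumes the third-party code can add `System.Math` to its uses clause; the program name `TrigPort` and the sample calls are made up for the example.

```pascal
program TrigPort; // hypothetical program name, for illustration only

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Math;

// Same signature as the original asm routine; delegates to the RTL.
function PartArcTan(Y, X: Extended): Extended;
begin
  Result := System.Math.ArcTan2(Y, X);
end;

// Same name as the original routine. The unit-qualified call makes sure
// we reach the RTL function rather than recursing into this wrapper.
function ArcSin(X: Extended): Extended; // -1 <= X <= 1
begin
  Result := System.Math.ArcSin(X);
end;

begin
  Writeln(PartArcTan(1.0, 1.0):0:6); // approximately Pi/4
  Writeln(ArcSin(1.0):0:6);          // approximately Pi/2
end.
```

Because there is no asm left, the same source should build with both dcc32 and dcc64. If nothing else depends on the wrapper names, calling `ArcTan2` and `ArcSin` directly at the call sites is simpler still.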


To elaborate on this, let's look at what is involved in doing the calculation on the x87 unit, under x64. The code for ArcSin goes like this:

function ArcSin(X: Double): Double;
// to be 100% clear, do **not** use this code
asm
  movq [rsp-8], xmm0     // X arrives in xmm0, move it to stack memory
  fld qword ptr [rsp-8]  // now load X into the x87 unit
  fld st(0)              // calculation code exactly as before
  fmul st(0), st(0)
  fld1
  fsubrp st(1), st(0)
  fsqrt
  fpatan
  fwait
  fstp qword ptr [rsp-8] // but now we need to move the return value
  movq xmm0, [rsp-8]     // back into xmm0, again via the stack
end;

Points to note:

  1. The x64 ABI means that the input parameter arrives in xmm0. We cannot load that directly into the x87 unit. So we have to transfer from xmm0 to scratch memory on the stack, and then load from there into the x87 unit.
  2. And we have to do something similar when returning the value. The value is returned in xmm0, as specified by the ABI. So we need to move it out of the x87 unit, to scratch stack memory, and then load it into xmm0.
  3. We've completely ignored the floating point control word: exception masking, precision and rounding control etc. If you were to do this you'd need to put together a mechanism to make sure that the x87 unit's control word was handled in a sane manner.

So, perhaps this can serve as a warning to future visitors who wish to use the x87 to perform floating point arithmetic under x64.

David Heffernan
  • OK you are right I forgot to put result in to the xmm0 register. :) – GJ. Dec 24 '13 at 22:16
  • @GJ. On the contrary, Math.ArcTan2 and Math.ArcSin work on all Delphi compilers, not just the two Windows compilers. The asm code in my answer is a demonstration of what **not** to do. I thought the words made that clear. – David Heffernan Dec 25 '13 at 08:37

x64 still supports the classic floating point unit, but you need to adapt the code to follow the different ABI.

x86/x64 example:

function PartArcTan(X: double): double;
asm
{$IFDEF CPUX64}
        movq [rsp-8], xmm0
        fld    qword ptr [rsp-8]
{$ELSE}
        fld    qword ptr X
{$ENDIF}
        fld1
        fpatan
        fwait
{$IFDEF CPUX64}
        fstp   qword ptr [rsp-8]
        movq   xmm0, [rsp-8]
{$ENDIF}
end;
David Heffernan
GJ.
  • What state is the x87 unit's control word when running under x64? – David Heffernan Dec 24 '13 at 17:43
  • What do you mean "it is disabled"? – David Heffernan Dec 24 '13 at 17:45
  • Also, this code does not work because neither function actually returns a value. It's no good putting a value into `ST(0)` and hoping that the calling code will find it there. You have to respect the x64 ABI. So, put the return value in the right place! – David Heffernan Dec 24 '13 at 17:49
  • Registers are not preserved across windows API function calls. – GJ. Dec 24 '13 at 17:54
  • Yes, the result of function must be properly set before you exit function. – GJ. Dec 24 '13 at 17:55
  • There are no windows api calls here. You do need to fix the functions so that they return a value. Try calling these functions. Perhaps it will help you understand why x87 use is wrong under x64. – David Heffernan Dec 24 '13 at 18:02
  • And why use `Extended` in x64? – David Heffernan Dec 24 '13 at 18:07
  • I cannot understand the voting here. I downvoted to counter the two bogus anonymous upvotes! ;-) Your code produces access violations when you run it. The fact that it compiles is neither here nor there. 1. It doesn't work. 2. It uses the wrong FPU. 3. It does not read the parameters since they arrive in the SSE unit. 4. It doesn't return a value. In summary, it's a train wreck! – David Heffernan Dec 24 '13 at 18:10
  • Very good. Enjoy fixing this. Tactical deletion might be more prudent. – David Heffernan Dec 24 '13 at 18:20
  • @David Heffernan: No problem to fix the code, but this is another question. – GJ. Dec 24 '13 at 18:22
  • Well, if your only goal is code that compiles, then you can remove all the asm! I'd like to see you fix it though. You would learn something. The shuffle to get values from xmm registers into x87 registers, and then back again. I don't think you understand this. Trying to make it work would help you understand. – David Heffernan Dec 24 '13 at 18:26
  • Your edit is still hopeless. The code in the answer leads to an access violation. I do suggest you try to fix this as an educational exercise. – David Heffernan Dec 24 '13 at 18:29
  • No you have not! Your function does not return a value. And it's wrong to use a var parameter. Would you like me to show you how to do it? – David Heffernan Dec 24 '13 at 19:05
  • Do you know what I mean by ABI? I would be very happy to teach you this. – David Heffernan Dec 24 '13 at 19:08
  • I edited my answer to demonstrate a working `ArcSin` using x87. – David Heffernan Dec 24 '13 at 19:46
  • @GJ., I don't see a point in doing this in asm at all. Use the RTL functions to support all platforms. There is little (if anything) to gain using asm here, only confusion, and this answer seems to add a bit to that. – LU RD Dec 24 '13 at 19:54
  • @David Heffernan: You are wrong! There is no need to put the result on the CPU stack. Under x64 the Delphi compiler expects the result in the xmm0 register, and that is where it is. – GJ. Dec 24 '13 at 21:36
  • @David Heffernan: check win x64 Calling Conventions: http://msdn.microsoft.com/en-us/library/ms235286.aspx – GJ. Dec 24 '13 at 21:39
  • This I already know. I certainly never said that values were returned on the stack. Floating point return values come back in xmm0. Note the code in my answer which debunks yours. Can you explain where in your code you put anything into xmm0? – David Heffernan Dec 24 '13 at 21:44
  • Now you need to deal with the fact that you cannot pass literals to your function. Or constants. Eventually you'll have the same code as in my answer. – David Heffernan Dec 24 '13 at 22:47
  • @David Heffernan: It is not meant to be. It is just an example. – GJ. Dec 24 '13 at 22:51
  • But it's a very bad one. Anyway, I think I've made my point. I guess you can see now why you should not do it the way you did in your answer. – David Heffernan Dec 24 '13 at 22:52
  • @David Heffernan: I have answered the question of what the reason for the dcc64 error is. :) – GJ. Dec 24 '13 at 22:58
  • movupd moves two values by the way. – David Heffernan Dec 24 '13 at 23:00
  • AFAICS there's only one question in the question: *" Can you help me in making it work on both 32 bits and 64 bits?"* – Sertac Akyuz Dec 24 '13 at 23:15
  • @David Heffernan: Agree! And you didn't answer it! Because your code sample is only 64 bit! :)! – GJ. Dec 24 '13 at 23:19
  • @GJ. Read what I wrote. The functions that I advocate using work on all Delphi compilers. Windows 32 and 64. Mac. And the mobile compilers. Read the text not just the code. The code I wrote was to illustrate why x87 is a bad choice. My advice is that you do **not** ever use the code in my answer. Do you understand? – David Heffernan Dec 24 '13 at 23:44
  • On the other hand, what you advocate does not accept literals or constants. And uses an x87 control word that is ill controlled. And in any case, your code is only as good as it is because I pointed out all the flaws in your numerous earlier worse attempts. Happy Christmas by the way!! – David Heffernan Dec 24 '13 at 23:46
  • @David Heffernan: Happy Christmas :) – GJ. Dec 24 '13 at 23:47
  • @GJ. - Sorry for the intervention, that comment was not from David. I wanted to point out that the question was not why there's a compiling error. – Sertac Akyuz Dec 24 '13 at 23:49
  • So I fixed your code so that it works with pass by value. I'm still not sure that you understand my answer though. You do realise that I advocate using Math.ArcTan2? Have you compared the performance of your x87 code with Math.ArcTan2? – David Heffernan Dec 25 '13 at 08:34
  • @David Heffernan: I understand your answer and agree with you. All I wanted to do at the beginning was to explain the compiler error, because you didn't. Thanks... .) – GJ. Dec 25 '13 at 09:53
  • You did not explain the compiler error though. The error was because the parameter arrives in xmm0 and so `FLD X` translates to `FLD xmm0` which is invalid. – David Heffernan Dec 25 '13 at 10:53
  • @David Heffernan: The error explanation is in the compiler log: "E2116 Invalid combination of opcode and operands", so I made a workaround with a var parameter, and yes, I agree, it is not the same! :) – GJ. Dec 25 '13 at 11:55
  • @GJ. It's odd that you attach a -1 with comment to my answer, discussing the use of the stack, but then the exact same code is in your answer. – David Heffernan Dec 26 '13 at 08:41
  • @David Heffernan: After our discussion I have changed to +1, comment deleted... :) – GJ. Dec 26 '13 at 10:47