How does x87 precision affect square roots?

Question

I wrote some code to test the FSQRT instruction, and the result doesn't completely make sense to me. Here's the code (in Delphi):

uses
 mmsystem;

var
 rand:longint=123456789;

function rng:longint;
asm
 imul eax,[rand],$08088405
 inc eax
 mov [rand],eax
end;

function int_sqrt(adata:longint):longint;
asm
 fnstcw word([esp-2])

// mov word([esp-4]),$1f3f  // 64-bit significand (80-bit extended format), round toward zero
 mov word([esp-4]),$1c3f  // 24-bit significand, round toward zero
 fldcw word([esp-4])

 mov [esp-8],eax
 fild longint([esp-8])

 fsqrt

 fistp longint([esp-8])
 mov eax,[esp-8]

 fldcw word([esp-2])
end;

procedure TForm1.FormCreate(Sender: TObject);
var
 start,i,r,s1,s2:longint;
 time0,time1:longint;
begin
 timebeginperiod(1);
 time0:=timegettime;

 start:=1000000000;
 for i:=(start+0) to (start+100000000) do begin
  //r:=i;
  r:=abs(rng);
//  r:=2134567890;
//  r:=$7fffffff;
  s1:=int_sqrt(r);
  s2:=trunc(sqrt(r));
  if s1<>s2 then
   showmessage('error: '+inttostr(r)+'/'+inttostr(s1)+'/'+inttostr(s2));
 end;

 time1:=timegettime;
 timeendperiod(1);
 showmessage('Milliseconds: '+inttostr(time1-time0));
end;

Simple enough: I'm looking for the square root of an int. In int_sqrt, one of the two control-word lines sets the x87 to 24-bit precision for the square root, the other to 64-bit precision. As expected, the 24-bit version is faster by a good margin (10-20% depending on input).

Here's the problem, though: I haven't found a single 32-bit int (well, 31-bit actually, since the top bit is the unused sign bit) that returns a wrong result when using 24-bit precision!

My only theory so far is that the precision setting affects only the final result, not the source operand or any intermediate buffer. That would make sense, since the square root of a 31-bit int needs at most 16 bits.

Is that what's going on?

Because my input is up to 31 bits, I thought the 24-bit precision would take that into account, but it looks like only the final result is affected by precision. — Marladu, May 31 '14 at 00:56
24bit precision means 32 bit is used to store value, 24 bits for mantissa, 8 bits for exponent — Iłya Bursov, May 31 '14 at 01:05
Makes sense. If you want to make an answer that says *only* result of operations is affected by precision setting in x87 control word, and that source can be any size, I'll accept that as answer. — Marladu, May 31 '14 at 01:44
If you count single/float precision as 24 bits, then long double (the 80-bit type) has only 64 bits of precision. — phuclv, May 31 '14 at 02:25
I guess the question was unclear: the 24-bit precision refers to the size of the significand, and I initially expected operations to use only 24 of the significand bits of the source. Instead, it appears they use all 64 bits of the source's significand, and then stop computing when the significand of the result reaches 24 bits. — Marladu, May 31 '14 at 02:45

rkhb · Accepted Answer · 2014-05-31T17:12:44.273

Intel® 64 and IA-32 Architectures Software Developer’s Manual Vol. 2A Page 3-291 (FILD):

Converts the signed-integer source operand into double extended-precision floating-point format and pushes the value onto the FPU register stack. The source operand can be a word, doubleword, or quadword integer. It is loaded without rounding errors.

Keep in mind that data inside the FPU are always stored as 80-bit double extended-precision floating-point numbers. FILD and FIST don't "forget" bits according to the precision setting. The effect of the precision setting is to stop a calculation once the result is precise enough, and to zero the unused bits afterwards.
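
To make the "loaded without rounding errors" point concrete, here is a minimal Python sketch (not Delphi, and not run on an actual FPU) that counts how many significand bits an integer needs to be represented exactly. The helper name `significand_bits_needed` is made up for illustration. A 31-bit input can need more than 24 bits, yet it always fits in the 64-bit significand that FILD fills.

```python
def significand_bits_needed(n: int) -> int:
    """Number of significand bits needed to hold integer n exactly.

    Trailing zero bits are absorbed by the exponent, so only the span
    from the highest set bit down to the lowest set bit counts.
    """
    n = abs(n)
    if n == 0:
        return 0
    trailing_zeros = (n & -n).bit_length() - 1
    return n.bit_length() - trailing_zeros


# Two of the test values from the question's commented-out lines.
print(significand_bits_needed(2134567890))  # 30
print(significand_bits_needed(0x7FFFFFFF))  # 31
```

Both values need more than 24 bits, so they would be altered if the load itself were rounded to 24-bit precision, but they are well within 64 bits.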

Intel® 64 and IA-32 Architectures Software Developer’s Manual Vol. 1 Chapter 8.1.5.2 (Precision Control Field):

Using these settings nullifies the advantages of the double extended-precision floating-point format's 64-bit significand length. When reduced precision is specified, the rounding of the significand value clears the unused bits on the right to zeros.
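
The two control words from the question can be decoded to see what they select. This is a small Python sketch of the field layout described in the same chapter of the manual (precision control in bits 8-9, rounding control in bits 10-11); the function name `decode_control_word` is just an illustrative choice.

```python
# x87 FPU control word fields (Intel SDM Vol. 1, section 8.1.5).
PRECISION_BITS = {0b00: 24, 0b10: 53, 0b11: 64}  # 0b01 is reserved
ROUNDING = {0b00: "nearest", 0b01: "down", 0b10: "up", 0b11: "toward zero"}


def decode_control_word(cw: int):
    """Return (significand bits, rounding mode) selected by control word cw."""
    pc = (cw >> 8) & 0b11   # precision control field
    rc = (cw >> 10) & 0b11  # rounding control field
    return PRECISION_BITS.get(pc), ROUNDING[rc]


print(decode_control_word(0x1C3F))  # (24, 'toward zero')
print(decode_control_word(0x1F3F))  # (64, 'toward zero')
```

Both values select round-toward-zero, so FISTP truncates in either case and the only difference between the two lines is the significand length.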

So FSQRT works on the full 80-bit register and stops at a precision of 24 bits. I suspect that it actually stops at 25 bits to get a meaningful value for rounding. The "redundant" 40 low bits of the 64-bit significand are then zeroed. You end up with a 24-bit result, and that is enough for a 16-bit integer, as you correctly noticed.
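
The argument can be checked with exact integer arithmetic. The following is a rough Python model, not a trace of what the FPU does internally: it assumes FSQRT returns the exact square root rounded toward zero to the selected number of significand bits (the rounding mode chosen by `$1c3f`), and that FISTP then truncates to an integer. The name `int_sqrt_model` is hypothetical.

```python
from math import isqrt


def int_sqrt_model(n: int, precision: int = 24) -> int:
    """Model of fild / fsqrt / fistp with round-toward-zero.

    sqrt(n) is truncated to `precision` significand bits, then the
    integer part of that truncated value is returned.
    """
    if n == 0:
        return 0
    root = isqrt(n)
    exponent = root.bit_length() - 1       # floor(log2(sqrt(n))) for n >= 1
    frac_bits = precision - 1 - exponent   # significand bits left after the integer part
    if frac_bits >= 0:
        # floor(sqrt(n) * 2**frac_bits) is the truncated significand.
        scaled = isqrt(n << (2 * frac_bits))
        return scaled >> frac_bits
    # Not enough bits for the integer part: its low bits are zeroed.
    drop = -frac_bits
    return (root >> drop) << drop


print(int_sqrt_model(0x7FFFFFFF, 24))  # 46340
print(int_sqrt_model(0x7FFFFFFF, 8))   # 46336
```

Under these assumptions, any precision of 16 bits or more reproduces the exact integer square root for every 31-bit input, because the integer part never needs more than 16 bits. Only a much shorter significand, such as the 8 bits in the second call, starts to lose integer bits.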
