I wrote some code to test the fsqrt function and the result doesn't make complete sense to me. Here's the code (in delphi):
uses
mmsystem;
var
rand:longint=123456789;
function rng:longint;
asm
imul eax,[rand],$08088405
inc eax
mov [rand],eax
end;
function int_sqrt(adata:longint):longint;
asm
fnstcw word([esp-2])
// mov word([esp-4]),$1f3f // 80bit precision
mov word([esp-4]),$1c3f // 24bit precision
fldcw word([esp-4])
mov [esp-8],eax
fild longint([esp-8])
fsqrt
fistp longint([esp-8])
mov eax,[esp-8]
fldcw word([esp-2])
end;
procedure TForm1.FormCreate(Sender: TObject);
var
start,i,r,s1,s2:longint;
time0,time1:longint;
begin
timebeginperiod(1);
time0:=timegettime;
start:=1000000000;
for i:=(start+0) to (start+100000000) do begin
//r:=i;
r:=abs(rng);
// r:=2134567890;
// r:=$7fffffff;
s1:=int_sqrt(r);
s2:=trunc(sqrt(r));
if s1<>s2 then
showmessage('error: '+inttostr(r)+'/'+inttostr(s1)+'/'+inttostr(s2));
end;
time1:=timegettime;
timeendperiod(1);
showmessage('Milliseconds: '+inttostr(time1-time0));
end;
Simple enough, I'm looking for the square root of an int. In the int_sqrt one of the precision lines gets the x87 to use 24 bit precision for the sqrt precision, the other 64 bit precision. As expected, the 24 bit version is faster by a good margin (10-20% depending on input).
Here's the problem though. I haven't found a single 32bit (well 31bit actually, the last bit is unused sign) int that returns a wrong result when using 24bit precision!!
My only theory so far is that only the final result depends on the precision, not the source or any intermediate buffer. That would make sense since the maximum result size for the square root of a 31bit int is 16bit.
Is that what's going on?