How can I convert an Extended precision floating point value to a string?
Background
The Intel CPU supports three floating point formats:
- 32-bit Single precision
- 64-bit Double precision
- 80-bit Extended precision
Delphi has native support for the Extended precision floating point format.
Extended precision is broken down into:
- 1 sign bit
- 15 exponent bits
- 1 integer portion bit (i.e. number starts with 0. or 1.)
- 63 mantissa bits
You can compare the mantissa size of Extended to that of the other float types:
| Type | Sign | Exponent | Integer | Mantissa |
|----------|-------|----------|---------|----------|
| Single | 1 bit | 8 bits | n/a | 23 bits |
| Double | 1 bit | 11 bits | n/a | 52 bits |
| Extended | 1 bit | 15 bits | 1 bit | 63 bits |
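The layout in the table can be checked by overlaying a record on the 10 bytes of an Extended. This is only a minimal sketch: `TExtendedBits` is a hypothetical helper type (not part of the RTL), and it assumes a little-endian target where Extended really is the 80-bit type, such as the 32-bit Windows compiler.

```delphi
program DecodeExtended;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Hypothetical overlay for the 10-byte layout (little-endian assumed)
  TExtendedBits = packed record
    Significand: UInt64; // bit 63 = integer bit, bits 62..0 = mantissa
    SignExp: Word;       // bit 15 = sign, bits 14..0 = biased exponent
  end;

const
  BIAS = 16383;

var
  v: Extended;
  bits: TExtendedBits;
begin
  // On targets where Extended is an alias for Double this overlay is invalid
  Assert(SizeOf(Extended) = SizeOf(TExtendedBits), 'Extended is not 80-bit here');

  v := 0.499999999999999980;
  Move(v, bits, SizeOf(bits));

  Writeln('Sign bit:         ', bits.SignExp shr 15);
  Writeln('Biased exponent:  ', bits.SignExp and $7FFF);
  Writeln('Binary exponent:  ', Integer(bits.SignExp and $7FFF) - BIAS);
  Writeln('Integer bit:      ', bits.Significand shr 63);
  Writeln('Mantissa (hex):   ', IntToHex(Int64(bits.Significand and $7FFFFFFFFFFFFFFF), 16));
end.
```

The two record fields add up to 64 + 16 = 80 bits, matching the 1 + 15 + 1 + 63 split shown above.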
Extended is capable of higher precision than Single and Double.
For example, take the real number .49999999999999999, and its representation in binary:
Single: 0.1000000000000000000000000
Double: 0.10000000000000000000000000000000000000000000000000000
Extended: 0.01111111111111111111111111111111111111111111111111111111010001111
You can see that while Single and Double have been forced to round to 0.1 binary (0.5 decimal), Extended still has some precision.
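A quick way to observe this rounding is to assign the same literal to each type and compare against 0.5. This is a sketch under the assumption that Extended is the 80-bit type on the target (on platforms where it is 8 bytes, it would behave like Double); the program name is arbitrary.

```delphi
program CompareFloatTypes;
{$APPTYPE CONSOLE}

var
  s: Single;
  d: Double;
  e: Extended;
begin
  // The same decimal literal is rounded to each type's precision on assignment
  s := 0.49999999999999999;
  d := 0.49999999999999999;
  e := 0.49999999999999999;

  Writeln('Single   = 0.5? ', s = 0.5);
  Writeln('Double   = 0.5? ', d = 0.5);
  // Only meaningful where SizeOf(Extended) = 10
  Writeln('Extended = 0.5? ', e = 0.5, '  (SizeOf(Extended) = ', SizeOf(Extended), ')');
end.
```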
But how to convert binary fractions to a string?
If I attempt to convert the Extended value 0.49999999999999998 to a string:
FloatToStr(v);
the function returns 0.5, even though I can look inside the Extended and see that it's not 0.5:
0x3FFDFFFFFFFFFFFFFD1E
The same is true for other Extended values; every function in Delphi that I can find returns 0.5:
Value Hex representation FloatToStr
0.499999999999999980 0x3FFDFFFFFFFFFFFFFD1E '0.5'
0.499999999999999981 0x3FFDFFFFFFFFFFFFFD43 '0.5'
0.499999999999999982 0x3FFDFFFFFFFFFFFFFD68 '0.5'
0.499999999999999983 0x3FFDFFFFFFFFFFFFFD8D '0.5'
0.499999999999999984 0x3FFDFFFFFFFFFFFFFDB2 '0.5'
0.499999999999999985 0x3FFDFFFFFFFFFFFFFDD7 '0.5'
0.499999999999999986 0x3FFDFFFFFFFFFFFFFDFB '0.5'
0.499999999999999987 0x3FFDFFFFFFFFFFFFFE20 '0.5'
0.499999999999999988 0x3FFDFFFFFFFFFFFFFE45 '0.5'
0.499999999999999989 0x3FFDFFFFFFFFFFFFFE6A '0.5'
0.499999999999999990 0x3FFDFFFFFFFFFFFFFE8F '0.5'
... ...
0.49999999999999999995 0x3FFDFFFFFFFFFFFFFFFF '0.5'
What function?
FloatToStr and FloatToStrF are both wrappers around FloatToText.
FloatToText ultimately uses FloatToDecimal to extract, from an extended, a record that contains the pieces of the float:
TFloatRec = packed record
  Exponent: Smallint;
  Negative: Boolean;
  Digits: array[0..20] of Byte;
end;
In my case:
var
  v: Extended;
  fr: TFloatRec;
begin
  v := 0.499999999999999980;
  FloatToDecimal({var}fr, v, fvExtended, 18, 9999);
end;
the decoded float comes back as:
- Exponent: 0 (SmallInt)
- Negative: False (Boolean)
- Digits: [53, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1] (array[0..20] of Byte)
The Digits field is an array of ASCII characters:
- Exponent: 0
- Negative: False
- Digits: '5'
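For reference, the decoding above can be reproduced with a small console program that reads Digits up to its null terminator and prints it as text. This is a sketch only: the variable name `digitText` is mine, the `System.SysUtils` unit name assumes a compiler with unit scope names, and `Ord` is used so the loop should compile whether Digits is declared as Byte or as AnsiChar.

```delphi
program ShowFloatRecDigits;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  v: Extended;
  fr: TFloatRec;
  digitText: string;
  i: Integer;
begin
  v := 0.499999999999999980;
  // Same call as in the question: 18 significant digits, Decimals = 9999
  FloatToDecimal(fr, v, fvExtended, 18, 9999);

  // Digits holds ASCII codes followed by a null terminator
  digitText := '';
  i := Low(fr.Digits);
  while (i <= High(fr.Digits)) and (Ord(fr.Digits[i]) <> 0) do
  begin
    digitText := digitText + Char(Ord(fr.Digits[i]));
    Inc(i);
  end;

  Writeln('Exponent: ', fr.Exponent);
  Writeln('Negative: ', fr.Negative);
  Writeln('Digits:   ''', digitText, '''');
end.
```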
FloatToDecimal is limited to 18 digits
The precision of the 63-bit mantissa of an extended precision float can go down to:
1 / (2^63)
= 1.08420217248550443400745280086994171142578125 × 10^-19
= 0.000000000000000000108420217248550443400745280086994171142578125
    \_________________/
             |
         19 digits
The issue is that:
- Extended can give you meaningful values up to the 19th digit
- FloatToDecimal, while returning up to 20 digits, only accepts and generates a maximum request of 18 digits for extended values (19 digits for currency)
From the documentation:
For values of type Extended, the Precision parameter specifies the requested number of significant digits in the result--the allowed range is 1..18.
The Decimals parameter specifies the requested maximum number of digits to the left of the decimal point in the result.
Precision and Decimals together control how the result is rounded. To produce a result that always has a given number of significant digits regardless of the magnitude of the number, specify 9999 for the Decimals parameter.
The result of the conversion is stored in the specified TFloatRec record as follows:
Digits - Contains up to 18 (for type Extended) or 19 (for type Currency) significant digits followed by a null terminator. The implied decimal point (if any) is not stored in Digits.
So I've hit a fundamental limitation of the built-in float formatting functions.
How to format an 80-bit IEEE extended precision float?
If Delphi cannot do it itself, the question becomes: how do I do it?
I know the Extended is 10 bytes (SizeOf(Extended) = 10). The question now delves into the dark art of converting an IEEE float to a string.
Some parts are easy:
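function ExtendedToDecimal(v: Extended): TFloatRec;
var
  n: UInt64;
const
  BIAS = 16383; // exponent bias of the 80-bit format (not used yet)
begin
  Result := Default(TFloatRec);
  // Sign, Exponent and Mantissa come from the TExtendedHelper record helper (SysUtils)
  Result.Negative := v.Sign;
  // Placeholder: this is the binary (power-of-two) exponent,
  // while TFloatRec.Exponent is meant to hold a decimal exponent
  Result.Exponent := v.Exponent;
  n := v.Mantissa;
  // Result.Digits :=
end;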
But the hard part is left as an exercise for the answer.