Convert Extended (80-bit) to string

Question

How can I convert an Extended precision floating point value to a string?

Background

The Intel CPU supports three floating point formats:

32-bit Single precision
64-bit Double precision
80-bit Extended precision

Delphi has native support for the Extended precision floating point format.

Extended precision is broken down into:

1 sign bit
15 exponent bits
1 integer portion bit (i.e. number starts with 0. or 1.)
63 mantissa bits

You can compare the mantissa size of Extended to that of the other float types:

| Type     | Sign  | Exponent | Integer | Mantissa | 
|----------|-------|----------|---------|----------|
| Single   | 1 bit |  8 bits  |  n/a    | 23 bits  |
| Double   | 1 bit | 11 bits  |  n/a    | 52 bits  |
| Extended | 1 bit | 15 bits  | 1 bit   | 63 bits  |
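The layout in the table can be checked against a raw bit pattern. The following is a minimal sketch in Python rather than Delphi, since it only needs plain integer arithmetic and does not depend on a platform where `Extended` really is 80 bits. The function name `decode_extended80` is made up for illustration, and the sample pattern is the one this question quotes for 0.49999999999999998.

```python
def decode_extended80(bits):
    """Split an 80-bit extended pattern into (sign, biased_exponent, significand)."""
    sign = (bits >> 79) & 0x1               # 1 sign bit
    biased_exponent = (bits >> 64) & 0x7FFF  # 15 exponent bits
    significand = bits & 0xFFFFFFFFFFFFFFFF  # 1 integer bit + 63 fraction bits
    return sign, biased_exponent, significand


sign, biased_exponent, significand = decode_extended80(0x3FFD_FFFF_FFFF_FFFF_FD1E)
print("sign:", sign)
print("biased exponent:", hex(biased_exponent))
print("integer bit:", significand >> 63)
print("fraction bits:", hex(significand & (2**63 - 1)))
```

Unlike Single and Double, the integer bit is stored explicitly, so the integer bit and the 63 fraction bits together form one 64-bit field.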

Extended is capable of higher precision than Single and Double.

For example, take the real number .49999999999999999, and its representation in binary:

Single:   0.1000000000000000000000000
Double:   0.10000000000000000000000000000000000000000000000000000
Extended: 0.01111111111111111111111111111111111111111111111111111111010001111

You can see that while Single and Double have been forced to round to 0.1 binary (0.5 decimal), Extended still has some precision.
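The rounding of the two smaller formats is easy to reproduce outside Delphi. This is a small sketch in Python, whose `float` is an IEEE 754 double; the single-precision value is obtained by a round trip through `struct`, which is just one convenient way to do it.

```python
import struct

text = "0.49999999999999999"

# Python floats are IEEE 754 doubles, so parsing rounds to the nearest double.
as_double = float(text)

# Packing as a 4-byte float and unpacking again rounds to the nearest single.
as_single = struct.unpack("<f", struct.pack("<f", as_double))[0]

print("double equals 0.5:", as_double == 0.5)
print("single equals 0.5:", as_single == 0.5)
```

Neither format has a representable value between 0.5 and the decimal literal, so both land exactly on 0.5.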

But how to convert binary fractions to a string?

If I attempt to convert the extended value 0.49999999999999998 to a string:

FloatToStr(v);

the function returns 0.5, even though I can look at the raw bytes of the Extended and see that it is not 0.5:

0x3FFDFFFFFFFFFFFFFD1E

The same is true for other Extended values; every function in Delphi that I can find returns 0.5:

Value                   Hex representation      FloatToStr
0.499999999999999980    0x3FFDFFFFFFFFFFFFFD1E  '0.5'
0.499999999999999981    0x3FFDFFFFFFFFFFFFFD43  '0.5'
0.499999999999999982    0x3FFDFFFFFFFFFFFFFD68  '0.5'
0.499999999999999983    0x3FFDFFFFFFFFFFFFFD8D  '0.5'
0.499999999999999984    0x3FFDFFFFFFFFFFFFFDB2  '0.5'
0.499999999999999985    0x3FFDFFFFFFFFFFFFFDD7  '0.5'
0.499999999999999986    0x3FFDFFFFFFFFFFFFFDFB  '0.5'
0.499999999999999987    0x3FFDFFFFFFFFFFFFFE20  '0.5'
0.499999999999999988    0x3FFDFFFFFFFFFFFFFE45  '0.5'
0.499999999999999989    0x3FFDFFFFFFFFFFFFFE6A  '0.5'
0.499999999999999990    0x3FFDFFFFFFFFFFFFFE8F  '0.5'
...                     ...
0.49999999999999999995  0x3FFDFFFFFFFFFFFFFFFF  '0.5'

What function?

FloatToStr and FloatToStrF are both wrappers around FloatToText.

FloatToText ultimately uses FloatToDecimal to extract, from an extended, a record that contains the pieces of the float:

TFloatRec = packed record
   Exponent: Smallint;
   Negative: Boolean;
   Digits: array[0..20] of Byte;
end;

In my case:

var
   v: Extended;
   fr: TFloatRec;
begin
   v := 0.499999999999999980;

   FloatToDecimal({var}fr, v, fvExtended, 18, 9999);
end;

the decoded float comes back as:

Exponent: 0 (SmallInt)
Negative: False (Boolean)
Digits: [53, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1] (array[0..20] of Byte)

The Digits field is an array of ASCII characters:

Exponent: 0
Negative: False
Digits: '5'
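To make the record's meaning concrete, here is a small Python sketch that turns the three fields back into a decimal string. It assumes the usual `TFloatRec` convention that the value is `0.<digits> × 10^Exponent` and that `Digits` ends at the first zero byte; the function name `float_rec_to_str` is hypothetical.

```python
def float_rec_to_str(exponent, negative, digits):
    """Rebuild a decimal string from TFloatRec-style fields (assumed convention)."""
    # Digits holds ASCII characters up to a null terminator; ignore what follows.
    text = bytes(digits).split(b"\x00", 1)[0].decode("ascii")
    if not text:
        return "0"
    # Assumption: value = 0.<text> * 10**exponent
    if exponent <= 0:
        body = "0." + "0" * (-exponent) + text
    elif exponent >= len(text):
        body = text + "0" * (exponent - len(text))
    else:
        body = text[:exponent] + "." + text[exponent:]
    return ("-" if negative else "") + body


# The record returned in the question: a single '5' (ASCII 53), exponent 0.
print(float_rec_to_str(0, False, [53] + [0] * 18 + [1]))
```

Under that assumption the record above is exactly 0.5, which is why every formatting routine built on `FloatToDecimal` prints `0.5`.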

FloatToDecimal is limited to 18 digits

The precision of the 63-bit mantissa of an extended precision float can go down to:

1 / (2^63)  
= 1.08420217248550443400745280086994171142578125 × 10^-19   
= 0.000000000000000000108420217248550443400745280086994171142578125
    \_________________/ 
            |
        19 digits
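The expansion above can be reproduced with integer arithmetic alone, because 2^-n = 5^n / 10^n. This is a short Python sketch (Python is used here because its integers are arbitrary precision); the function name is made up for illustration.

```python
def exact_negative_power_of_two(n):
    """Exact decimal expansion of 2**-n for n >= 1."""
    # 2**-n == 5**n / 10**n, so the expansion has exactly n decimal places.
    return "0." + str(5 ** n).zfill(n)


value = exact_negative_power_of_two(63)
print(value)

# Position of the first non-zero digit after the decimal point.
fraction = value[2:]
print(len(fraction) - len(fraction.lstrip("0")) + 1)
```

The same identity is what makes an exact conversion of any binary fraction possible: every power of two has a finite decimal expansion.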

The issue is that:

Extended can give you meaningful values up to the 19th digit
FloatToDecimal, although its Digits buffer has room for 20 digits, accepts a Precision of at most 18 for Extended values (19 for Currency) and never generates more digits than that

From the documentation:

For values of type Extended, the Precision parameter specifies the requested number of significant digits in the result--the allowed range is 1..18.
The Decimals parameter specifies the requested maximum number of digits to the left of the decimal point in the result.
Precision and Decimals together control how the result is rounded. To produce a result that always has a given number of significant digits regardless of the magnitude of the number, specify 9999 for the Decimals parameter.
The result of the conversion is stored in the specified TFloatRec record as follows:

Digits - Contains up to 18 (for type Extended) or 19 (for type Currency) significant digits followed by a null terminator. The implied decimal point (if any) is not stored in Digits.

So I've hit a fundamental limitation of the built-in float formatting functions.

How to format an 80-bit IEEE extended precision float?

If Delphi cannot do it itself, the question becomes: how do I do it?

I know that Extended is 10 bytes (SizeOf(Extended) = 10). The question now delves into the dark art of converting an IEEE float to a string.

Some parts are easy:

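function ExtendedToDecimal(v: Extended): TFloatRec;
var
    n: UInt64;
const
    BIAS = 16383; // exponent bias of the 80-bit format (not used yet)
begin
    Result := Default(TFloatRec);

    // Sign, Exponent and Mantissa come from TExtendedHelper (System.SysUtils)
    Result.Negative := v.Sign;
    // Placeholder only: v.Exponent is a power-of-two exponent,
    // while TFloatRec.Exponent is a power-of-ten exponent
    Result.Exponent := v.Exponent;
    n := v.Mantissa;
//  Result.Digits :=
end;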

But the hard part is left as an exercise for the answer.
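One way to approach the hard part is exact integer arithmetic: the value is `significand × 2^(exponent − 16383 − 63)`, and dividing by 2^k is the same as multiplying by 5^k and shifting the decimal point k places. The following is a minimal sketch of that idea in Python rather than Delphi, because Python integers are arbitrary precision; a Delphi version would need a big-integer type. It works on the raw 80-bit pattern, does not validate unusual encodings, and the function name is hypothetical. The sample pattern is the one the table above lists for 0.499999999999999990.

```python
BIAS = 16383  # exponent bias of the 80-bit extended format


def extended80_to_exact_str(bits):
    """Exact decimal expansion of a finite 80-bit extended value given as an integer."""
    sign = (bits >> 79) & 1
    biased_exponent = (bits >> 64) & 0x7FFF
    significand = bits & 0xFFFFFFFFFFFFFFFF  # includes the explicit integer bit

    if biased_exponent == 0x7FFF:
        raise ValueError("infinity and NaN have no decimal expansion")
    if biased_exponent == 0:
        biased_exponent = 1  # zero and denormals use the minimum exponent

    # value = significand * 2**shift; the significand carries 63 fraction bits
    shift = biased_exponent - BIAS - 63
    if shift >= 0:
        text = str(significand << shift)
    else:
        places = -shift
        # n / 2**k == n * 5**k / 10**k, which stays exact in integer arithmetic
        digits = str(significand * 5 ** places).zfill(places + 1)
        text = digits[:-places] + "." + digits[-places:]
        text = text.rstrip("0").rstrip(".")
    return ("-" if sign else "") + text


print(extended80_to_exact_str(0x3FFD_FFFF_FFFF_FFFF_FE8F))
```

Because nothing is rounded, the output can run to dozens of digits; a practical routine would then round that exact string to the number of digits requested.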


Note that `Extended` is 10 bytes only on Windows 32bit. On Windows 64bit and iOS devices, `Extended` is an alias for `Double`, and on OSX, iOS simulator, and Linux, `Extended` is 16 bytes. So the internal layout of `Extended` changes depending on platform. Use [`TExtendedHelper`](http://docwiki.embarcadero.com/Libraries/en/System.SysUtils.TExtendedHelper) and [`TExtended80Rec`](http://docwiki.embarcadero.com/Libraries/en/System.TExtended80Rec) to help you work with an `Extended`'s component fields across multiple platforms. — Remy Lebeau, Jun 21 '18 at 19:33
Did you look at John Herbsters [ExactFloatToStr(x:Extended)](https://cc.embarcadero.com/Item.aspx?id=19421)? — LU RD, Jun 21 '18 at 19:37
Delphi's FloatToStr can't even convert a double to string correctly..... — David Heffernan, Jun 21 '18 at 20:08
`ExactFloatToStr(Extended(0.49999999999999999))` gives: `0.49999999999999998999823495882122159628124791197478771209716796875` using the above linked library. — LU RD, Jun 21 '18 at 20:29
@LURD I'd not found [John's code](https://github.com/JackTrapper/Exact-Float-to-String-Routines) before; it works very well. Phrase that as an answer and you got yourself an accept. And because it will keep going until there's no more leftovers to add, you can also use it to print `Single` and `Double` as well as `Extended`. — Ian Boyd, Jun 22 '18 at 00:37
FWIW, that website is wrong: Extended has a 64 bit mantissa and no hidden bit. — Rudy Velthuis, Jun 22 '18 at 16:16
@David: Could you give an example of an incorrect conversion? — Rudy Velthuis, Jun 22 '18 at 16:26
@rudy We did this recently here on SO. And there's a QP report alongside it. Don't you remember. — David Heffernan, Jun 22 '18 at 16:45
@David: I remember talking about problems with StrToFloat (and I had already found that it is inaccurate -- up to 2 ulp -- in some cases). Was that about FloatToStr too? — Rudy Velthuis, Jun 22 '18 at 16:48
@DavidHeffernan Do you have a QC or link number for that FloatToStr problem. I don't remember seeing it. — Graymatter, Jun 22 '18 at 23:49

LU RD · Accepted Answer · 2018-06-22T05:43:03.083


How can I convert an Extended precision floating point value to a string?

Since the Delphi RTL does not have a correct and complete implementation of FloatToStr() for Extended (or for Double, for that matter), you need an external library: [Exact-Float-to-String-Routines on GitHub](https://github.com/JackTrapper/Exact-Float-to-String-Routines), originally published at [EDN CodeCentral](https://cc.embarcadero.com/Item.aspx?id=19421).

The library was created by John Herbster, a long-time contributor to the Delphi RTL libraries, especially regarding floating point handling. The GitHub source code has been updated to use Unicode string handling and a TFormatSettings structure for formatting. The library contains an ExactFloatToStr() function that handles floats of type Extended, Double and Single.

Program TestExactFloatToStr; 

{$APPTYPE CONSOLE}

Uses
  SysUtils,ExactFloatToStr_JH0;

begin
  WriteLn(ExactFloatToStr(Extended(0.49999999999999999)));
  WriteLn(ExactFloatToStr(Double(0.49999999999999999)));
  WriteLn(ExactFloatToStr(Single(0.49999999999999999)));
  ReadLn;
end.

Outputs:

0.49999999999999998999823495882122159628124791197478771209716796875
0.5
0.5

edited Jun 22 '18 at 05:43

answered Jun 22 '18 at 05:31

LU RD


My [Exact command line tool](http://rvelthuis.de/programs/exact.html) can do this too. It uses my BigIntegers for this. – Rudy Velthuis Jun 22 '18 at 16:19
FWIW; I don't think that John Herbster wrote any Delphi RTL routines. The JOH mentioned in the RTL is John O'Harrow. John Herbster was a member of TeamB though. – Rudy Velthuis Jun 22 '18 at 16:58
@RudyVelthuis, I know the difference between JOH and JH. John contributed to "a lot" of QC reports, was extremely productively active in the delphi groups at borland and you refer to his work in your articles. – LU RD Jun 22 '18 at 17:16
@RudyVelthuis, from Johns CV: *"2002-2005. Served as invited member of Team-B which is dedicated to helping other users of Borland's programming tools. Expert on using floating point variables like those defined by IEEE-754 and used on PCs."* – LU RD Jun 22 '18 at 17:27
yes, but he didn't write any RTL functions for Delphi. I've talked to him very often, especially about these topics. I even met him in Scotts Valley during a TeamB meeting, when Borland still had half of the campus there. – Rudy Velthuis Jun 22 '18 at 18:29
@RudyVelthuis, I did not say he wrote any RTL functions, only that he contributed to the RTL libraries. – LU RD Jun 22 '18 at 18:36
@RudyVelthuis, I know for sure he was determined to add enough QC details to fix he `DateUtils` unit, which was a disaster at that time. – LU RD Jun 22 '18 at 18:45
no doubt. But no involvement in the RTL libraries (which is the RTL). – Rudy Velthuis Jun 23 '18 at 15:57