2

For most numbers, we know there will be some precision error with any floating point value. For a 32-bit float, that works out the be roughly 6 significant digits which will be accurate before you can expect to start seeing incorrect values.

I'm trying to store a human-readable value which can be read in and recreate a bit-accurate recreation of the serialized value.

For example, the value 555.5555 is stored as 555.55548095703125; but when I serialize 555.55548095703125, I could theoretically serialize it as anything in the range (555.5554504395, 555.555511475) (exclusive) and still get the same byte pattern. (Actually, probably that's not the exact range, I just don't know that there's value in calculating it more accurately at the moment.)

What I'd like is to find the most human-readable string representation for the value -- which I imagine would be the fewest digits -- which will be deserialized as the same IEEE float.

AberrantWolf
  • 486
  • 5
  • 18
  • 1
    FYI, Java and JavaScript do this in their default formatting for `float` and `double`, so printing a `float` or `double` or converting one to a string in these languages accomplishes the goal, except for some decorative aspects (the shortest string for .3 would be “.3”, but they produce “0.3”). – Eric Postpischil Jul 16 '19 at 09:41
  • 1
    I'd second Python repr() does the same: finds a shortest decimal representation that is converted to the same value. There would be more standard libraries with the same facility. So you can simply pick and analyze a one closest to you. – Netch Jul 16 '19 at 19:51
  • In my case (and I probably should have specified this) I'm using C++, so neither JavaScript nor Python would be practical. Though I am now curious what process both languages use to come up with their string representations -- whether it's essentially `snprintf(val, "%1.6g")` or something else designed to produce a more elegant result. – AberrantWolf Jul 17 '19 at 00:56

1 Answers1

5

This is exactly a problem which was initially solved in 1990 with an algorithm the creators called "Dragon": https://dl.acm.org/citation.cfm?id=93559

There is a more modern technique from last year which is notably faster called "Ryu" (Japanese for "dragon"): https://dl.acm.org/citation.cfm?id=3192369

The GitHub for the library is here: https://github.com/ulfjack/ryu

According to their readme:

Ryu generates the shortest decimal representation of a floating point number that maintains round-trip safety. That is, a correct parser can recover the exact original number. For example, consider the binary 64-bit floating point number 00111110100110011001100110011010. The stored value is exactly 0.300000011920928955078125. However, this floating point number is also the closest number to the decimal number 0.3, so that is what Ryu outputs.

AberrantWolf
  • 486
  • 5
  • 18