22

I need to use an existing text file to store some very precise values. When read back in, the numbers essentially need to be exactly equivalent to the ones that were originally written. Now, a normal person would use a binary file... for a number of reasons, that's not possible in this case.

So... do any of you have a good way of encoding a double as a string of characters (aside from just increasing the stream precision)? My first thought was to cast the double to a char[] and write out the chars. I don't think that's going to work, because some of the characters are not visible, some produce sounds, and some even terminate strings ('\0'... I'm talking to you!)

Thoughts?

[Edit] - once I figure out which of the solutions proposed works best for me, I'll mark one as 'the' solution.

fbl
  • 2,840
  • 3
  • 33
  • 41
  • 1
    If you want to be portable, you cannot assume anything about the representation of the floating-point number (the standard does not define the representation). Thus the **ONLY** way to do this portably is to print the number with as much precision as you can get. If you are willing to throw away portability, you can use a binary format (encoded in Base64 if you want). But then you are going to lose precision when converting to the platform-specific float format (unless it is exactly the same as on the source system), and you have gained nothing over printing at full precision. – Martin York Jan 10 '11 at 04:38
  • Write out the chars as text, i.e., convert `c/16` and `c%16` to a character (either 0-9 or A-F) and print them out. – user168715 Jan 10 '11 at 04:39
  • 2
    One thing to consider is that your compiler may keep doubles in CPU registers with more than 64 bits of precision. When these values get written to memory in preparation for writing them to disk, they get truncated to 64 bits. So even if you save the double in binary and read it back, the read value is not guaranteed to `==` the original. – user168715 Jan 10 '11 at 04:44
  • 1
    No matter what encoding format you use, you will lose precision when loading into a system with less precision available. Do you care if it is human readable? Do you care if it is fast to save/load? Do you care how many bytes it takes to store? Note that on many high-precision systems you can use `long double` to get even more precision than `double`. – fuzzyTew Jan 10 '11 at 04:48
  • 3
    @fuzzyTew: There is a reason why very few professional formats are binary nowadays. We tried that when we did not have the space, but it does not pay off in the long run. The human-readable form is much easier to use and maintain, and you lose no precision with it. Compression is a really bad reason to choose a format in today's world (absent some very specific reason to do so). – Martin York Jan 10 '11 at 04:55
  • 3
    I wouldn't use `long double`. The entire x87 instruction set is considered deprecated on newer processors. For instance, 64-bit Win7 seems to disallow x87 in the kernel, and Intel, AMD, and Microsoft heavily discourage its use. They all recommend using SSE2 math instead. So the 10-byte long double seems to be out of style. – KitsuneYMG Jan 10 '11 at 04:58
  • "once I figure out which of the solutions proposed works best for me, I'll mark one as 'the' solution" :( – alfC Jan 19 '16 at 10:10

9 Answers

20

If you want to keep the format strictly human readable, you can write out the double thusly:

#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

std::string doubleToText(const double & d)
{
    std::stringstream ss;
    //ss << std::setprecision( std::numeric_limits<double>::digits10 + 2 );
    ss << std::setprecision( std::numeric_limits<int>::max() );
    ss << d;
    return ss.str();
}

std::numeric_limits<int>::max() requests the maximum possible decimal precision, which preserves the value as closely as possible across different floating-point implementations. Swapping that line for the commented one, which uses std::numeric_limits<double>::digits10 + 2, gives just enough digits to make the double exactly recoverable on the platform the code is compiled for. That produces much shorter output while still preserving every value the double can uniquely represent.

The C++ stream operators do not preserve denormalized numbers, infinities, or NaNs when reading strings back in. The C strtod function does, however, and is required to by the standard. Hence, the most precise way to read a decimal number back with a standard library call is this function:

#include <cstdlib>
#include <string>

double textToDouble(const std::string & str)
{
    return strtod( str.c_str(), NULL );
}
fuzzyTew
  • 3,511
  • 29
  • 24
10

Assuming IEEE 754 double, printf("%.17g\n", x) will give you enough digits to recreate the original value.

dan04
  • 87,747
  • 23
  • 163
  • 198
  • what's the best way to parse that? what are the options for preserving infinities and maybe NaN? – fuzzyTew Jan 11 '11 at 18:50
  • You can detect all the special cases, like in http://www.cplusplus.com/forum/beginner/30400/ , but normally why would you need that? Also, decimal representation is surely nice and readable, but you have to remember that both mantissa and exponent in IEEE doubles are binary, so it would be hard to preserve all the bits... probably you'd need your own bin2dec,dec2bin functions anyway. – Shelwien Jan 12 '11 at 00:27
  • There's a lot of bad information in that thread. Divide by zero is done at runtime and assigned a result by the fpu. – fuzzyTew Jan 12 '11 at 03:02
  • The idea is that you can get the encoding for +INF and -INF like that. Otherwise you'd just have to directly access the double's bitfields, see http://en.wikipedia.org/wiki/Double_precision_floating-point_format . – Shelwien Jan 12 '11 at 23:24
  • Does not work, very tiny numbers with large exponents do not work. – jjxtra May 20 '16 at 20:49
4

A two-step process: first use binary float/double serialization, then apply Base64 encoding. The result is not human readable, but it will not lose precision.

Edit: (Thanks to fuzzyTew and dan04)

A lossless, human-readable decimal representation is also possible, but would require much more space.

Community
  • 1
  • 1
Juraj Blaho
  • 13,301
  • 7
  • 50
  • 96
  • 2
    It is certainly possible to create a human readable representation able to exactly represent a binary floating point number. – fuzzyTew Jan 10 '11 at 22:45
  • 1
    Right: 2 is a factor of 10, so all terminating binary fractions also terminate in base 10. Though it might take a lot of digits, like in 0.1000000000000000055511151231257827021181583404541015625. – dan04 Jan 11 '11 at 02:24
  • 1
    But it is not possible to represent every decimal floating-point number as a binary floating-point number. Am I right? – Juraj Blaho Jan 11 '11 at 07:30
  • 2
    In general, no. 1/5 in binary is 0.0011 0011 0011 0011..., so any fraction with a factor of 5 in the denominator will not terminate in binary. – dan04 Jan 11 '11 at 14:26
2

To print long lists of numbers in C++ without loss (writing and reading on the same architecture), I use this (for doubles):

#include <cmath>
#include <iomanip>
#include <iostream>
#include <limits>
#include <sstream>

int main(){
    std::ostringstream oss;

    int prec = std::numeric_limits<double>::digits10 + 2; // generally 17

    int exponent_digits = std::log10(std::numeric_limits<double>::max_exponent10) + 1; // generally 3
    int exponent_sign   = 1; // 1.e-123
    int exponent_symbol = 1; // 'e' 'E'
    int digits_sign = 1;
    int digits_dot = 1; // 1.2

    int division_extra_space = 1;
    int width = prec + exponent_digits + digits_sign + exponent_sign + digits_dot + exponent_symbol + division_extra_space;

    double original = -0.000013213213e-100/33215.;
    oss << std::setprecision(prec) << std::setw(width) << original << std::setw(width) << original << std::setw(width) << original << '\n';
    oss << std::setprecision(prec) << std::setw(width) << 1. << std::setw(width) << 2. << std::setw(width) << -3. << '\n';
}

prints

 -3.9780861056751466e-110 -3.9780861056751466e-110 -3.9780861056751466e-110
                        1                        2                       -3

In summary, in my case it is like setting:

oss << std::setprecision(17) << std::setw(25) << original << ...;

In any case, I can test whether this works by doing:

    std::istringstream iss(oss.str());
    double test; iss >> test;
    assert(test == original);
alfC
  • 14,261
  • 4
  • 67
  • 118
2

You could use base 64. This would allow you to store the exact byte values in a text file.

I haven't used it, but I found this base 64 encoding/decoding library for C++.

Daniel Gallagher
  • 6,915
  • 25
  • 31
  • 3
    Except that has **nothing** to do with floating point. Just because people use it to encode binary data does not mean you can encode floating point numbers and expect them to come out the other end correctly!!!! – Martin York Jan 10 '11 at 04:36
  • 2
    since most systems tend to follow IEEE 754, you can encode floating point as binary data pretty well – fuzzyTew Jan 10 '11 at 04:37
  • @fuzzyTew: there is always a choice: you sacrifice either portability or precision :) – ruslik Jan 10 '11 at 04:41
  • 1
    @fuzzyTew: True. But if the format is the same on both platforms, you are gaining nothing over printing at full precision: if you print out the exact value that was read in, you only lose precision when one end truncates the data. So you are sacrificing portability and gaining nothing (well, I suppose you gain better compression). – Martin York Jan 10 '11 at 04:44
  • Base 64 would always use 12 characters to represent a 64-bit double, whereas printing the same double at full precision could take several times that, and it's much easier to forget to set a high precision on an output stream than it is to write out 8 bytes in base 64. As @fuzzyTew said, most modern systems follow IEEE 754, so portability probably isn't limited that much. – Daniel Gallagher Jan 10 '11 at 04:50
  • @Martin York: yes, the point of base64 would be to get the best possible compression. I think your solution is the best with regard to portability and human-readability. I'm curious why someone voted it down -- could there be information loss in some cases? – fuzzyTew Jan 10 '11 at 04:51
  • @Daniel Gallagher: You speak of desktops (with Intel chips). Now that mobile devices are so popular, do they all conform to IEEE 754? – Martin York Jan 10 '11 at 04:52
  • Point well taken. Though recent ARM VFP versions are IEEE 754 compliant. I suppose OP would need to determine just how portable the app needs to be. – Daniel Gallagher Jan 10 '11 at 05:06
  • @Daniel Gallagher - I'm not worried about mobile devices in this case. – fbl Jan 10 '11 at 13:59
  • @flevine100: base64 may be what you need then. It was designed to convert arbitrary binary into ASCII text. The biggest problem I have with it is that you would need to include an external library (or write your own encoder and decoder). It's also not human-readable in any important sense of the word. But if that field is just there for your program to read and write, it may be acceptable. – Daniel Gallagher Jan 10 '11 at 17:45
1

I was sure there was a special format specifier for printf (maybe %a?) that allowed printing the binary representation of a float, but I cannot find it.
However, you can try this:

#include <stdio.h>

int main(int argc, char* argv[]){
    union fi {
        unsigned int i;
        float        f;
    } num;
    num.f = 1.23f;
    printf("%X\n", num.i);
    return 0;
}
ruslik
  • 14,714
  • 1
  • 39
  • 40
  • 1
    Does not help. Neither the integer nor the floating-point representation is guaranteed, so converting to an integer lets you print a number but does not guarantee that another system will regenerate the same floating-point value. (Also, you should add compile-time checks that float/int are the same size.) – Martin York Jan 10 '11 at 04:48
  • 4
    You could do what ruslik says and define the output as being IEEE 754. On any platform where this isn't the case, you'll have to do software conversion of the double. – KitsuneYMG Jan 10 '11 at 05:00
  • 1
    of course, this code is in c instead of c++ (question tag) and works on floats instead of doubles -- but it does solve the problem – fuzzyTew Jan 10 '11 at 05:08
0

Try this:

double d = 0.2512958125912;
std::ostringstream s;
s << d;

Then write s.str() to the file.

vdsf
  • 1,608
  • 2
  • 18
  • 22
0

You don't say why binary is off limits. For your application, would converting the binary to a hex ASCII string be workable?

Frank Merrow
  • 949
  • 1
  • 8
  • 19
  • My only restriction is that I must output to a clear text file. There are other columns in the file that users need to access (using Excel, Matlab, etc). I want to have this data in the same file and write other tools that can resurrect the binary equivalent value. – fbl Jan 10 '11 at 13:55
0

Storage representation aside, what about something like this? Special values like -0, infinities, NaN, etc. would require special handling, though. Also, I "forgot" to implement negative exponents.

#include <stdio.h>
#include <math.h>

const int SCALE = 1<<(52/2);

void put( double a ) {
  FILE* f = fopen( "dump.txt", "wb" );
  int sign = (a<0); if( sign ) a=-a;
  int exp2 = 0; while( a>1 ) a/=2, exp2++;
  a*=SCALE;
  int m1 = floor(a);
  a = (a-m1)*SCALE;
  int m2 = floor(a);
  fprintf(f, "%i %i %i %i\n", sign, exp2, m1, m2 );
  fclose(f);
}

double get( void ) {
  FILE* f = fopen( "dump.txt", "rb" );
  double a;
  int sign, exp2, m1, m2;
  fscanf( f, "%i %i %i %i\n", &sign, &exp2, &m1, &m2 );
  fclose(f);
  printf( "%i %i %i %i\n", sign, exp2, m1, m2 );
  a = m2; a /= SCALE;
  a+= m1; a /= SCALE;
  while( exp2>0 ) a*=2, exp2--;
  if( sign ) a=-a;
  return a;
}

int main( void ) {
  union {
    double a;
    unsigned b[2];
  } u;
  u.a = 3.1415926;
  printf( "%.20lf %08X %08X\n", u.a, u.b[0], u.b[1] );
  put( u.a );
  u.a = get();
  printf( "%.20lf %08X %08X\n", u.a, u.b[0], u.b[1] );
}
Shelwien
  • 2,160
  • 15
  • 17