19

Can every possible value of a float variable be represented exactly in a double variable?

In other words, for all possible values X will the following be successful:

float f1 = X;
double d = f1;
float f2 = (float)d;

if(f1 == f2)
  System.out.println("Success!");
else
  System.out.println("Failure!");

My suspicion is that there is no exception, or if there is, it only affects edge cases (like +/- infinity or NaN).

Edit: Original wording of question was confusing (stated two ways, one which would be answered "no" the other would be answered "yes" for the same answer). I've reworded it so that it matches the question title.

MPelletier
Kip

11 Answers

26

Yes.

Proof by enumeration of all possible cases:

public class TestDoubleFloat  {
    public static void main(String[] args) {
        // Enumerate every 32-bit pattern; the long loop counter avoids int overflow.
        for (long i = Integer.MIN_VALUE; i <= Integer.MAX_VALUE; i++) {
            float f1 = Float.intBitsToFloat((int) i);
            double d = (double) f1;
            float f2 = (float) d;
            if (f1 != f2) {
                if (Float.isNaN(f1) && Float.isNaN(f2)) {
                    continue; // ok: NaN never compares equal, even to itself
                }
                throw new AssertionError("oops: " + f1 + " != " + f2);
            }
        }
    }
}

This finishes in 12 seconds on my machine. 32 bits are small.

Kip
mfx
  • This doesn't actually test all numbers representable by Floats; Floats cannot exactly represent integers above 2^23 or so. – MSN Jan 22 '09 at 19:30
  • 18
    It enumerates all possible floats by enumerating all ints (which have the same size) and converting their bit patterns to float. – mfx Jan 22 '09 at 22:19
  • 2
    If (float)"INF" is interpreted to mean "computation in excess of 2^127", there's no equivalent `double`. Casting such a `float` value to `double` will yield "computation in excess of 2^1023". Only off by hundreds or orders of magnitude. – supercat Aug 28 '12 at 03:52
5

In theory, there is no such value, so "yes", every float should be representable as a double. Converting from a float to a double should involve just tacking four bytes of 00 on the end -- they are stored using the same format, just with different-sized fields.

James Curran
  • please clarify the 'no' - the two questions in the original post are contradictory, so I can't tell which you're answering. – Alnitak Nov 03 '08 at 15:46
  • 1
    Inserting 32 0 bits, yes. Inserting them all at the end, no. Some are added to the mantissa, some to the exponent. – Steve Jessop Nov 03 '08 at 18:20
  • 1
    Actually, I tell a lie, the exponent of course isn't 0-extended, because the bias is different in a double. So converting involves a small amount of actual arithmetic. – Steve Jessop Nov 03 '08 at 18:26
  • 1
    This answer is almost correct - the double type is not just a simple extension in 32 bits of the float type. The exponent field is actually extended from 8 to 11 bits, so what's left for extending the mantissa field is just 29 bits, not 32. (Yes, I noticed the discussion is a little old, but for the sake of future generations...) – ysap Mar 19 '10 at 19:32
5

Yes, floats are a subset of doubles. Both floats and doubles have the form (sign * a * 2^b). The difference between floats and doubles is the number of bits in a and b. Since doubles have more bits available, assigning a float value to a double effectively means inserting extra 0 bits.
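
A minimal sketch (mine, not part of this answer) that checks the claim at the bit level: for a normal float, widening to double keeps the sign and the unbiased exponent, and the 52-bit double fraction is the 23-bit float fraction followed by 29 zero bits (the exponent field itself is re-biased rather than zero-padded):

public class WideningBits {
    public static void main(String[] args) {
        float f = 0.1f;                                // any normal float works here
        int fb = Float.floatToRawIntBits(f);
        long db = Double.doubleToRawLongBits((double) f);

        int fFrac = fb & 0x7FFFFF;                     // low 23 bits: float fraction
        long dFrac = db & 0xFFFFFFFFFFFFFL;            // low 52 bits: double fraction
        int fExp = ((fb >>> 23) & 0xFF) - 127;         // unbiased float exponent
        int dExp = (int) ((db >>> 52) & 0x7FF) - 1023; // unbiased double exponent

        System.out.println(fExp == dExp);                  // true
        System.out.println(dFrac == ((long) fFrac << 29)); // true: 29 zero bits appended
    }
}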

MSalters
3

As everyone has already said, "no". But that's actually a "yes" to the question itself, i.e. every float can be exactly expressed as a double. Confusing. :)

unwind
3

If I'm reading the language specification correctly (and as everyone else is confirming), there is no such value.

That is, each type claims to hold only IEEE 754 standard values, so casting between the two should incur no change except in the memory used.

(Clarification: there would be no change as long as the value is small enough to be held in a float; obviously, if the value needs more bits than a float can hold, casting from double to float will lose precision.)
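
A short illustration of that asymmetry (a sketch, not Mitch's code): float -> double -> float always gets the original value back, but double -> float can discard low-order bits:

public class Asymmetry {
    public static void main(String[] args) {
        double precise = 0.1;              // the closest double to 0.1
        float narrowed = (float) precise;  // rounds to the closest float
        System.out.println(precise == (double) narrowed);  // false: bits were discarded
        System.out.println(0.1f == (float) (double) 0.1f); // true: the round trip is lossless
    }
}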

Mitch Flax
  • @Mitch: casting from Double to Float is guaranteed to lose precision. You can do float -> double -> float and get the same answer back. But if you have a double value that's the result of some calculation, it can't be cast to Float without having bits discarded. – S.Lott Nov 03 '08 at 15:59
1

@KenG: This code:

float a = 0.1F
println "a=${a}"
double d = a
println "d=${d}"

fails not because 0.1f can't be exactly represented. The question was "is there a float value that cannot be represented as a double", which this code doesn't prove. Although 0.1 can't be stored exactly, the value that a is given (which isn't 0.1 exactly) can be stored as a double (and also won't be 0.1 exactly). Assuming an Intel FPU, the bit pattern for a is:

0 01111011 10011001100110011001101

and the bit pattern for d is:

0 01111111011 100110011001100110011010 (followed by lots more zeros)

which has the same sign, exponent (-4 in both cases) and the same fractional part (separated by spaces above). The difference in the output is due to the position of the second non-zero digit in the number (the first is the 1 after the point), which can only be represented with a double.

The code that outputs the string form stores intermediate values in memory and is specific to floats and doubles (i.e. there is a double-to-string function and a separate float-to-string function). If the to-string function were optimised to use the FPU stack to store the intermediate results of the to-string process, the output would be the same for float and double, since the FPU uses the same, larger format (80 bits) for both float and double.

There are no float values that can't be stored identically in a double, i.e. the set of float values is a subset of the set of double values.
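
You can reproduce the bit patterns quoted above with a few lines of Java (a sketch, not Skizz's code; note that toBinaryString drops leading zero bits):

public class BitPatterns {
    public static void main(String[] args) {
        float a = 0.1f;
        double d = a;
        System.out.println(Integer.toBinaryString(Float.floatToRawIntBits(a)));
        System.out.println(Long.toBinaryString(Double.doubleToRawLongBits(d)));
        // Same bits widened, different to-string logic:
        // prints 0.1 for the float and 0.10000000149011612 for the double.
        System.out.println(a + " and " + d);
    }
}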

Skizz
0

If a floating-point type is viewed as representing a precise value, then as other posters have noted, every float value is representable as a double, but only a few values of double can be represented by float. On the other hand, if one recognizes that floating-point values are approximations, one will realize the real situation is reversed. If one uses a very precise instrument to measure something which is 3.437mm, one may correctly describe its size as 3.4mm. If one uses a ruler to measure the object as 3.4mm, it would be incorrect to describe its size as 3.400mm.

Even bigger problems exist at the top of the range. There is a float value that represents: "computed value exceeded 2^127 by an unknown amount", but there's no double value that indicates such a thing. Casting an "infinity" from single to double will yield a value "computed value exceeded 2^1023 by an unknown amount" which is off by a factor of over a googol.
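
That semantic gap is easy to see in code (a sketch of the idea, not supercat's): a float overflow and a double overflow both print Infinity, yet the same arithmetic done in double is nowhere near overflow:

public class OverflowMeaning {
    public static void main(String[] args) {
        float fInf = Float.MAX_VALUE * 2.0f;       // overflowed past ~2^128
        double widened = (double) fInf;            // widens to double infinity
        System.out.println(fInf + " " + widened);  // Infinity Infinity
        System.out.println(Float.MAX_VALUE * 2.0); // 6.805646932770577E38, perfectly finite
    }
}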

supercat
  • I take issue with "floating point values are approximations". All IEEE floating point numbers represent an exact value in base 2. They are often approximations of base-10 numbers that can't be expressed exactly in base 2, but each one has an exact value. – Kip Aug 28 '12 at 15:36
  • @Kip: The behavior of numbers is precisely specified, but I think it's more helpful to regard 1.1f as representing "1+(1,677,722±½/16,777,216)" than a precise quantity "1+(1,677,722/16,777,216)". If it's a precise quantity, then why does outputting the above number yield "1.1" rather than "1.10000002384185791015625"? If one regards the number as "something between 0.0999999940395355224609375 and 0.1000000536441802978515625", then it's clear that those extra digits past the string of zeroes aren't meaningful. But if one regards it as a precise fraction, they are. – supercat Aug 28 '12 at 16:46
  • If one computes (11.0/10.0) and casts the result to a `float`, the result will be *the correct `float` representation for the fraction 11/10*. If one computes (11.0f/10.0f) and casts the result to a `double`, the result will not be the correct `double` representation for the fraction 11/10. Converting `double` to `float` loses precision, but maintains correctness. Converting `float` to `double` doesn't lose numerical precision, but often does lose correctness. – supercat Aug 28 '12 at 16:52
0

Snark: NaNs will compare differently after (or indeed before) conversion.

This does not, however, invalidate the answers already given.
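
Concretely (a sketch): the bits survive the round trip, but == still reports a mismatch, because NaN compares unequal to everything, including itself:

public class NanRoundTrip {
    public static void main(String[] args) {
        float nan = Float.intBitsToFloat(0x7fc00000); // the canonical quiet NaN
        float back = (float) (double) nan;
        System.out.println(nan == back);              // false: NaN != NaN
        System.out.println(Float.isNaN(back));        // true: still a NaN after the trip
    }
}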

dmckee --- ex-moderator kitten
0

I took the code you listed and decided to try it in C++ since I thought it might execute a little faster and it is significantly easier to do unsafe casting. :-D

I found out that for valid numbers, the conversion works and you get the exact bitwise representation after the cast. However, for non-numbers, e.g. 1.#QNAN0, etc., the result will use a simplified representation of the non-number rather than the exact bits of the source. For example:

**** FAILURE **** 2140188725 | 1.#QNAN0 -- 0xa0000000 0x7ffa1606

I cast an unsigned int to float, then to double, and back to float. The number 2140188725 (0x7F90B035) results in a NaN, and converting to double and back still gives a NaN, but not the exact same NaN.

Here is the simple C++ code:

typedef unsigned int uint;
// (needs <cstdio> and <cstring> for printf and memcpy)
for (uint i = 0; ; ++i)
{
    float f1;
    memcpy(&f1, &i, sizeof f1);  // copy the bits; *(float *)&i breaks aliasing rules
    double d = f1;
    float f2 = (float)d;
    if (f1 != f2)
    {
        uint b2;
        memcpy(&b2, &f2, sizeof b2);
        // print the bit patterns for the hex fields; passing a float to %x is undefined
        printf("**** FAILURE **** %u | %f -- 0x%08x 0x%08x\n", i, f1, i, b2);
    }
    if ((i % 1000000) == 0)
        printf("Iteration: %u\n", i);
    if (i == 0xFFFFFFFF)  // the original `i < 0xFFFFFFFF` condition skipped this last pattern
        break;
}
Ryan
0

The answer to the first question is yes; the answer to the 'in other words', however, is no. If you change the test in the code to if (!(f1 != f2)), the answer to the second question becomes yes -- it will print 'Success' for all float values.

Chris Dodd
0

In theory, every normal single can have its exponent and mantissa padded to create a double, and when you remove the padding you get back the original single.

The problems appear when you go from theory to reality. I don't know if you were interested in theory or implementation; if it is implementation, then you can rapidly get into trouble.

IEEE is a horrible format; my understanding is that it was intentionally designed to be so tough that nobody could meet it, allowing the market to catch up to Intel (this was a while back) and creating more competition. If that is true, it failed; either way, we are stuck with this dreadful spec. Something like the TI format is far superior for the real world in so many ways. I have no connection to either company or any of these formats.

Thanks to this spec there are very few FPUs, if any, that actually meet it (in hardware, or even in hardware plus the operating system), and those that do often fail on the next generation (google: TestFloat). The problems these days tend to lie in the int-to-float and float-to-int conversions, not the single-to-double and double-to-single conversions you specified above. Of course, what operation is the FPU going to perform to do that conversion? Add 0? Multiply by 1? It depends on the FPU and the compiler.

The problem with IEEE related to your question above is that many numbers (not every number, but many) can be represented in more than one way. If I wanted to break your code, I would start with minus zero, in the hope that one of the two operations would convert it to a plus zero. Then I would try denormals. And it should fail with a signaling NaN, but you called that out as a known exception.

The problem is that equals sign. Here is rule number one about floating point: never use an equals sign. Equals is a bit comparison, not a value comparison; if you have two values represented in different ways (plus zero and minus zero, for example), the bit comparison will fail even though it's the same number. Greater-than and less-than are done in the FPU; equals is done with the integer ALU.

I realize that you probably used the equals sign to explain the problem, and that this is not necessarily the code you wanted to succeed or fail.
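
For what it's worth, a quick check of those edge cases in Java (my sketch, not old_timer's code) shows that minus zero and subnormals both survive the round trip, and, as the first comment below notes, that == compares values rather than bits:

public class EdgeCases {
    public static void main(String[] args) {
        float negZero = -0.0f;
        float tiny = Float.MIN_VALUE;  // smallest positive subnormal
        System.out.println(negZero == (float) (double) negZero); // true
        System.out.println(tiny == (float) (double) tiny);       // true
        System.out.println(0.0f == -0.0f);                       // true: value, not bit, comparison
        System.out.println(Float.floatToRawIntBits(negZero)
                == Float.floatToRawIntBits((float) (double) negZero)); // true: the sign survives
    }
}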

old_timer
  • 1
    == is not a bit comparison. In C, C++, C#, Java, Javascript, etc., 0 == negative 0. The Double.equals() method does a bit comparison. – A. Rex Jan 13 '09 at 16:52
  • 1
    -1 for anti-IEEE-float propaganda with misleading and outright incorrect information. – R.. GitHub STOP HELPING ICE Aug 01 '10 at 20:03
  • @R. I tell you what: go and build a few FPUs from scratch that pass TestFloat level 3 and that meet the IEEE spec in hardware without software kludges (like most of the ones on the market), and then let's chat about how to build a better FPU with a better spec. – old_timer Aug 02 '10 at 02:35
  • The biggest deficiency I can see with the IEEE spec is that from a numerical standpoint the zero which is produced by subtracting two equal numbers should not be positive. The expression 1.0/(1.0-1.0) should yield NaN, not +INF. If one wishes 1/(1/+INF) to yield +INF rather than NaN, there should be positive, negative, and unsigned zeroes (the former two being generated by multiplications or divisions which underflow, and the latter being produced by the subtraction of equal numbers). If I were designing a spec, I might consider using a different approach to... – supercat Aug 28 '12 at 16:59
  • ...ensuring that the smallest interval between numbers is not smaller than the smallest representable number, but I don't think I'd call the spec "horrible". What would you do different? – supercat Aug 28 '12 at 17:04
  • Now, I am not as familiar with the most recent spec as with the one from 10 years ago: fourteen pages of hell. The result of an operation changes depending on the mode, for example. A divide by zero has different answers depending on whether you have exceptions enabled, as I seem to remember. The rounding modes, the NaNs, the plus and minus zero, the exception and non-exception rules, tiny numbers, etc. -- it just piles on. On the other hand, implementing a TI DSP format, with one zero, no denormals, no NaNs, one divide-by-zero rule, no rounding, etc., was trivial; it took a week. – old_timer Aug 28 '12 at 21:19
  • What is most bothersome is that programmers of high-level languages don't know enough about floating point to use it properly; lots of unnecessary error is added when they could have just used fixed point and been as happy. Almost none of the features of IEEE are used; 90% of users touch a few percent of the features, that kind of thing. Just think about the chip real estate that is wasted. You want to play cool videogames? Get a video card with a bunch of video processors. You want to do scientific research? Get a real math processor. – old_timer Aug 28 '12 at 21:25
  • You want to compute the pixels for a 12-point font? Use a minimal, dumb, cheap, fast FPU, or just use fixed point. – old_timer Aug 28 '12 at 21:26
  • The way the conspiracy theory/old wives' tale goes is that Intel was beginning to dominate the processor market, and in fear of them dominating the floating-point coprocessor market, companies got together to create a spec, and to make it so difficult to meet that either nobody would meet it (especially Intel) or at least they would all have a fair chance of getting there. Intel won anyway, and we are stuck with this spec and its derivatives. I was around, but not old enough to know if there is anything at all to this; I have, however, been on an FPU team. – old_timer Aug 29 '12 at 00:09