How to create a custom NaN (single precision) in python without setting the 23rd bit?

Question

I'm trying to create floating-point NaNs by choosing the fraction bits. But it seems that Python floats always set the 23rd fraction bit (IEEE 754 single precision) when interpreting a NaN.

So, my question is: is it possible to define a float NaN in Python without it setting the 23rd bit?

(I'm using Python 2.7)

NaNs in IEEE 754 have this format:
sign = either 0 or 1.
biased exponent = all 1 bits.
fraction = anything except all 0 bits (since all 0 bits represent infinity).
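As an illustration of the layout above, here is a small sketch (Python 3; the helper name decode_binary32 is hypothetical) that splits a 32-bit pattern into its fields. It also checks the most significant fraction bit (bit 22, the "23rd bit" from the question). On x86 and most other current platforms that bit marks a quiet NaN, and a NaN with it cleared is a signaling NaN.

```python
def decode_binary32(bits):
    """Split a 32-bit IEEE 754 single-precision pattern into its fields."""
    sign = (bits >> 31) & 0x1
    exponent = (bits >> 23) & 0xFF        # 8-bit biased exponent
    fraction = bits & 0x7FFFFF            # 23-bit fraction
    if exponent == 0xFF:
        if fraction == 0:
            kind = "infinity"
        elif fraction & 0x400000:         # most significant fraction bit (bit 22) set
            kind = "quiet NaN"
        else:
            kind = "signaling NaN"
    else:
        kind = "finite"                   # includes zero and subnormals
    return sign, exponent, fraction, kind

for pattern in (0x7F800000, 0x7F800001, 0x7FC00001):
    print(hex(pattern), decode_binary32(pattern))
```

0x7F800001 is classified as a signaling NaN and 0x7FC00001 as a quiet NaN; the two patterns differ only in bit 22 (0x00400000).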

So one hex representation of a NaN is 0x7F800001, but interpreting this int as a float and then converting it back to an int gives 0x7FC00001.

1st try: struct.pack/unpack:

import struct

def hex_to_float(value):
    return struct.unpack( '@f', struct.pack( '@L', value) )[0]

def float_to_hex(value):
    return struct.unpack( '@L', struct.pack( '@f', value) )[0]

print hex(float_to_hex(hex_to_float(0x7f800001)))
# 0x7fc00001

2nd try: ctypes

import ctypes

def float2hex(float_input):
    INTP = ctypes.POINTER(ctypes.c_uint)
    float_value = ctypes.c_float(float_input)
    my_pointer = ctypes.cast(ctypes.addressof(float_value), INTP)
    return my_pointer.contents.value

def hex2float(hex_input):
    FLOATP = ctypes.POINTER(ctypes.c_float)
    int_value = ctypes.c_uint(hex_input)
    my_pointer = ctypes.cast(ctypes.addressof(int_value), FLOATP)
    return my_pointer.contents.value

print hex(float2hex(hex2float(0x7f800001)))
# 0x7fc00001L

3rd try: xdrlib packers. Same result.

score 2 · Accepted Answer · answered May 26 '19 at 10:32

The underlying problem is that you convert a C float (32 bits) to a Python float (64 bits, i.e. a double in C parlance) and then back to a C float.

Performing both conversions in sequence doesn't always reproduce the original input, and you are witnessing such a case.

If the exact bit-pattern is important, you should avoid the above conversions at any cost.
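In practice, avoiding the conversions means keeping the pattern as an integer or as raw bytes and never letting it become a Python float. A minimal sketch (written for Python 3, though the struct calls are the same in 2.7; the helper name make_nan32 is hypothetical), using the standard-size little-endian format '<I' so the size does not depend on the platform:

```python
import struct

def make_nan32(fraction, sign=0):
    """Return the 4 little-endian bytes of a binary32 NaN with the given fraction."""
    if not 0 < fraction <= 0x7FFFFF:
        raise ValueError("NaN fraction must be in 1..0x7FFFFF (0 would be infinity)")
    bits = ((sign & 1) << 31) | (0xFF << 23) | fraction
    return struct.pack('<I', bits)

raw = make_nan32(0x000001)        # signaling NaN: bit 22 stays clear
print(repr(raw))
# Unpacking as an integer (not as 'f') involves no float conversion,
# so the original pattern comes back unchanged.
print(hex(struct.unpack('<I', raw)[0]))
```

The bytes can then be written to a file, a socket, or a buffer handed to other code exactly as built.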

Here are some gory details:

So when struct.unpack('=f', some_bytes) is called (note that I use the standard-size format character =, as opposed to your native-size @; for example, @L is 4 bytes on Windows but 8 bytes on 64-bit Linux), the following happens:

unpack_float is called, which calls
_PyFloat_Unpack4, which interprets the data as a 32-bit C float,
but converts it to double while returning (because the function returns a double).

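On x86-64 the last conversion is performed by the VCVTSS2SD instruction (Convert Scalar Single-Precision Floating-Point Value to Scalar Double-Precision Floating-Point Value). Because 0x7f800001 is a signaling NaN, the instruction quiets it by setting the most significant fraction bit, and it copies the 23-bit fraction into the top of the 52-bit double fraction, so

0x7f800001 becomes 0x7ff8000020000000.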
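The mapping from 0x7f800001 to 0x7ff8000020000000 can be reproduced with plain integer arithmetic. The sketch below is only a model of what the x86 conversion does to a NaN (it does not execute the instruction, and the function name is hypothetical): the sign is kept, the exponent becomes all ones, the 23-bit fraction is shifted left by 52 - 23 = 29 bits, and the quiet bit (bit 51 of the double) is forced on.

```python
def widen_nan_bits(f32_bits):
    """Model of float->double conversion of a NaN on x86 (signaling NaNs get quieted)."""
    sign = (f32_bits >> 31) & 1
    exponent = (f32_bits >> 23) & 0xFF
    fraction = f32_bits & 0x7FFFFF
    if exponent != 0xFF or fraction == 0:
        raise ValueError("not a binary32 NaN")
    # Align the 23-bit payload with the top of the 52-bit fraction, then set the quiet bit.
    fraction64 = (fraction << 29) | (1 << 51)
    return (sign << 63) | (0x7FF << 52) | fraction64

print(hex(widen_nan_bits(0x7F800001)))
```

The low payload bit ends up as 0x20000000 and the quiet bit as 0x0008000000000000, which together with the exponent gives 0x7ff8000020000000.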

As you can see, the result of struct.unpack('=f', struct.pack('=L', value))[0] already differs from what was put in.

Then, calling struct.pack('=f', value) on a Python float (which is a wrapper around C's double) gets us to _PyFloat_Pack4, where the conversion from double to float happens: CVTSD2SS (Convert Scalar Double-Precision Floating-Point Value to Scalar Single-Precision Floating-Point Value) is executed, keeping only the top 23 fraction bits, and

0x7ff8000020000000 becomes 0x7fc00001.
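The narrowing step can be modeled the same way. Again this is only a sketch of the NaN case with a hypothetical function name, not a call into the instruction: the 52-bit fraction is shifted right by 29 bits, so only its top 23 bits survive, and the quiet bit ends up as bit 22 of the float.

```python
def narrow_nan_bits(f64_bits):
    """Model of double->float conversion of a NaN on x86 (payload truncated, quiet bit set)."""
    sign = (f64_bits >> 63) & 1
    exponent = (f64_bits >> 52) & 0x7FF
    fraction = f64_bits & ((1 << 52) - 1)
    if exponent != 0x7FF or fraction == 0:
        raise ValueError("not a binary64 NaN")
    # Keep the top 23 fraction bits; signaling inputs are quieted as well.
    fraction32 = (fraction >> 29) | (1 << 22)
    return (sign << 31) | (0xFF << 23) | fraction32

print(hex(narrow_nan_bits(0x7FF8000020000000)))
```

Because the quiet bit set during widening is one of the bits that survives the right shift, the original 0x7f800001 can never come back from this round trip.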

Thanks for the in-depth explanation. I was really curious to know why that happens. — R T, May 27 '19 at 23:14

jsbueno · Answer 2 · 2019-05-25T01:20:09.333

What are you really trying to do?

Any Python code consuming floats will, at best, ignore a "specially crafted" NaN and, at worst, crash.

If you are passing this value to something outside Python (serializing it, or calling a C API), just define it with the exact bytes you want using struct, and send those bytes to your desired destination.
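For the C API case, one way to hand over the exact bits without a Python float in between is to build the ctypes object directly from the bytes with from_buffer_copy. A Python 3 sketch (the variable names are illustrative; the C function you would call is not shown):

```python
import ctypes
import struct

# Exact bit pattern in native byte order ('=' = native order, standard size).
raw = struct.pack('=I', 0x7F800001)
# Copy the raw bytes straight into a C float: no float/double conversion happens.
c_val = ctypes.c_float.from_buffer_copy(raw)
# The ctypes object's memory still holds the original 4 bytes.
print(repr(bytes(c_val)))
```

The object can then be passed to C, for example by pointer with ctypes.byref(c_val). Reading c_val.value, however, produces a Python float and brings back the conversion problem from the accepted answer.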

Also, if you are using NumPy, then yes, you can create the special NaNs and expect them to be retained within an ndarray, but the way to do that is again to dictate the exact bytes you want with struct and then convert the data type while preserving the buffer contents.
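One way to convert the data type while preserving the buffer is ndarray.view, which reinterprets the same memory under a different dtype without copying or converting. A sketch, assuming NumPy is installed:

```python
import numpy as np

# Build the exact bit patterns as unsigned 32-bit integers...
bits = np.array([0x7F800001, 0x7FC00001], dtype=np.uint32)
# ...then reinterpret the same buffer as float32 (no copy, no value conversion).
floats = bits.view(np.float32)
print(floats.dtype, floats.nbytes)
# Viewing back as uint32 shows the bit patterns are untouched.
print([hex(int(v)) for v in floats.view(np.uint32)])
```

The signaling pattern 0x7F800001 survives as long as the data stays inside the array.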

Check this answer on building 80-bit long double numbers for use with NumPy for a workaround: Longdouble(1e3000) becomes inf: What can I do?

I tried numpy.frombuffer here, and it interprets the byte sequence you crafted as a 32-bit float, if that suits you:

import numpy as np
import binascii
a = "7f800001"
b = binascii.unhexlify(a)  # works in Python 2 and 3; a.decode("hex") is Python 2 only
# NumPy reads the buffer in machine byte order; on a little-endian machine reverse the bytes
c = b[::-1]
x = np.frombuffer(c, dtype="float32")
x.tobytes()

will show the original bytes:

'\x01\x00\x80\x7f'

And checking the array x will show it is actually a NaN:

>>> x
array([nan], dtype=float32)

However, for the reasons above, if you extract the value from the NumPy array with x[0] and use it as a Python float, it will be converted to a "pasteurized" (quieted) float64 NaN.

Thanks. That's what I feared. I think I'll have to live with this behavior instead of changing all my underlying code to cope with that corner case. — R T, May 27 '19 at 23:18

