
I am trying to extract the bits from a float without invoking undefined behavior. Here is my first attempt:

unsigned foo(float x)
{
    unsigned* u = (unsigned*)&x;
    return *u;
}

As I understand it, this is not guaranteed to work due to strict aliasing rules, right? Does it work if I take an intermediate step with a character pointer?

unsigned bar(float x)
{
    char* c = (char*)&x;
    unsigned* u = (unsigned*)c;
    return *u;
}

Or do I have to extract the individual bytes myself?

unsigned baz(float x)
{
    unsigned char* c = (unsigned char*)&x;
    return c[0] | c[1] << 8 | c[2] << 16 | c[3] << 24;
}

Of course this has the disadvantage of depending on endianness, but I could live with that.

The union hack is definitely undefined behavior, right?

unsigned uni(float x)
{
    union { float f; unsigned u; };
    f = x;
    return u;
}

Just for completeness, here is a reference version of foo. Also undefined behavior, right?

unsigned ref(float x)
{
    return (unsigned&)x;
}

So, is it possible to extract the bits from a float (assuming both are 32 bits wide, of course)?


EDIT: And here is the memcpy version as proposed by Goz. Since many compilers do not support static_assert yet, I have replaced static_assert with some template metaprogramming:

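#include <cstring>  // std::memcpy

// Only the <true, T> specialization is defined, so requirement<false, T>::type
// fails to compile. This emulates static_assert.
template <bool, typename T>
struct requirement;

template <typename T>
struct requirement<true, T>
{
    typedef T type;
};

unsigned bits(float x)
{
    requirement<sizeof(unsigned)==sizeof(float), unsigned>::type u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}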
curiousguy
fredoverflow
  • I don't really see a problem with the very first approach - you don't even have two pointers pointing to the same object. You should be fine, although you may want a compile-time assert that sizeof(float)==sizeof(unsigned). I also don't see a problem with the union hack (although I would again verify the size). But I'm sure there are some obscure rules that I'm not aware of. Let's sit back and wait for people to prove me wrong! – EboMike Dec 01 '10 at 19:45
  • @Ebomike: The first method falls foul of the strict aliasing rules. Have a read of this: http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html – Goz Dec 01 '10 at 19:47
  • Thanks, I knew someone would prove me wrong :) – EboMike Dec 01 '10 at 19:48
  • @Johannes: How is undefined behavior the safest bet? :) Writing to one union member and then reading from another is undefined. – fredoverflow Feb 08 '11 at 16:32
  • @FredOverflow well, even if it's UB, I don't think the compiler will go out of its way and sue you for doing it. Anyway, see below for a version that doesn't have the problem. GCC's aggressive optimizations are documented (in its manpage) to allow you to do the union cast. Allowing a necessary evil (it's sometimes not desirable to use library functions or to rely on compiler intrinsics to optimize particular uses of memcpy). – Johannes Schaub - litb Feb 08 '11 at 16:42
  • IIRC, the struct hack is defined in C. That may give compilers some incentive to do it intuitively in C++. – AProgrammer Mar 18 '11 at 15:36
  • @Aprogrammer: You mean the *union hack*, right? The struct hack has to do with arrays of unknown size as the last member of a struct. – fredoverflow Mar 19 '11 at 08:34

4 Answers


About the only way to truly avoid any issues is to memcpy.

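#include <cstring>  // std::memcpy

unsigned int FloatToInt( float f )
{
   // Reject platforms where the two types differ in size.
   static_assert( sizeof( float ) == sizeof( unsigned int ), "Sizes must match" );
   unsigned int ret;
   // Copy the object representation; no pointer of the wrong type is dereferenced.
   std::memcpy( &ret, &f, sizeof( float ) );
   return ret;
}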

Because the size being copied is a compile-time constant, the compiler will typically optimise the memcpy call away.
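
The same idea can be generalised to any pair of equally sized types. The following is only a minimal sketch: the names `bit_copy` and `float_bits` are hypothetical and do not come from the answer, and it assumes a C++11 compiler and a platform where `float` is a 32-bit IEEE 754 type. (C++20 provides `std::bit_cast` in `<bit>` for this purpose.)

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper (not from the answer): copies the object
// representation of `from` into a freshly created `To`.
template <typename To, typename From>
To bit_copy(const From& from)
{
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value,
                  "To must be trivially copyable");
    static_assert(std::is_trivially_copyable<From>::value,
                  "From must be trivially copyable");
    To to;
    std::memcpy(&to, &from, sizeof(To));  // byte-wise copy, no aliasing
    return to;
}

// Assumption: float is 32 bits wide on the target platform;
// the static_assert above rejects platforms where it is not.
inline std::uint32_t float_bits(float f)
{
    return bit_copy<std::uint32_t>(f);
}
```

Using a fixed-width type such as `std::uint32_t` rather than `unsigned` makes the size assumption explicit at the call site.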

That said, the union method is VERY widely supported.

Goz
  • I would go so far as to say I'd actually file a bug on any compiler that didn't support the union method. Yes, it's technically not part of the standard, but it is so widely used throughout embedded programming that a compiler which doesn't support it isn't very useful. – Crashworks Dec 01 '10 at 21:28
  • @FredOverflow ... typo ;) Fixed. – Goz Dec 01 '10 at 21:30
  • @Crashworks: You'd be fine reporting a bug ... it doesn't mean the compiler writer has to give a monkeys though ;) Their compiler could still be perfectly compliant. – Goz Dec 01 '10 at 21:33
  • Compliant, and not bought by us! – Crashworks Dec 01 '10 at 21:35
  • @Crashworks, hehehe. Personally though, I use the memcpy trick. It is VERY obvious exactly what you are doing to others :) – Goz Dec 01 '10 at 21:37
  • While this might avoid issues, it violates strict aliasing rules. You're casting a float pointer to void * and then into a byte array (depending on how `memcpy()` interprets it). Is that really better than invoking the undefined but widely supported union workaround? – onitake Dec 23 '11 at 15:04
  • @Goz: According to POSIX (http://pubs.opengroup.org/onlinepubs/9699919799/functions/memcpy.html) and ISO C standards, it's void *. How the data is interpreted internally is left to the implementation. gcc translates memcpys into loops that transfer one basic machine unit per go, then the remainder using shorter loads/stores, for example. – onitake Jan 08 '12 at 17:11

The union hack is definitely undefined behavior, right?

Yes and no. According to the standard, it is definitely undefined behavior. But it is such a commonly used trick that GCC, MSVC and, as far as I know, every other popular compiler explicitly guarantee that it is safe and will work as expected.
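
For illustration, the union trick can be written as below. This is a sketch only: the name `float_bits_union` is hypothetical, the code assumes `float` is 32 bits wide, and it relies on the compiler documenting union type punning as an extension (GCC does so in its `-fstrict-aliasing` documentation). It is not guaranteed by the C++ standard.

```cpp
#include <cstdint>

// Hypothetical helper: reads the bits of a float through a union.
// Relies on a compiler extension; per the C++ standard, reading the
// member that was not last written is undefined behavior.
inline std::uint32_t float_bits_union(float f)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sizes must match");
    union { float f; std::uint32_t u; } pun;
    pun.f = f;     // write one member ...
    return pun.u;  // ... read the other (extension, not standard C++)
}
```

Note that the access goes through the union object itself; taking a pointer to one member and dereferencing it elsewhere is not covered by the same guarantee.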

jalf
  • Out of interest - which part of it is undefined behavior? (other than you're misinterpreting a float as an integer) – EboMike Dec 01 '10 at 21:36
  • Just that it's not allowed. Only one member of a union is "active" at a time. If you write to a member of a union, then you are *only* allowed to read from that same member. The result of reading any other member is undefined. – jalf Dec 01 '10 at 22:17
  • @EboMike "other than" .. that's exactly what is UB. It's an aliasing violation to read from a member that is not aliasing compatible with the active member of the union. The following is fine for example: `union A { int a; unsigned char b; }; A x = { 10 }; return x.b;`, because you are allowed to access an `int` by an lvalue of type `unsigned char`. – Johannes Schaub - litb Feb 08 '11 at 16:52
  • The spec currently has no notion to forbid `union A { int a; float b; }; A x = { 0 }; float *b = &x.b; *b = 0.f; return x.b;`. The active member in this case is switched to `float` by writing through the float pointer, but when that write happens in a separate function, this becomes problematic (the compiler basically cannot apply the aliasing rule as it was intended by the Standard). See http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#636 – Johannes Schaub - litb Feb 08 '11 at 16:56
  • @JohannesSchaub-litb: It seems the simple common sense answer would be to say that taking the address of a union member should allow the object to be used via the resulting pointer, or pointers derived from it, until the next time code accesses some other union member (via pointer not derived from the aforementioned one), crosses the start of a loop where that occurs, or enters a function where that occurs. Should be simple and practical to implement without hurting many actually-useful optimizations, while handling the common use cases for union-member pointers. – supercat Dec 05 '17 at 23:39

The following does not violate the aliasing rule, because it never uses an lvalue to access an object of a different type:

template<typename B, typename A>
B noalias_cast(A a) { 
  union N { 
    A a; 
    B b; 
    N(A a):a(a) { }
  };
  return N(a).b;
}

unsigned bar(float x) {
  return noalias_cast<unsigned>(x);
}
Johannes Schaub - litb
  • This proves the standard is broken. It is ridiculous that temporary.member is not an lvalue. I suppose the std guys got confused by the terms "rvalue" (as in value) and "rvalue" (a temporary). lol – curiousguy Oct 02 '11 at 18:12
  • @Johannes: Is this reasoning still true? Accessing `b` is accessing a non-active member of a union. – GManNickG Sep 07 '13 at 01:53

If you really want to be agnostic about the size of the float type and just return the raw bits, do something like this:

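#include <cstring>  // std::memcpy

// Writes the sizeof(float) bytes of f into buffer, in memory order.
void float_to_bytes(char *buffer, float f) {
    union {
        float x;
        char b[sizeof(float)];
    };

    x = f;
    std::memcpy(buffer, b, sizeof(float));
}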

Then call it like so:

float a = 12345.6789f;
char buffer[sizeof(float)];

float_to_bytes(buffer, a);

This technique will, of course, produce output specific to your machine's byte ordering.
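
To see the byte-order dependence, the raw bytes can be rendered as hex in memory order. This is a minimal sketch under stated assumptions: the helper name `float_bytes_hex` is hypothetical, it copies with `memcpy` directly rather than going through the union, and the values mentioned afterwards assume a 32-bit IEEE 754 `float`.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: returns the bytes of f, in memory order,
// as a lowercase hex string (two digits per byte).
inline std::string float_bytes_hex(float f)
{
    unsigned char bytes[sizeof(float)];
    std::memcpy(bytes, &f, sizeof f);  // copy the object representation

    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < sizeof bytes; ++i) {
        out += digits[bytes[i] >> 4];    // high nibble
        out += digits[bytes[i] & 0x0F];  // low nibble
    }
    return out;
}
```

On a little-endian machine `float_bytes_hex(1.0f)` would be expected to give `0000803f`, and on a big-endian machine `3f800000`.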

cdhowie