15

I stumbled across some code based on unions in C. Here is the code:

    union    {  
        struct  {  
            char ax[2];  
            char ab[2];  
        } s;  
        struct  {  
            int a;  
            int b;  
        } st;  
    } u ={12, 1}; 

    printf("%d %d", u.st.a, u.st.b);  

I just couldn't understand why the output was 268 0. How were the values initialized? How is the union functioning here? Shouldn't the output be 12 1? It would be great if anyone could explain in detail what exactly is happening here.

I am using a 32-bit processor on Windows 7.

undur_gongor
h4ck3d

5 Answers

19

The code doesn't do what you think. A brace initializer initializes the first union member, i.e. u.s. However, the initializer is now incomplete and missing braces, since u.s contains two arrays. It should be something like: u = { { {'a', 'b'}, { 'c', 'd' } } };

You should always compile with all warnings enabled; a decent compiler should have told you that something was amiss. For instance, GCC says "missing braces around initialiser (near initialisation for ‘u.s’)" and "missing initialiser (near initialisation for ‘u.s.ab’)". Very helpful.

In C99 you can take advantage of named member initialization (designated initializers) to initialize the second union member: u = { .st = {12, 1} }; (This was not possible in C++ before C++20, by the way.) The corresponding syntax for the first case is u = { .s = { {'a', 'b'}, { 'c', 'd' } } };, which is arguably more explicit and readable!

Kerrek SB
  • 2
    Actually you need *two* extra sets of braces (one for the union, one for the struct, and one for the array): `u = {{{'a', 'b'}, {'c', 'd'}}}`. – Adam Rosenfield Dec 07 '11 at 15:31
  • @KerrekSB Thank you. Probably the best answer. Could you also explain that why is the output `268 0` now that the initialization problem is over. – h4ck3d Dec 07 '11 at 16:00
  • @NiteeshMehra: 256 + 12... plus a large handful of luck – Kerrek SB Dec 07 '11 at 16:04
  • @KerrekSB Thanks I think I got it, correct me if am wrong : The bits are set like this `00000001 00001100` = 256+8+4 = 268. Right ? But then why is u.st.b = 0 ? – h4ck3d Dec 07 '11 at 16:14
  • @NiteeshMehra: No, it would have to be little-endian order to make sense. As for the rest: As I said, a lot of luck. This is undefined behaviour, anything could happen. It just turned out that the memory was zeroed at that point. – Kerrek SB Dec 07 '11 at 16:22
  • 3
    @NiteeshMehra: Here is what happened. s.ax[2] and s.ab[2] were packed into 4 bytes in little-endian order. Your initialization initialized s.ax[0] to 12 (0x0C) and s.ax[1] to 1 (0x01). Everything else was initialized to 0. (I don't know whether or not this is "spec" behavior, but it is what I would expect to happen.) Assuming you are using 32-bit ints, the resulting data layout is as follows (lowest byte on the left): 0C 01 00 00 00 00 00 00 So when you printed the first 4-byte int you got 0x0000010C, or 268, and when you printed the second 4-byte int you got 0x00000000, the mystery 0. – Andrew Cottrell Dec 07 '11 at 16:59
  • @KerrekSB In LE order : `01 0C` ? – h4ck3d Dec 07 '11 at 16:59
  • @AndrewCottrell So that means the lowest byte was assigned 12 , next byte 1 , and the remaining 6 bytes ( because size of union is 8 bytes cuz of the 2 int? ) were assigned 0 (this would depend upon compiler/machine whatever? ) . Did i get it right? So when i print u.st.a output would be the least 4 bytes (00 00 01 0C = 268) ? And when i would print u.st.b , output would be the remaining 4 bytes (00 00 00 00) ? – h4ck3d Dec 07 '11 at 17:08
  • @KerrekSB Right , so that would be `0C 01`. I hope am correct this time? – h4ck3d Dec 07 '11 at 17:10
  • @NiteeshMehra Yep, you got it. – Andrew Cottrell Dec 07 '11 at 20:23
6

Your code uses the default initializer for the union, which is its first member. Both 12 and 1 go into the characters of ax, hence the result that you see (which is very much compiler-dependent).

If you wanted to initialize through the second member (st) you would use a designated initializer:

    union {
        struct {
            char ax[2];
            char ab[2];
        } s;
        struct {
            int a;
            int b;
        } st;
    } u = { .st = {12, 1} };
Sergey Kalinichenko
5

The code sets u.s.ax[0] to 12 and u.s.ax[1] to 1. u.s.ax is overlaid onto u.st.a, so the least-significant byte of u.st.a is set to 12 and the next byte to 1 (so you must be running on a little-endian architecture), giving a value of 0x010C, or 268.

Borodin
  • How come `0x010C` ? Shouldn't that be `0x001C`? – h4ck3d Dec 07 '11 at 15:58
  • Thanks I think I got it, correct me if am wrong : The bits are set like this `00000001 00001100` = 256+8+4 = 268. Right ? But then why is u.st.b = 0 ? – h4ck3d Dec 07 '11 at 16:13
  • 1
    Yes - `00000001` `00001100` is `0x010C`. u.st.b is zero because only two values were provided and these were assigned to `u.s.ax[0]` and `u.s.ax[1]`. These two chars overlay the least-significant end of `u.st.a`, leaving the upper half and `u.st.b` set to zero by default. – Borodin Dec 09 '11 at 03:10
2

A union is at least as large as its largest member. So in this case, your union type has a size of 8 bytes on a 32-bit platform where int types are 4 bytes each. The first member of the union, s, takes up only 4 bytes, and its ax array occupies just the first 2 of them, which overlap the first 2 bytes of the st.a member. Since you are on a little-endian system, those are the two lower-order bytes of st.a. Thus, when you initialize the union with the values {12, 1}, you explicitly set only the two lower-order bytes of st.a; the remaining bytes, including all of st.b, ended up as 0 here. So when you print the struct containing the two int members rather than the char members, you get 268 and 0.

Jason
  • The output is `268` and `0`. Why does the first member of the union `s` take up 2 bytes ? Shouldn't it take 4 bytes? It has 2 char array of size 2. So it should be 2*2 = 4. Correct me if i am wrong. – h4ck3d Dec 07 '11 at 16:03
1

It probably assigned { 12, 1 } to the two chars in s.ax.

So, read as a 32-bit int on a little-endian machine, that's 1*256 + 12 = 268.

Yochai Timmer