2

I would like to access a single memory location with two different datatypes in the C programming language.

This is how I want it to be done:

I make a pointer and allocate 64 bits of memory for it. Then I want to access that memory by using either uint64_t or uint8_t[8].

Using unsigned long long int and unsigned char would not be correct because sizeof(unsigned char)==sizeof(uint8_t) is not always true.

I have a feeling that loops and copying memory is not really needed and I think that both

uint64_t abc = { 0xdeadbeefcafe1337 }

and

uint8_t xyz[8] = { 0xde, 0xad, 0xbe, 0xef, 0xca, 0xfe, 0x13, 0x37 }

look the same in memory.

Edit: But why?

I want to make it easy to do simple addition on the integer, and I would also like to access that integer's value in a simple array-like fashion, one byte at a time.

Avinash Singh
jg6
  • Can you point to a scenario in which `sizeof(uint8_t)` is not `sizeof(unsigned char)`? The real issue is that there is an aliasing exception specifically for character types, and depending on how `uint8_t` is defined, it could potentially not benefit from that exception. – Christian Gibbons Jun 11 '20 at 16:28
  • @ChristianGibbons You're right. According to `ISO/IEC 9899:TC3 6.5.3.4:3`, `sizeof(unsigned char)` must be 1. But I was looking at https://en.wikipedia.org/w/index.php?title=C_data_types&oldid=956131175#Basic_types and `unsigned char` had a specification of _Contains **at least** the [0, 255] range._ Edit: but what about platforms where a byte is not 8 bits? – jg6 Jun 11 '20 at 16:34
  • It's _still_ 1 (by definition). For example, on some TI DSPs, the smallest atomic/addressable unit is 16 bits. A `char` is 16 bits, but `sizeof(char)` is 1. My interpretation is that `char` is the smallest atomic unit and `sizeof(x)` returns "number of atomic units" and _not_ `number_of_bits / 8` – Craig Estey Jun 11 '20 at 16:50
  • When a byte is not 8 bits, `uint8_t` will probably not exist. The fixed-width integers are not guaranteed to be defined. `uint_least8_t` would still be defined, however. – Christian Gibbons Jun 11 '20 at 17:01
  • *I have a feeling that loops and copying memory is not really needed* You'd be wrong. See [**c - casting uint8_t* to uint32_t* behaviour**](https://stackoverflow.com/questions/60890346/c-casting-uint8-t-to-uint32-t-behaviour) along with [**C undefined behavior. Strict aliasing rule, or incorrect alignment?**](https://stackoverflow.com/questions/46790550/c-undefined-behavior-strict-aliasing-rule-or-incorrect-alignment/46790815#46790815). Note that last one documents failures on x86 systems. "I didn't see it blow up and fail" is not the same as "working properly". – Andrew Henle Jun 11 '20 at 17:01
  • See what compilers can do to you if you violate the strict aliasing rule: [**gcc, strict-aliasing, and horror stories**](https://stackoverflow.com/questions/2958633/gcc-strict-aliasing-and-horror-stories) – Andrew Henle Jun 11 '20 at 17:03

3 Answers

2

You can use a union for that:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* All members share the same 8 bytes of storage. */
typedef union 
{
    uint64_t u64;
    uint32_t u32[2];
    uint16_t u16[4];
    uint8_t  u8[8];
}u64;

void foo(void)
{
    u64 u;

    u.u64 =  0xdeadbeefcafe1337ULL;
    /* The order in which the bytes are printed depends on the host's endianness. */
    for(size_t i = 0; i < sizeof(u.u64); i++)
    {
        printf("byte %02zu - 0x%hhX\n", i, u.u8[i]);
    }
}

void bar(void)
{
    u64 *u = malloc(sizeof(*u));

    if(u == NULL) return;   /* allocation failed, nothing to print */

    u -> u64 =  0x1337cafebeefdeadULL;
    for(size_t i = 0; i < sizeof(*u); i++)
    {
        printf("byte %02zu - 0x%hhX\n", i, u -> u8[i]);
    }
    free(u);
}

int main(void)
{
    foo();
    printf("-----------------------\n");
    bar();
}

https://godbolt.org/z/hkoqFN

0___________
  • Thanks for the answer! I just have one question, performance-wise. Will using data from unions (if we don't count time taken to define the union) create more load in the CPU/take longer than the answer by @PSkocik? – jg6 Jun 11 '20 at 18:35
  • One more thing. Why is the order of bytes reversed? Does this have to do with endianness and does that mean that data is stored with the least significant bit on the left? – jg6 Jun 11 '20 at 19:05
  • Because x86 computers are little-endian. Most modern systems are little-endian. Bytes, not bits. – 0___________ Jun 11 '20 at 19:06
  • This causes UB and the endianness is unknown. – 12431234123412341234123 Jan 20 '21 at 15:09
  • 1
    @12431234123412341234123 there is no UB here. It is a very, very well defined behaviour – 0___________ Jan 20 '21 at 15:25
  • @0___________ Writing to a union member and then reading a different member is UB. See also https://stackoverflow.com/questions/52290456/is-the-following-c-union-access-pattern-undefined-behavior – 12431234123412341234123 Jan 20 '21 at 15:52
  • 1
    @12431234123412341234123 in C++. Do not use links to the forum as a source of knowledge. It is not UB. But I EOD this discussion as partial knowledge is worse than the complete ignorance. In C it was a UB a decade ago. – 0___________ Jan 20 '21 at 18:49
  • @0___________ I linked it because it is harder to explain the standard in the comments. But go ahead, read the C99, C11 or C18 standard, it is UB in C to read a different union member than last time written. This has nothing to do with C++, i do not know about C++. – 12431234123412341234123 Jan 21 '21 at 13:52
  • 1
    @12431234123412341234123 You simply do not understand it, that is the problem. I – 0___________ Jan 21 '21 at 13:55
2

You can always access an object of any type as an array of characters (char, unsigned char, or signed char), and on practically every implementation that provides it, uint8_t is just a typedef for unsigned char.

I.e., you can simply do:

uint64_t abc = { 0xdeadbeefcafe1337 };
uint8_t *pabc = (uint8_t*)&abc;

and use the pointer to inspect or modify abc.

Note that strict aliasing wouldn't allow you to do that with:

uint64_t abc = { 0xdeadbeefcafe1337 };
uint32_t *pabc = (uint32_t*)&abc;
// ^ the cast alone is OK, but dereferencing pabc (*pabc) would violate strict aliasing

For that you would need a union or memcpy. In practice either gets optimized to the same code as a direct dereference; the dereference itself is prohibited so that the compiler can do alias analysis, which enables better code generation.

Going in the reverse direction, i.e., accessing a declared uint8_t array as a uint64_t (even a properly aligned and properly sized one), is also not allowed. That has nothing to do with what the memory looks like and everything to do with alias analysis.

Petr Skocik
  • Thanks for the answer! I just have one question, performance-wise. Will using data from unions, as answered by @P__J__, (if we don't count time taken to define the union) create more load in the CPU/take longer than your answer? – jg6 Jun 11 '20 at 18:36
  • @sijanec No. Unions, types and dereferences and even memcpy (it's really more like an operator than a regular function call) are just compiler abstractions over registers and memory. An optimizing compiler will remove those abstractions and try to optimize everything. Don't believe me? Look at the disassembly and/or benchmark your code. – Petr Skocik Jun 11 '20 at 18:42
0

OK, let's start from scratch. Take a look at this code:

#include <stdio.h>
#include <stdint.h>
int main(void)
{
    uint64_t abc = 0xdeadbeefcafe1337ULL;
    uint8_t xyz[8] = { 0xde, 0xad, 0xbe, 0xef, 0xca, 0xfe, 0x13, 0x37 };

    printf("%x\n",xyz[0]);               // always de: the first array element
    printf("%x\n",((uint8_t*)&abc)[0]);  // same as *(uint8_t*)&abc; the value depends on endianness
    return 0;
}

Then read https://en.wikipedia.org/wiki/Endianness. Be careful not to confuse this with how the data is held in a CPU register; that is a different story.

After that you will get your answer.

Cheers

sukoy
  • @P__J__ Thanks. After reading about Endianness, this concerns me. Can endianness be different amongst unsigned ints of different sizes? Can I somehow tell the compiler (?) that big-endian should always be used or is it architecture dependent? I was looking at this https://www.geeksforgeeks.org/little-and-big-endian-mystery/#tablist1-panel1 and the problem is clearly visible. – jg6 Jun 11 '20 at 18:45
  • Endianness is architecture-dependent. Your platform may define the byte order so that you can take it into account in your code. On Linux, at least, you should have `<endian.h>`, which will define `__BYTE_ORDER` to be `__BIG_ENDIAN`, `__LITTLE_ENDIAN`, or even `__PDP_ENDIAN` – Christian Gibbons Jun 11 '20 at 19:40