I need to find the most standards-compliant way to obtain the address of a pointer and store its bytes separately (for instance, to transmit them serially).

I have two versions below. I believe the first one contains undefined behavior, while the second one should contain only defined behavior according to C99. But my tool reports undefined behavior in the second one as well. Could someone please confirm this and, if possible, indicate a solution with neither undefined nor implementation-defined behavior?

Edit: I changed the type from int to unsigned long to aid in finding a non-implementation-dependent solution. I also removed the mention of a "16-bit wide pointer".

unsigned long a[2];
unsigned char b0, b1, b2, b3;

int main1() {
  unsigned long l = (unsigned long) &(a[0]);
  b0 = (l >> 24) & 0xFF;
  b1 = (l >> 16) & 0xFF;
  b2 = (l >> 8) & 0xFF;
  b3 = l & 0xFF;
  return 0;
}


typedef union { unsigned long* p; char c[sizeof(unsigned long *)]; } u;

int main2() {
  u x;
  x.p = a;
  b0 = x.c[3];
  b1 = x.c[2];
  b2 = x.c[1];
  b3 = x.c[0];
  return 0;
}

Edit 2: added reference to a part of the C99 standard concerning these programs:

Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.

Does it mean it is not possible to read the address of array a without relying on some implementation-defined behavior? Or is there a way to circumvent it?

anol
  • Are you sure you want to transmit addresses? They won't be valid on another host or another address space anyway. – Maxim Egorushkin May 22 '13 at 10:23
  • I'm not aware of any implementation of C that has 16-bit pointers, but assuming you're right about that (some small embedded system or something?) there's nothing wrong with main1(). You're apparently using a union or something for main2(), but you're not showing us that code so we can't help you. – Lee Daniel Crocker May 22 '13 at 10:33
  • `int i = (int) &(a[0]);` can be problematic because `sizeof(int) != sizeof(void*)` on some systems – Grijesh Chauhan May 22 '13 at 10:37
  • As I said, I'm assuming he's right about 16-bit pointers (which is not the case on any system I know of). The only possible "undefined behavior" is throwing away all but the low 16 bits of pointer. – Lee Daniel Crocker May 22 '13 at 10:43
  • @GrijeshChauhan: The OP does not use `void *`. It is using `int *`. – alk May 22 '13 at 11:27
  • @alk `sizeof(void *)` always **=** `sizeof(int *)` – Grijesh Chauhan May 22 '13 at 12:08
  • @LeeDanielCrocker sorry, you're right, I forgot to add the typedef for the union. I also replace b0 and b1 with `unsigned` chars. – anol May 22 '13 at 12:30
  • @GrijeshChauhan: I doubt this is true. It may hold on many platforms, but all that is guaranteed by the standard (to my knowledge) is `sizeof(void*)==sizeof(char*)`. – alk May 22 '13 at 12:31
  • @LeeDanielCrocker Motorola 6811 pointers are 16 bit. Also, 8086 'near' pointers. – Crashworks May 22 '13 at 12:32

1 Answer

To hold a pointer value as an integer, use uintptr_t (declared in <stdint.h>) if it is available; otherwise fall back to unsigned long (or unsigned long long). Why unsigned? Because shift operations are fully defined only for unsigned integers. For signed ones, the result can be implementation-defined (for example, right-shifting a negative value).
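
As a rough sketch of that choice: `UINTPTR_MAX` is defined by <stdint.h> only when `uintptr_t` exists, so it can select the type at compile time. The names `addr_int` and `byte_of` are made up for this illustration, and the sketch assumes 8-bit bytes.

```c
#include <assert.h>
#include <stdint.h>

/* addr_int is a hypothetical alias: uintptr_t when the implementation
   provides it (signalled by UINTPTR_MAX), otherwise unsigned long. */
#ifdef UINTPTR_MAX
typedef uintptr_t addr_int;
#else
typedef unsigned long addr_int;
#endif

/* Return byte n of v, where n == 0 is the least significant byte.
   The value is unsigned, so the right shift is fully defined as long
   as the shift count stays below the width of the type. */
static unsigned char byte_of(addr_int v, unsigned n)
{
    assert(n < sizeof(addr_int));
    return (unsigned char)((v >> (8u * n)) & 0xFFu);
}
```

With this, `byte_of((addr_int)0xA1B2u, 1)` yields 0xA1 and `byte_of((addr_int)0xA1B2u, 0)` yields 0xB2, regardless of the host's byte order.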

So if you want to transfer the address (for whatever reason, as an address is usually process-local), you can do something like the following:

#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uintptr_t */

/**
 * @param ptr Pointer to serialize
 * @param buf Destination buffer, at least sizeof(uintptr_t) bytes
 * @param be  If 0 - little endian, 1 - big endian encoding
 */
void ptr2buf(const void *ptr, void *buf, int be)
{
    uintptr_t u = (uintptr_t)ptr;  /* implementation-defined conversion */
    unsigned char *d = buf;
    size_t i;

    if (be)
    {
        /* big endian: fill from the last byte backwards */
        d += sizeof(u) - 1;

        for (i = 0; i < sizeof(u); ++i)
        {
            *d-- = u & 0xFF;
            u >>= 8;
        }
    }
    else
    {
        /* little endian: least significant byte first */

        for (i = 0; i < sizeof(u); ++i)
        {
            *d++ = u & 0xFF;
            u >>= 8;
        }
    }
}
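
For completeness, here is a sketch of the reverse direction. `buf2ptr` and `roundtrip_ok` are hypothetical names, not part of any library, and the sketch assumes 8-bit bytes and that `uintptr_t` exists. `ptr2buf` is repeated in compact form so that the block compiles on its own. The rebuilt pointer is only meaningful in the same process that produced the bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compact form of ptr2buf with the same byte layout as above. */
static void ptr2buf(const void *ptr, void *buf, int be)
{
    uintptr_t u = (uintptr_t)ptr;
    unsigned char *d = buf;
    size_t i;

    for (i = 0; i < sizeof(u); ++i)
    {
        /* byte i (counted from the least significant end) goes to pos */
        size_t pos = be ? sizeof(u) - 1 - i : i;
        d[pos] = (unsigned char)(u & 0xFF);
        u >>= 8;
    }
}

/* Hypothetical inverse: rebuild the integer, most significant byte
   first, then convert it back to a pointer. */
static void *buf2ptr(const void *buf, int be)
{
    const unsigned char *s = buf;
    uintptr_t u = 0;
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < sizeof(u); ++i)
    {
        size_t pos = be ? i : sizeof(u) - 1 - i;
        u = (u << 8) | s[pos];
    }
    return (void *)u;
}

/* Returns 1 if p survives encoding and decoding in the given byte order. */
static int roundtrip_ok(const void *p, int be)
{
    unsigned char buf[sizeof(uintptr_t)];

    ptr2buf(p, buf, be);
    return buf2ptr(buf, be) == p;
}
```

The standard only promises that a valid `void *` converted to `uintptr_t` and back compares equal to the original, so the round trip is the one property worth checking; the individual byte values remain implementation-defined.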
Valeri Atamaniouk
  • Ok, so if my tool tells me all three programs (both my versions and yours) have undefined behavior, is it my tool that accepts only a subset of C? Does the line `*d-- = u & 0xFF` contain undefined behavior? Or just implementation-defined behavior? If so, is there a way to avoid it? – anol May 22 '13 at 14:57
  • This code is correct for all machines. No, the line you point out is 100% correct and compliant. If you have some tool telling you otherwise, it's worthless. – Lee Daniel Crocker May 22 '13 at 19:25
  • Just to clarify what I could grasp from the standard: `u & 0xFF` contains a pointer type (`u`) and a bitwise AND operator. The pointer type is then converted into an integer type, which according to item 6.3.2.3.6 in the C99 standard, "the result is implementation-defined", and then the bitwise operation is performed on the result of the conversion. In other words, I believe there is no way to perform this conversion without relying on at least some implementation-defined behavior, which means your solution (and probably mine) are both standards-compliant. – anol May 24 '13 at 07:42
  • @dhekir `u` is not a pointer type. it's an integer type with sufficient size to hold pointer value. – Valeri Atamaniouk May 24 '13 at 13:02
  • @ValeriAtamaniouk ok, so the implementation-defined conversion from pointer to integer happens at `uintptr_t u = (uintptr_t)ptr`. – anol May 26 '13 at 21:47
  • @dhekir Correct. And if it is using `uintptr_t` type, it is portable as long as target platform supports this type. – Valeri Atamaniouk May 27 '13 at 12:59