3

In my C++ learning process I've come across code like this in x86 which I don't understand:

unsigned long value = 50;
unsigned char* result = (unsigned char*)value;

Inspecting the pointer gives a memory access error (i.e. I can't write std::cout << *result;). If anything the cast line should be like:

unsigned char* result = (unsigned char*)&value;

But it isn't. So my question is: under what circumstances (if any) is the first cast usable? Or, put another way, is it possible to get the data from the pointer?

drescherjm
Werner
  • 3
    In general, no, there is no valid reason to do something like that. Code that does something like that will be platform specific, and is still highly suspect. – Eljay Sep 20 '19 at 12:07
  • 2
    where did you see the code? Without context it is hard to say if this can make sense (probably it doesn't) – 463035818_is_not_an_ai Sep 20 '19 at 12:07
  • If the size of a pointer is the same as the size of an `unsigned long` (this is usually the case on 32 bit platforms) it should work, but why do you need to do that? Where did you see this code? – Jabberwocky Sep 20 '19 at 12:08
  • 2
    Suggested reading: https://stackoverflow.com/questions/10368777/storing-a-pointers-address-in-an-unsigned-int-in-c – Amadeus Sep 20 '19 at 12:10
  • 1
    Define "valid". Your first sample of code will compile, with no diagnostic required, but any subsequent code that dereferences `result` will have undefined behaviour. – Peter Sep 20 '19 at 12:10
  • @Jabberwocky being x86 they are both 4 bytes. How can I test that it should work? Using VS 2017 and running the above code, I cannot access the pointer without memory access error... – Werner Sep 20 '19 at 12:10
  • 3
    @Werner Just that it is valid in principle does not mean the code as given should work. Dereferencing `result` would attempt to access the virtual memory location 0x00000032, which is almost surely not allowed. But taking the address of a valid C++ object, casting it to `std::uint32_t` and back into a pointer of the correct type should be fine on x86. – Max Langhof Sep 20 '19 at 12:17
  • What is it that you wish to accomplish with such a cast? What do you mean by "get the data from the pointer"? – Lightness Races in Orbit Sep 20 '19 at 13:11
  • @MaxLanghof would it fail on x64? – Werner Sep 27 '19 at 09:25
  • @LightnessRacesinOrbit not sure, hence my question. It is code in an old application that passes the unsigned char* to an API. Now this API has been updated to x64 only (still accepting an unsigned char*) and I have to convert our API usage to x64. And stumbling over this code made me stop and wonder... – Werner Sep 27 '19 at 09:27
  • @Werner It would be undefined behavior on x64. Cast to `std::uintptr_t` to be safe. – Max Langhof Sep 27 '19 at 09:31
  • @Werner Without context it's impossible to know what it's _supposed_ to do, and if you don't know what you mean by your question then we can't answer it :P – Lightness Races in Orbit Sep 27 '19 at 09:43

4 Answers

4

I am currently working on some digital signal processing, which involves programming an Analog Devices DSP. Here is a small snippet from one of the .h files that they ship with the board:

#define pUART0THR                ((volatile unsigned int *)0x3c00)    /* Transmit Holding Register */
#define pUART0RBR                ((volatile unsigned int *)0x3c00)    /* Receive Buffer Register */
#define pUART0DLL                ((volatile unsigned int *)0x3c00)    /* Divisor Latch Low Byte */
#define pUART0IER                ((volatile unsigned int *)0x3c01)    /* Interrupt Enable Register */

I bet you would be able to find similar stuff if you dive into the headers for just about any other embedded system.

In this case 0x3c00 is a memory address that you can write a byte to, and a hardware UART module will read it and transmit it over the serial port. And 0x3c01 is the address of a register where, if you set the second bit (bit 1), you will get a hardware interrupt when the UART send buffer (at address 0x3c00) is empty, i.e. when the byte you put there has been sent.

So that you don't have to remember all these addresses, and instead get some nice (or at least more memorable) names to call them by, they provide a bunch of defines like the ones above.
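A minimal sketch of how such defines are typically used. A desktop process cannot touch address 0x3c00, so this version points the registers at an ordinary array standing in for the hardware; on the real DSP the base would be the fixed address from the vendor header. The names fake_regs, kUartBase, UART_THR, UART_IER and uart_send, as well as the layout of two adjacent registers, are assumptions made for illustration, not part of the Analog Devices headers.

```cpp
#include <cstdint>

// Stand-in for the memory-mapped UART registers (assumption: two adjacent
// registers). On the real device these live at fixed addresses like 0x3c00.
static volatile unsigned int fake_regs[2] = {0u, 0u};

// Integer address of the register block. Turning it back into a pointer
// below is the same integer-to-pointer cast the vendor header performs.
static const std::uintptr_t kUartBase =
    reinterpret_cast<std::uintptr_t>(&fake_regs[0]);

#define UART_THR (reinterpret_cast<volatile unsigned int*>(kUartBase))
#define UART_IER \
    (reinterpret_cast<volatile unsigned int*>(kUartBase + sizeof(unsigned int)))

// Queue one byte for transmission, then set bit 1 of the interrupt enable
// register to request an interrupt once the send buffer is empty.
void uart_send(unsigned char byte) {
    *UART_THR = byte;
    *UART_IER = *UART_IER | 0x2u;
}
```

The volatile qualifier matters here: it tells the compiler that every read and write of the register has an effect outside the program and must not be optimized away or reordered with other volatile accesses.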

Frodyne
  • 1
    @formerlyknownas_463035818 This is driver level code, in some sense you don't need to create an object at those addresses because it already exists (in the form of hardware physically wired to those memory addresses). I will not argue that it is pretty or safe (it is absolutely neither of those), but at some level you need this kind of code to access hardware. – Frodyne Sep 20 '19 at 12:35
  • @formerlyknownas_463035818 presumably the toolchain that ships that header promises there really are `volatile unsigned int`s at those addresses – Caleth Sep 20 '19 at 12:38
  • @Caleth Yup, I also have a 1304 page "Processor Hardware Reference" that explains what every single bit in those registers is for. – Frodyne Sep 20 '19 at 12:41
  • sorry deleted the comment, because i felt it is going towards an extended discussion when actually your answer explains quite well the context where this is used – 463035818_is_not_an_ai Sep 20 '19 at 13:26
3

There are some cases, but often it is not appropriate.

The conversion is valid only if:

  • long is a sufficiently large type to represent all addresses. This is not guaranteed. In fact, long is not sufficiently large on 64 bit Windows systems. There is an integer type that is guaranteed to be correctly sized for this purpose: std::uintptr_t.

Of course, this conversion is completely pointless unless you intend to do something with the pointer. Whether those things are valid depends on yet more requirements.

  • If the converted pointer value is invalid i.e. does not refer to an allocated area of storage, then indirecting through the pointer has undefined behaviour. Any other use (such as comparing the value to another pointer) has implementation defined behaviour.
  • If the pointer is to an object outside of its lifetime, but with allocated storage, then the pointer can only be used in limited ways. Accessing memory through the pointer is still typically (exceptions apply) undefined behaviour, but you can for example compare the value for equality with another valid pointer. Refer to standard sections [basic.life], [class.base.init] and [class.cdtor] for details on what is allowed.
  • If the pointer is to an object within its lifetime, and the pointer is of same type (or compatible type), then you can indirect through the pointer to access the pointed object.

Note that the mapping of the conversion from integer to pointer is implementation defined. Pretty much the only guarantee you have is that a pointer that is converted into an integer of sufficient size can be converted back to the same value.
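A minimal sketch of that one guaranteed round trip, plus a compile-time check of whether unsigned long is wide enough on the platform being compiled for. The names unsigned_long_fits_pointer and round_trip are invented for this example.

```cpp
#include <cstdint>

// Can unsigned long hold every pointer value on this platform?
// Typically true on 32-bit x86 and on 64-bit Linux, but false on 64-bit
// Windows, where unsigned long is 32 bits and pointers are 64 bits.
constexpr bool unsigned_long_fits_pointer =
    sizeof(unsigned long) >= sizeof(void*);

// Pointer -> integer -> pointer. Because std::uintptr_t is large enough,
// the result has the same value as the original pointer.
unsigned long* round_trip(unsigned long* p) {
    const std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<unsigned long*>(bits);
}
```

Given an object such as unsigned long value = 50;, round_trip(&value) points to value again and can be dereferenced normally, because it refers to an object within its lifetime.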


The above describes only when the conversion and the use of the converted pointer are valid. Another matter is whether there is any point in doing this. The use cases are rare on desktop / server systems (see andrey's answer for an example), but on embedded systems, there are sometimes hard coded addresses for communicating with the hardware (see Frodyne's answer for an example).

eerorika
3

One potential reason to cast an integer to a pointer would be to transport that integer value through an API that accepts a pointer.

For example, many callback-related APIs accept a void *user_data parameter to be passed as an argument to the callback function. Consider the following function:

void do_async_thing(args..., void (* callback) (int result, void *user_data), void *user_data);

If all you need to pass to the callback is an integer, then instead of passing a pointer to an allocated block of data, you might write:

do_async_thing(
    args...,
    [](int result, void *request_idx) {
        std::cout << "request " << reinterpret_cast<std::uintptr_t>(request_idx) << " returned " << result;
    },
    reinterpret_cast<void *>(std::uintptr_t{request_idx})
);

Of course you should use std::uintptr_t, not unsigned long (e.g. on Windows unsigned long is 32 bits, even in 64 bit builds where pointers are 64 bits).
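A self-contained sketch of the same idea that can actually be compiled. The API here (do_thing, on_done, start_request) is hypothetical and completes immediately instead of asynchronously. Note one assumption: the standard only promises the pointer to integer to pointer round trip, so getting the same integer back out of the void * relies on the implementation-defined mapping, which holds on mainstream desktop compilers.

```cpp
#include <cstdint>

// Hypothetical callback-style API; all names here are invented.
using Callback = void (*)(int result, void* user_data);

// Never dereferences user_data, it only hands it back to the callback.
void do_thing(Callback callback, void* user_data) {
    callback(0, user_data);  // pretend the operation finished with result 0
}

// Written by the callback so the caller can inspect what arrived.
static std::uintptr_t last_request_idx = 0;
static int last_result = -1;

void on_done(int result, void* user_data) {
    // Recover the integer; the pointer is never used as an address.
    last_request_idx = reinterpret_cast<std::uintptr_t>(user_data);
    last_result = result;
}

void start_request(std::uintptr_t request_idx) {
    do_thing(on_done, reinterpret_cast<void*>(request_idx));
}
```

After start_request(7), the callback has seen request index 7 without any memory at address 7 ever being read.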


Another reason would be if on your specific platform you happen to know that there is a valid object at that address (e.g. on some embedded device).

Ultimately this is all implementation-defined. std::uintptr_t may not even exist on a specific platform.

andrey
  • This is exactly the case - I have an API that accepts a pointer, say unsigned char*, but I fail to understand how that API can use it - if I cannot...the code (both mine and the API) is running on Windows btw. – Werner Sep 27 '19 at 09:14
  • @Werner Assuming you are talking about my first option (using a pointer as transport for an integer), the async function would never dereference the pointer, simply storing it until it needs to be passed to the callback. Then the callback can interpret it as it wants. The pointer is really just a fancy integer under the hood. Only *dereferencing* an invalid pointer would cause undefined behaviour. – andrey Sep 29 '19 at 18:17
2
unsigned long value = 50;
unsigned char* result = (unsigned char*)value;  

Now the pointer result contains the number 50; in other words, it points to address 50. But on modern platforms you cannot simply access an arbitrary memory address.

Therefore this code will result in undefined behaviour (usually a crash):

std::cout << *result;
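
For contrast, a minimal sketch of the variant from the question that takes the address of value instead. Reading an object's bytes through an unsigned char pointer is allowed, so this version is well defined. The function name sum_of_bytes is made up for the example.

```cpp
#include <cstddef>

// Adds up the individual bytes that make up `value` in memory.
unsigned long sum_of_bytes(const unsigned long& value) {
    // Pointer to the first byte of the object itself, not to "address 50".
    const unsigned char* bytes =
        reinterpret_cast<const unsigned char*>(&value);
    unsigned long sum = 0;
    for (std::size_t i = 0; i < sizeof value; ++i) {
        sum += bytes[i];
    }
    return sum;
}
```

For value = 50 exactly one byte holds 50 and the rest hold 0, so the sum is 50 regardless of byte order.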
463035818_is_not_an_ai
Jabberwocky
  • no idea. Anyhow, there is one exception though: if somewhere earlier in the code an `unsigned char` object was created at just that memory address – 463035818_is_not_an_ai Sep 20 '19 at 12:18
  • @formerlyknownas_463035818 yes, but hence "undefined behaviour". – Jabberwocky Sep 20 '19 at 12:21
  • 1
    *Technically* the mapping from integer to pointer is implementation defined so the address is not required to be 50 in this case. But that's very likely, and the intention of the language. – eerorika Sep 20 '19 at 12:35
  • On **modern** embedded systems you access hardware registers by assigning values to pointers, then dereferencing the pointers. – Thomas Matthews Sep 20 '19 at 14:33