
Consider the following code:

*((unsigned int*)((unsigned char *)0x400000));

Does it violate strict aliasing?

From the strict aliasing rule:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

...

For a violation to occur, there must be an object of type unsigned char, so when accessed with unsigned int* a violation will occur.

Now, does (unsigned char *)0x400000 constitute an unsigned char object at address 0x400000? If not, then there is actually no object with a stored value of type unsigned char here, so accessing it through an unsigned int lvalue does not violate the rule.

Note the following phrase about object:

Object

region of data storage in the execution environment, the contents of which can represent values

NOTE When referenced, an object may be interpreted as having a particular type; see 6.3.2.1.

So, since (unsigned char *)0x400000 does not constitute an unsigned char object reference (to my understanding) there is no object of type unsigned char in the presented code, so it seems that there is no violation.

Am I correct?


With respect to @Antti Haapala answer:

Suppose that on my system, converting the integer 0x400000 to either unsigned char* or unsigned int* has defined behavior, produces no trap representation, and yields a correctly aligned pointer. This assumption is meant to fill the implementation-defined gap left by the standard:

An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation

Will that change the answer to my question?
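A minimal sketch of the assumption above, for illustration only. The helper names `is_aligned_for_uint` and `uint_ptr_from_address` are hypothetical, and the alignment check assumes the implementation maps integers to addresses linearly, which the standard does not promise.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reports whether an integer address is a multiple of
 * the alignment of unsigned int. Assumes a linear integer-to-address
 * mapping, which is implementation-defined. */
bool is_aligned_for_uint(uintptr_t addr)
{
    return addr % _Alignof(unsigned int) == 0;
}

/* Hypothetical helper: converts an integer to a pointer to unsigned int,
 * returning NULL when the alignment check fails. The result of the
 * conversion itself remains implementation-defined. */
volatile unsigned int *uint_ptr_from_address(uintptr_t addr)
{
    if (!is_aligned_for_uint(addr)) {
        return NULL;
    }
    return (volatile unsigned int *)addr;
}
```

The check only rules out the "might not be correctly aligned" case from the quoted paragraph. Whether anything of the referenced type exists at the address is still up to the implementation.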

curiousguy
izac89
  • Let me understand this better: you say that the address `0x400000` contains a `char` value (`(unsigned char *)0x400000`). Then you cast that pointer to an `unsigned int` pointer, `(unsigned int*)`. Then you dereference the pointer and read the `unsigned int`. Do `char` and `unsigned int` have the same size? – Frankie_C Sep 10 '19 at 07:54
  • `char` is 1 byte, `unsigned int` is 4 bytes – izac89 Sep 10 '19 at 07:56
  • The point, IMO, is that the standard speaks of a strict aliasing violation when an object that is already defined is accessed through a different lvalue expression. It also allows access through `unsigned char` because, `char` being the smallest object, the access is guaranteed not to reach outside the object's size, and unsignedness prevents sign transformations. But in your case the object does not seem to be defined elsewhere, so the simple expression `*((unsigned int*)0x400000)` both accesses and designates the object at the same time. – Frankie_C Sep 10 '19 at 08:06

3 Answers


Essentially, strict aliasing isn't applicable when dealing with hardware registers, since the committee apparently never considered hardware-related programming scenarios or memory-mapped registers.

Strict aliasing only applies in scenarios where the compiler is able to determine an effective type of what's actually stored in memory. If you take a hardware address and lvalue access it, the contents there can never have an effective type that the compiler is aware of. This means that when reading the data, I suppose this part from 6.5 would apply:

For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.

In your case, that type is unsigned int. Please note that the intermediate cast to unsigned char* is completely superfluous, both in terms of strict aliasing and in terms of lvalue access to the hardware. It should be removed.

If you do a write access however, the compiler is free to treat the value at that address as the effective type used through the write access:

If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value.

Basically the rules of effective type and strict aliasing don't make sense when dealing with hardware addresses, so the compiler can't do much with them. In addition, such accesses often don't make any sense unless the pointer is volatile qualified, preventing any attempts of optimization based on strict aliasing.

Just to be sure, always compile with -fno-strict-aliasing when doing any form of embedded systems programming.
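As a rough sketch of the volatile-qualified access described above, without the intermediate cast: the names `reg_read`, `reg_write`, `fake_register` and `fake_register_address` are hypothetical, and the "register" is an ordinary variable so that the example can run on a hosted system. On real hardware the address would come from the device manual instead.

```c
#include <stdint.h>

/* Hypothetical stand-in for a memory-mapped register. On a real target the
 * address would be a fixed integer such as 0x400000. */
static volatile unsigned int fake_register = 0xABCDu;

/* Returns the integer address of the stand-in register. */
uintptr_t fake_register_address(void)
{
    return (uintptr_t)&fake_register;
}

/* Reads the register at an integer address through a volatile-qualified
 * lvalue. The integer-to-pointer conversion is implementation-defined. */
unsigned int reg_read(uintptr_t addr)
{
    return *(volatile unsigned int *)addr;
}

/* Writes the register at an integer address through a volatile-qualified
 * lvalue. */
void reg_write(uintptr_t addr, unsigned int value)
{
    *(volatile unsigned int *)addr = value;
}
```

Because every access goes through a volatile lvalue, the compiler may not drop or merge the reads and writes, whatever it assumes about aliasing.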

Lundin

Undecided. The standard does not even tell what happens if you convert an integer to a pointer, except that the result is implementation-defined with several possible behaviours that might be undefined...

What is certain though is that *(unsigned int*)(unsigned char *)0x400000; and *(unsigned int*)0x400000 do not differ in undefinedness, as merely pointing to an object using a certain value does not change its effective type.
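To illustrate the point about the intermediate cast, here is a small sketch with a hypothetical helper `casts_agree`. It compares the pointer obtained directly with the one obtained via `unsigned char *`. The integer-to-pointer conversions are implementation-defined, so this only shows what a typical implementation does, not something the standard guarantees for arbitrary integers.

```c
#include <stdint.h>

/* Hypothetical helper: returns nonzero when converting addr directly to
 * unsigned int * gives the same pointer as converting it through
 * unsigned char * first. No memory is accessed. */
int casts_agree(uintptr_t addr)
{
    unsigned int *direct = (unsigned int *)addr;
    unsigned int *via_char = (unsigned int *)(unsigned char *)addr;
    return direct == via_char;
}
```

Neither cast touches the storage, so neither can change the effective type of whatever lives there.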

Another question is what is the type of the object at 0x400000 and what is the value - if you never set it, the contents are indeterminate and it might not have an effective type. Reading it will have undefined behaviour.

What is certain though is that if you write an object of type float at that address and considering that it is a valid access, the effective type of the object at address 0x400000 will be float. If you then read the object at that address with an lvalue of type unsigned int it will be a strict aliasing violation.
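The float case can be sketched with allocated storage, which has no declared type, in place of the hardware address. The function name `float_bits_via_memcpy` is hypothetical. It stores a float, which makes float the effective type, and then copies the bytes out with `memcpy` instead of reading through an unsigned int lvalue.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: stores a float into allocated storage and returns
 * its bytes copied into an unsigned int. Returns 0 if the two types differ
 * in size or if allocation fails. */
unsigned int float_bits_via_memcpy(float value)
{
    unsigned int bits = 0;
    float *storage;

    if (sizeof(unsigned int) != sizeof(float)) {
        return 0;
    }
    storage = malloc(sizeof *storage);
    if (storage == NULL) {
        return 0;
    }
    /* This store gives the allocated object the effective type float. */
    *storage = value;
    /* Reading *(unsigned int *)storage here would access a float object
     * through an unsigned int lvalue, which is the violation described
     * above. Copying the bytes avoids that. */
    memcpy(&bits, storage, sizeof bits);
    free(storage);
    return bits;
}
```

On an implementation with IEEE 754 binary32 float and 32-bit unsigned int, passing 1.0f yields 0x3F800000.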

The standard does not take stances on any of these. You're on your own. Check your compiler manuals. As soon as you convert an integer to a pointer and start poking at memory there you're not strictly conforming and you don't find any backing in the standard, period.

  • converting integer to pointer is implementation defined `An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation` – izac89 Sep 10 '19 at 07:58
  • @user2162550 yes, and if it is a trap representation or not correctly aligned or so on then the behaviour is undefined – Antti Haapala -- Слава Україні Sep 10 '19 at 08:02
  • `Another question is what is the type of the object at 0x400000 and what is the value - if you never set it, the contents are indeterminate and it might not have an effective type. Reading it will have undefined behaviour.` Does that mean that code which reads read-only registers with default reset values, such as `volatile unsigned int* reg= (volatile unsigned int*)0x1234; printf("%u",*reg);`, has undefined behavior? – izac89 Sep 10 '19 at 08:21
  • @user2162550 yes. It means that. The C standard does not require the compiler to have any specific behaviour. It becomes a "quality of implementation" issue – Antti Haapala -- Слава Україні Sep 10 '19 at 08:22
  • Wow, interesting. – izac89 Sep 10 '19 at 08:23
  • But still, `volatile` does not change the `automatic` storage duration of the variable, so if it's not initialized then reading it is still UB, as you say – izac89 Sep 10 '19 at 08:28
  • it wouldn't be "automatic storage"... but it would be undefined for all the other reasons, or *at best* indeterminate :D but passing an indeterminate value without trap representations to a library function has undefined behaviour. – Antti Haapala -- Слава Україні Sep 10 '19 at 08:36
  • There's no UB. Converting to a pointer type has implementation-defined aspects as per the quoted part in 6.3. Assuming that the programmer knows what they are doing and the underlying system allows it, such conversions are fine. There is, as far as I know, nothing in the standard stating anything about the pointed-at value, nothing that labels the value as indeterminate etc. It's simply beyond the scope of the standard. – Lundin Sep 10 '19 at 08:41
  • @Lundin nope, "beyond the scope of the standard" is undefined behaviour, by omission, unless it falls under implementation-specified, unspecified or locale-specific behaviours... Yes, all of your embedded programs have undefined behaviour when it comes to *standards-compliance* but "it is fine". – Antti Haapala -- Слава Україні Sep 10 '19 at 08:43
  • But using the term UB like that is silly, because then it would also be UB to run C on a computer, since the standard never mentions computers. We have to use the term for violations of the standard of cases that are explicitly UB, otherwise the whole language becomes useless in the real world. – Lundin Sep 10 '19 at 08:47
  • @Lundin incorrect again. I am reading what the standard says. You're thinking that "UB" means "bad things". It is just ... **undefined behaviour**. Use a wrong specifier for `printf` - undefined behaviour but it can be defined elsewhere. – Antti Haapala -- Слава Україні Sep 10 '19 at 08:49
  • The case is: *If a ''shall'' or ''shall not'' requirement that appears outside of a constraint or runtime- constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words ''undefined behavior'' or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe ''behavior that is undefined''.* – Antti Haapala -- Слава Україні Sep 10 '19 at 08:50
  • So you're really arguing with the standards committee here, I am just reading the standard as it is. Unless OP adds a certain implementation whose manuals we can check in the question then the standard is all we've got. – Antti Haapala -- Слава Україні Sep 10 '19 at 08:51
  • The context here is if reading a hardware register will produce _unexpected_ behavior such as the compiler going bananas with strict aliasing or other optimizations. To then say that "yeah it is UB" and compare it with misaligned access or other such real-world bugs isn't helpful. – Lundin Sep 10 '19 at 08:58
  • @Lundin but it **is** UB. Same kind of bananas allowed. – Antti Haapala -- Слава Україні Sep 10 '19 at 08:59
  • As the standard says: **There is no difference in emphasis among these three**. – Antti Haapala -- Слава Україні Sep 10 '19 at 09:05
  • @AnttiHaapala: The situations which seem to cause controversy are those where parts of the Standard or an implementation's documentation would together describe the behavior of some action, but another part would characterize it as invoking Undefined Behavior. I believe the statement you wrote in bold above is meant to suggest that the Committee *has made no attempt to judge* which should be given precedence in such situations, but instead rely upon implementations intended for various purposes to uphold the Spirit of C (see the Rationale) as appropriate for those purposes. – supercat Sep 12 '19 at 16:16

The Standard only tries to define the behavior of portable programs. It consequently leaves questions about whether or how to support non-portable programs to the judgment of compiler writers, who were expected to know more about their customers' needs than the Committee ever could.

There are no circumstances in which the Standard would require implementations to guarantee anything meaningful about the effects of casting an integer of unknowable provenance to a pointer and then using that pointer to access storage. The ability to meaningfully process such code would be a Quality of Implementation issue regardless of the types used for access. If the pointers are not qualified volatile, operations involving them may get optimized out regardless of the types involved; if they are volatile, operations will be sequenced with respect to other volatile accesses, again regardless of the types.

While some compilers may allow for the possibility of interaction between a volatile object and a non-qualified object of the same type, without doing so for other interactions between volatile objects and unqualified ones, compilers that seek maximum compatibility with low-level code will allow for interaction between volatile accesses and all objects of all types, while those that don't may not always accommodate interactions between objects they regard as unrelated, even if they happen to have the same type and same address.

supercat