So I was re-reading C17 6.5/6 - 6.5/7 regarding effective type and strict aliasing, but couldn't figure out how to treat qualifiers. Some things confuse me:

  • I always assumed that qualifiers aren't really relevant for effective type since the rules speak of lvalue access, meaning lvalue conversion that discards qualifiers. But what if the object is a pointer? Qualifiers to the pointed-at data aren't affected by lvalue conversion.

    Q1: What if the effective type is a pointer to qualified-type? Can I lvalue access it as a non-qualified pointer to the same type? Where in the standard is this stated?

  • The exceptions to the strict aliasing rule mention qualifiers in these cases:

    — a qualified version of a type compatible with the effective type of the object,
    — a type that is the signed or unsigned type corresponding to the effective type of the object,
    — a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,

    None of these address qualifiers of the effective type itself, only by the lvalue used for access. Which should be quite irrelevant, because of lvalue conversion... right?

    Q2: Does lvalue conversion happen before or after the above quoted rules of effective type/strict aliasing are applied?

    Q3: Does the effective type come with qualifiers or not? Where in the standard is this stated?

Lundin
  • I also got the feeling there's a tonne of defect reports regarding these rules. – Lundin Dec 18 '20 at 12:16
  • For **Q1**, do you mean something like `void volatile *p;` `*(void **)&p = NULL;` would not be permitted, but `void volatile *p;` `*(void volatile * volatile *)&p = NULL;` would be permitted? – Ian Abbott Dec 18 '20 at 13:18
  • **6.7.6.1 Pointer declarators** 2: _For two pointer types to be compatible, both shall be identically qualified and both shall be pointers to compatible types_. **6.7.3 Type qualifiers** 11: _For two qualified types to be compatible, both shall have the **identically qualified** version of a compatible type_. So `int*` and `const int*` are not compatible and you can't access an object of one type through the lvalue of the other type. – Language Lawyer Dec 18 '20 at 13:20
  • @IanAbbott Yes. That is, lvalue access of the (qualified) pointer itself. – Lundin Dec 18 '20 at 13:33
  • @LanguageLawyer But the strict aliasing rules allows for various exceptions. The parts you quote would mostly be the first bullet that I didn't include: "- a type compatible with the effective type of the object". – Lundin Dec 18 '20 at 13:38
  • Even though the types are not compatible, they have the same size, representation, and alignment requirement, which I presume is the motive for the question. – Ian Abbott Dec 18 '20 at 14:08
  • @IanAbbott No, the motive is to determine if a compiler is allowed to go haywire if I lvalue access for example a `const int` through a `*(int*)` or a `const int*` through a `*(int**)`. Or if these are well-defined cases, as far as pointer aliasing is concerned. – Lundin Dec 18 '20 at 14:10
  • Well they appear to be undefined as far as lvalue access is concerned. – Ian Abbott Dec 18 '20 at 14:16
  • Quality implementations should seek to support aliasing constructs in cases that matter to their customers, even if they happen to involve differently-qualified pointers, but be able to optimize in cases where doing so would only affect cases that don't matter to their customers, even if the pointers involved differ only in their qualifiers. I don't think the Committee has ever reached a consensus that all code which mixes differently-qualified pointers is "broken", nor that differently-qualified pointers must always be treated as alias-equivalent, because... – supercat Jan 25 '21 at 19:17
  • ...codifying either viewpoint would forbid some implementations from being maximally useful, and because people seeking to sell compilers should be expected to know more about their customers' needs than the Committee ever could. – supercat Jan 25 '21 at 19:19
  • @supercat Then they should have made large part of the effective type/strict aliasing rules implementation-defined, rather than UB as it stands now. It might make sense to treat effective type qualifiers in an implementation-defined way too: take for example an embedded system with true read-only flash memory. You shouldn't be able to lvalue access a `const type` stored in read-only memory through a non-qualified `type*`, because the physical memory doesn't even support write access. – Lundin Jan 26 '21 at 08:35
  • @Lundin: The authors of the Standard never intended or expected that the phrase "Undefined Behavior" be interpreted as an invitation to behave in gratuitously nonsensical fashion, and thus saw no need to avoid characterizing as UB actions which they expected most or even all implementations to process identically. They did intend to allow implementations to deviate from commonplace corner-case behaviors *in cases that wouldn't adversely affect their customers*, but expected compiler writers to know and respect their customers' needs better than the Committee ever could. – supercat Aug 11 '21 at 22:09
  • @Lundin: It's important to note, btw, that while the Standard pretends to be normative, it has almost no normative authority with respect to freestanding implementations or non-trivial programs for them. There are no non-trivial *Strictly Conforming C Programs* for freestanding implementations, but any blob of text that is accepted--possibly as an extension--by some conforming C implementation somewhere in the universe is a "Conforming C Program". If one factors in the fact that C implementations have carte blanche to extend the language in any way that doesn't affect the behavior of... – supercat Aug 11 '21 at 22:12
  • ...any Strictly Conforming C Program, that effectively means that any random blob of source text that isn't a Conforming C Program could be turned into a Conforming C Program by tweaking an implementation to extend the language by accepting that blob of text with some convenient meaning. – supercat Aug 11 '21 at 22:15
  • Re “1: What if the effective type is a pointer to qualified-type? Can I lvalue access it as a non-qualified pointer to the same type?”: Is the second sentence intended to be “Can I lvalue access it as a pointer to the corresponding unqualified type?”? That is, the qualifier is removed from the pointed-to type, not the pointer? – Eric Postpischil Nov 27 '22 at 00:57

2 Answers

"Qualified type" being a defined term, the definition is potentially relevant:

Any type so far mentioned is an unqualified type. Each unqualified type has several qualified versions of its type, corresponding to the combinations of one, two, or all three of the const, volatile, and restrict qualifiers. The qualified or unqualified versions of a type are distinct types that belong to the same type category and have the same representation and alignment requirements. A derived type is not qualified by the qualifiers (if any) of the type from which it is derived.

(C17 6.2.5/26)

I note that the _Atomic keyword is different from the other three categorized as type qualifiers, and I presume that this is related to the fact that atomic types are not required to have the same representation or alignment requirements as their corresponding non-atomic types.

I also note that the specification is explicit that qualified and unqualified versions of a type are different types.
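
As a rough illustration of that paragraph, here is a minimal sketch in C11/C17. The macro `POINTEE_KIND` and the two helper functions are hypothetical names made up for this example. It checks at compile time that a qualified `int` has the same size and alignment as plain `int`, and uses `_Generic` to show that pointers to differently qualified `int` are nonetheless distinct types:

```c
/* Qualified and unqualified versions of a type share size and
 * alignment requirements (C17 6.2.5/26). */
_Static_assert(sizeof(const volatile int) == sizeof(int), "same size");
_Static_assert(_Alignof(const volatile int) == _Alignof(int), "same alignment");

/* They are still distinct types: _Generic can tell pointers to them
 * apart. POINTEE_KIND is a hypothetical helper macro for this sketch. */
#define POINTEE_KIND(p) _Generic((p), \
    int *: 0,                         \
    const int *: 1,                   \
    volatile int *: 2,                \
    const volatile int *: 3)

/* Returns the selector chosen for a pointer to const int. */
int kind_of_const_int_ptr(void)
{
    const int c = 0;
    return POINTEE_KIND(&c);   /* &c has type const int * */
}

/* Returns the selector chosen for a pointer to plain int. */
int kind_of_plain_int_ptr(void)
{
    int i = 0;
    return POINTEE_KIND(&i);   /* &i has type int * */
}
```

The association list compiles at all only because none of the four pointer types is compatible with another, which is the same fact that matters for 6.5/7.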

With that background,

Q1: What if the effective type is a pointer to qualified-type? Can I lvalue access it as a non-qualified pointer to the same type? Where in the standard is this stated?

I take you to mean this:

    const uint32_t *x = &some_uint32;
    uint32_t * y = *(uint32_t **) &x;

The effective type of `x` is `const uint32_t *` (an unqualified pointer to const-qualified `uint32_t`), and it is being accessed via an lvalue of type `uint32_t *` (an unqualified pointer to unqualified `uint32_t`). This combination is not among the exceptions allowed by the language spec. In particular, `uint32_t *` is not a qualified version of `const uint32_t *`. The resulting behavior is therefore undefined, as specified in C17 6.5, paragraphs 6 and 7.

Although the standard does not discuss this particular application of the strict aliasing rule (SAR), I take it to be justified indirectly. The issue in cases such as this is not so much about accessing the pointer value itself as about producing a pointer whose type discards qualifiers of the pointed-to type.

Note also that the SAR does allow this variation:

    const uint32_t *x = &some_uint32;
    const uint32_t * const y = *(const uint32_t * const *) &x;

This is permitted because `const uint32_t * const` is a qualified version of `const uint32_t *`.
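
For completeness, a self-contained sketch of the permitted variation. The function names `read_as_const_qualified` and `demo_read` are hypothetical, chosen only for this example. It performs only the access that 6.5/7 lists, and leaves the problematic one as a comment:

```c
#include <stdint.h>

/* Reads the pointer object *px through an lvalue of type
 * const uint32_t * const. That is a qualified version of the object's
 * effective type, const uint32_t *, so 6.5/7 permits the access. */
const uint32_t *read_as_const_qualified(const uint32_t **px)
{
    const uint32_t *const *view = (const uint32_t *const *)px;
    return *view;
}

/* Returns 1 if the value read through the const-qualified view is the
 * same pointer, still pointing at 42. */
int demo_read(void)
{
    uint32_t some_uint32 = 42;
    const uint32_t *x = &some_uint32;
    const uint32_t *y = read_as_const_qualified(&x);

    /* Not done here: *(uint32_t **)&x would access x through an lvalue
     * of type uint32_t *, which is not among the cases in 6.5/7. */
    return y == x && *y == 42;
}
```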

Q2: Does lvalue conversion happen before or after the above quoted rules of effective type/strict aliasing are applied?

I don't see how lvalue conversion could be construed to apply before strict aliasing. The strict aliasing rule is expressed in terms of the lvalues used for accessing objects, and the result of lvalue conversion is not an lvalue.

Additionally, as @EricPostpischil observed, the SAR applies to all accesses, which include writes. There is no lvalue conversion in the first place for an lvalue that is being written.
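
To make concrete what lvalue conversion does and does not strip, here is a small sketch. It assumes a compiler that implements the C17 wording, under which the controlling expression of `_Generic` is treated as if it had undergone lvalue conversion. The macro and function names are hypothetical:

```c
/* Hypothetical helper macros: each yields 1 if the converted type of
 * the expression matches the named type, otherwise 0. */
#define IS_PLAIN_INT(e)        _Generic((e), int: 1, default: 0)
#define IS_PTR_TO_CONST_INT(e) _Generic((e), const int *: 1, default: 0)
#define IS_PTR_TO_INT(e)       _Generic((e), int *: 1, default: 0)

/* The value of a const int lvalue has type int after conversion. */
int top_level_const_is_dropped(void)
{
    const int ci = 5;
    return IS_PLAIN_INT(ci);
}

/* For a const pointer to const int, only the pointer's own const is
 * dropped; the qualifier on the pointed-to type is part of the result. */
int pointee_const_survives(void)
{
    const int ci = 5;
    const int *const p = &ci;
    return IS_PTR_TO_CONST_INT(p) && !IS_PTR_TO_INT(p);
}
```

This shows only the type of the converted value. It says nothing about which lvalue types 6.5/7 allows, since that rule is stated in terms of the lvalue itself.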

Q3: Does the effective type come with qualifiers or not? Where in the standard is this stated?

Qualified and unqualified versions of a type are different types. I see no justification for interpreting paragraph 6.5/6's "the declared type of the object" or "the type of the lvalue" as if the type were supposed to be considered stripped of its qualifiers, much less as if all qualifiers in the type(s) from which it is derived were stripped. The words "the type" mean what they say.

John Bollinger
  • Regarding lvalue conversion, also note that the rule in 6.5 7 says “An object shall have its stored value accessed…”, and “access” means reading or writing. Thus, it applies to storing a value, as in `x = …`. Then there is no lvalue conversion. So the rules must apply to the lvalue without any conversion. – Eric Postpischil Jan 18 '22 at 23:07
  • Good point, @EricPostpischil. I will add that to this answer. – John Bollinger Jan 19 '22 at 01:24
  • @EricPostpischil: Applying the rules literally would yield a nonsensical language. An automatic-object declaration which includes an initialization expression would modify the storage associated with that object, thus "accessing it", but it's not an lvalue expression and would thus violate N1570 6.5p7. If the rule is interpreted as including an implied "Implementations should ignore this paragraph in cases where doing so would yield a defined behavior whose usefulness would outweigh any benefits from applying this section aggressively," however, many questions about the Standard would be moot. – supercat Jan 21 '22 at 23:58
  • @supercat, I do not claim that the language specification is wholly self-consistent, but it *is* meant to be taken literally, especially when it comes to its own defined terminology. The wording is pretty carefully chosen with exactly that in mind. In this particular case, "access" is defined as "to read or modify the value of an object" (C17 3.1/1), whereas "An initializer specifies the initial value stored in an object" (C17 6.7.9/8). In light of the former, the latter is not in conflict with 6.5/7 because specifying the *initial* value of an object cannot *modify* the value of an object. – John Bollinger Jan 22 '22 at 15:53
  • @JohnBollinger: There are many circumstances where the Standard specifies that the lifetime of an object begins before the evaluation of its "initial" value. Further, the Standard would allow code to branch to a location before an object's initialization and then pass through the initializing declaration again. Would it make any sense to say that that doesn't modify the value, or that such constructs invoke UB because they modify a value without using an lvalue? A lot of confusion about the Standard comes from the fact that if there's no consensus on the Committee about when a construct... – supercat Jan 22 '22 at 19:56
  • ...should have defined behavior, the Committee defaults to categorizing the construct as UB; such classification doesn't imply that there was anything resembling a consensus that programmers should be forbidden from using the construct, but unfortunately the Standard makes no distinction between constructs which there would have been a consensus to prohibit, versus those where the Committee intended to let implementations decide whether processing the constructs meaningfully would benefit their customers. – supercat Jan 22 '22 at 19:59
  • Both clang and gcc will recognize that `someUnion.array1[i]` may access the same storage as `someUnion.array2[j]`, but neither will recognize such interaction if the expressions are written as `*(someUnion.array1+i)` and `*(someUnion.array2+j)`. This would make sense if one interprets the Standard as saying that all of those forms invoke UB, but then recognizes that only a really horribly obtuse implementation would fail to meaningfully process the forms written with array syntax, but I see no "literal interpretation" of the Standard that would justify such distinction. – supercat Jan 22 '22 at 20:03

Q3: Does the effective type come with qualifiers or not? Where in the standard is this stated?

The effective type includes qualifiers (or lack thereof) because the rules about effective type say that a type is used, and types include qualifiers, and the rules about effective type do not say the qualifiers are disregarded.

C 2018 6.5 6 says the effective type of an object for access to its stored value is one of:

  • “the declared type of the object” (if any),
  • “the type of the lvalue” previously used to store into it (if that is not a character type),
  • “the effective type of the object from which the value is copied” (if it was copied by a byte-copy method and the source has an effective type), or
  • “the type of the lvalue used for the access.”

The third of these is recursive, so it leads to one of the others. The others all say the effective type is some type, and they do not say the effective type is the unqualified version of that type. It simply is that type; the qualifiers are not removed.

Q2: Does lvalue conversion happen before or after the above quoted rules of effective type/strict aliasing are applied?

Lvalue conversion is immaterial. The aliasing rules in C 2018 6.5 7 make no mention of lvalue conversion, and it might not occur at all, since the rules apply to both reading and modifying values. (The rules in 6.5 7 are for when a stored value is “accessed,” and “access” in the C standard means reading or modifying, per 3.1.) When an object is modified, a new value is written into it; there is no lvalue conversion. When an object is read, the aliasing rules apply to that access, and lvalue conversion happens afterward, as a separate thing.

Q1: What if the effective type is a pointer to qualified-type? Can I lvalue access it as a non-qualified pointer to the same type? Where in the standard is this stated?

The phrasing of these sentences does not make sense in this context. I will consider two meanings for them.

First, I take the first sentence as it stands and the second question as “Can I lvalue access it as a pointer to the unqualified version of the effective type?” Although I suspect my second interpretation below is the one that was intended, this one involves less change to the text. The answer is that the C standard does not define the behavior, because the access does not conform to the rule in 6.5 7.

Given `const char *p;`, `p` is a pointer to a qualified type. Then, after `char **q = (char **) &p;`, `*q` is a pointer to an unqualified type. Using `*q` to read or to modify `p` would not conform to the rule in 6.5 7. When we consider accessing `p` with `*q`, then, as we see above, the effective type of the object is `const char *`, the type of the lvalue is `char *`, and none of the cases in 6.5 7 say a `const char *` may be accessed as a `char *`.

Second, I take the sentences as “What if the effective type is a qualified type? Can I lvalue access it as an unqualified version of the same type?” Again, the answer is that the C standard does not define the behavior, because the access does not conform to the rule in 6.5 7.

Given `const int p = 3;`, `p` has a qualified type. Then, after `int *q = (int *) &p;`, `*q` has the unqualified version of the same type. When we consider accessing `p` with `*q`, the effective type of the object is `const int`, and the type of the lvalue is `int`, and none of the cases in 6.5 7 say a `const int` may be accessed as an `int`.
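
The opposite direction is listed in 6.5 7, which shows that the rule is asymmetric with respect to qualifiers. A minimal sketch, with a hypothetical function name, that performs only the listed access:

```c
/* The object has declared (and so effective) type int. Reading it
 * through an lvalue of type const int is covered by the case
 * "a qualified version of a type compatible with the effective type". */
int read_through_const_view(void)
{
    int value = 7;
    const int *view = &value;   /* *view is an lvalue of type const int */

    value = 9;                  /* modify through the unqualified name */
    return *view;               /* read through the qualified lvalue */
}
```

Under the reading given in this answer, swapping the roles (declaring `const int value` and reading it through an `int *`) would fall outside the listed cases, so the sketch does not do that.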

None of these address qualifiers of the effective type itself, only by the lvalue used for access. Which should be quite irrelevant, because of lvalue conversion... right?

No, the qualifiers of the effective type are relevant. Lvalue conversion, if it occurs, does not make them irrelevant. 6.5 7 states requirements for the lvalue type in relation to the effective type, and the qualifiers of each are parts of their types and partake in the rule in 6.5 7.

Eric Postpischil