In the C89 standard, I found the following section:

3.2.2.1 Lvalues and function designators

Except when it is the operand of the sizeof operator, the unary & operator, the ++ operator, the -- operator, or the left operand of the . operator or an assignment operator, an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue). If the lvalue has qualified type, the value has the unqualified version of the type of the lvalue; otherwise the value has the type of the lvalue. If the lvalue has an incomplete type and does not have array type, the behavior is undefined.

If I read it correctly, this allows us to create an lvalue of incomplete type and apply some operator to it, in a program that compiles but has undefined behavior at run time.

The problem is that I can't think of an example of "an lvalue with incomplete type" that passes the compiler's semantic checks and triggers undefined behavior.

Recall that an lvalue is defined as

An lvalue is an expression (with an object type or an incomplete type other than void) that designates an object.

and that incomplete type is

Types are partitioned into object types (types that describe objects), function types (types that describe functions), and incomplete types (types that describe objects but lack information needed to determine their sizes).

A failed program I tried:

struct i_am_incomplete;
int main(void)
{
    struct i_am_incomplete *p;
    *(p + 1);
    return 0;
}

and got the following error:

error: arithmetic on a pointer to an incomplete type 'struct i_am_incomplete'
    *(p + 1);
      ~ ^

Can anyone think of an example of this? That is, an example of "an lvalue with incomplete type" that passes the compiler's semantic checks and triggers undefined behavior.


UPDATE:

As @algrid said in their answer, I misunderstood undefined behavior: it includes a compile error as one of the permitted outcomes.

Maybe I'm splitting hairs, but I still wonder what the underlying motivation was for preferring undefined behavior over simply forbidding an lvalue from having an incomplete type.

lazyplayer
  • Example of what? You've already provided an example. Unclear what you're asking for. NB Your question is mis-worded. It is *syntactically* possible but *semantically* impossible. If it was syntactically impossible you would have got a syntax error. – user207421 Sep 04 '17 at 02:10
  • To do any pointer arithmetic, the compiler has to know the size of the underlying data. On the other hand, keeping a pointer in a variable does not require any knowledge about the data. So you can pretty much assign pointers to incomplete data types, but nothing else. – Serge Sep 04 '17 at 02:16
  • @EJP You're right, it should be a semantic thing. I'm fixing it. Then, if it's semantically impossible, what's the intent of this rule? – lazyplayer Sep 04 '17 at 02:16
  • @EJP an example of "an lvalue with incomplete type" which can pass compiler's semantic check. Or the intent of rule "If the lvalue has an incomplete type and does not have array type, the behavior is undefined." – lazyplayer Sep 04 '17 at 02:21
  • How about in Windows, they have HANDLE, HBITMAP, HBRUSH. You do not really know what they are, they're just pointers to something. Their implementation is kept private from you. As far as you're concerned, they're effectively an incomplete type, right? But you don't need to know what they are as long as you pass them back to Windows through their API calls. – Stephen Quan Sep 04 '17 at 02:24
  • @StephenQuan Nice explanation. :) Yes, that's what pointers are good at. My concern is whether we can create a program that triggers the "undefined behavior". If we can't, I don't understand why C89 includes the rule. – lazyplayer Sep 04 '17 at 02:30
  • @Serge True. It's hard to think of an example that compiles. – lazyplayer Sep 04 '17 at 02:53
  • I think OP's intended question is: "Is it possible to have a program with no constraint violations which performs lvalue conversion on an lvalue of incomplete type?" – M.M Sep 04 '17 at 03:23
  • @M.M You got it! and also the intent of the UB rule. – lazyplayer Sep 04 '17 at 03:30

4 Answers

I believe this program demonstrates the case:

struct S;
struct S *s, *f();

int main(void)
{
    s = f();
    if ( 0 )
        *s;   // here
}

struct S { int x; };
struct S *f() { static struct S y; return &y; }

On the marked line, *s is an lvalue of incomplete type, and it does not fall under any of the "Except..." cases in your quote of 3.2.2.1 (which is 6.3.2.1/2 in the current standard). Therefore it is undefined behaviour.

I tried my program in gcc and clang and they both rejected it with the error that a pointer to incomplete type cannot be dereferenced; but I cannot find anywhere in the Standard which would make that a constraint violation, so I believe the compilers are incorrect to reject the program. Or possibly the standard is defective by omitting such a constraint, which would make sense.

(Since the code is inside an if(0), that means the compiler cannot reject it merely on the basis of it being undefined behaviour).

M.M
  • Though I tend to believe that C standard should have prohibited "dereferencing a pointer to incomplete type", I didn't find it. Maybe you are right. The standard is defective by omitting this rule. Also in this case, the UB rule seems redundant. – lazyplayer Sep 04 '17 at 05:04
  • Why do you think "fails to compile" isn't a valid kind of undefined behaviour? – user253751 Sep 04 '17 at 05:45
  • @immibis the compiler must translate the program unless it has UB in all possible code paths. If it fails to translate this program it's non-conforming because the line `*s;` cannot be reached (and the rules of the language specify that the rest of the program must work) – M.M Sep 04 '17 at 07:06
  • @M.M: I guess I would hope that the standard calls this "ill-formed" and not merely UB. I'm really surprised if that's not the case, I should study the standard more. – Chris Beck Sep 05 '17 at 22:09
  • @ChrisBeck "ill-formed" is a C++ thing. In C there are *constraint violations* which mean the compiler must produce a diagnostic and may refuse to translate the program. There is also undefined behaviour that doesn't violate constraints (e.g. `int x = 1 / argc;` when the program was invoked with `argc == 0`). Currently `*s;` is not a constraint violation although I would argue that it probably should be. – M.M Sep 05 '17 at 22:18
  • @M.M. I realize the question is about C, but I think in C++ at least I'm not sure that the code should be invalid. Because, if instead we were binding a reference to `*s`, I think the standard explicitly supports that. I.e. `if (0) { S & r = *s; }` should be well-formed. So it would be weird if just writing `*s` alone is ill-formed. I think it hinges on whether `*s` alone invokes lvalue to rvalue conversion -- I'm not sure whether it does or doesn't. If C were similar then I guess it might allow `S * r = &*s;` as valid even if S is incomplete, so maybe it allows `*s` also for this reason? – Chris Beck Sep 06 '17 at 00:13
  • @ChrisBeck As far as I'm concerned, the syntax of binding a reference in C++ is just a way to avoid a reference pointing to invalid addresses, like nullptr. So, it's still a pointer-like thing. There is no value fetching here. But `*s;` has. – lazyplayer Sep 06 '17 at 06:36
  • _the compiler must translate the program unless it has UB in all possible code paths_ Compare with a sentence from the previous paragraph «if an lvalue does not designate an object **when it is evaluated**, the behavior is undefined». «If the lvalue has an incomplete type and does not have array type, the behavior is undefined» doesn't say «when evaluated», so it is enough to just have an lvalue expression with an incomplete type in a program to «statically» trigger UB. – Language Lawyer Feb 15 '22 at 16:20
  • @lazyplayer Re: "I tend to believe that C standard should have prohibited "dereferencing a pointer to incomplete type", I didn't find it": Same here. I guess that one reason is to support `&*x` (where `x` has incomplete type). See relevant questions: [1](https://stackoverflow.com/q/71022028/1778275), [2](https://stackoverflow.com/q/71129627/1778275). – pmor Feb 15 '22 at 23:45
  • @LanguageLawyer at the time I considered that "when it is evaluated" was implied, but maybe you are right – M.M Feb 15 '22 at 23:46
  • @M.M Re: "I believe the compilers are incorrect to reject the program": me [too](https://stackoverflow.com/q/71129627/1778275). I guess that such a constraint is omitted to support `&*x`. – pmor Feb 15 '22 at 23:52

Some build systems may have been designed in a way that would allow code like:

extern struct foo x;
extern use_foo(struct foo x); // Pass by value

...
use_foo(x);

to be processed successfully without the compiler having to know or care about the actual representation of struct foo [for example, some systems may process pass-by-value by having the caller pass the address of an object and requiring the called function to make a copy if it's going to modify it].

Such a facility may be useful on systems that could support it, and I don't think the authors of the Standard wanted to imply that code which used that feature was "broken", but they also didn't want to mandate that all C implementations support such a feature. Making the behavior undefined would allow implementations to support it when practical, without requiring that they do so.

supercat
  • I think you are right about the `= y` part; however `x =` is a constraint violation (even in C89) because the assignment operator has a constraint that the left hand side must be a modifiable lvalue, and the definition of "modifiable lvalue" excludes lvalues of incomplete type. – M.M Sep 05 '17 at 22:10
  • I don't see any way to use the `= y` idea in a program that doesn't contain a constraint violation; since the assignment operator constraints include that if the right hand side has struct type then the left must have compatible struct type – M.M Sep 05 '17 at 22:12
  • @M.M: What about passing an incomplete structure type by value? Would that violate any constraints? Also, there are a number of places where parts of the Standard make provision for something but other parts make it impossible. The authors of the Standard have not tried to fix the countless defects which would be of interest primarily to pedants and don't appreciably impair the Standard's practical usefulness. – supercat Sep 05 '17 at 22:20
  • I think you're right there: `extern struct foo x; void f(); int main() { f(x); }` would not violate any constraints that I can see. – M.M Sep 05 '17 at 22:30
  • @M.M: More significantly, `extern struct foo x; void f(struct foo); int main(int argc, char **argv) { if (argc==3) f(x); }` should have well-defined behavior if a suitable declaration for `f()` exists somewhere and the program is invoked with argc!=3. – supercat Sep 05 '17 at 23:30
  • To clarify , are you saying you agree with the standard in that the code in your last comment should not be a constraint violation, and you also agree that it should be (runtime) undefined behaviour if `argc == 3` happens? – M.M Sep 05 '17 at 23:45
  • @M.M: Yup. The code should have defined behavior in the `argc!=3` case, though I'm not aware of any compilers that would process that. If a compiler somehow knew that `argc` would equal 3, it could refuse to process the program (it wouldn't have to wait until "run-time" to fail) but a compiler would generally have no way of knowing such things. – supercat Sep 05 '17 at 23:51
  • OK. This is probably a better answer than mine in that it's more likely to have occurred in practice – M.M Sep 06 '17 at 01:35

The term "undefined behavior" includes a compilation error as one of its permitted outcomes. From the C89 standard:

Undefined behavior - behavior, upon use of a nonportable or erroneous program construct, of erroneous data, or of indeterminately-valued objects, for which the Standard imposes no requirements. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

As you can see, "terminating a translation" is explicitly permitted.

In this case, I believe the compilation error you get for your sample code is an example of "undefined behavior" implemented as a compile-time error.

algrid
  • Aha! I see, so "undefined behavior" is not just a runtime indetermination. It basically means "let the compilers do whatever they want". It's confusing under such context. – lazyplayer Sep 04 '17 at 05:20
  • Translation cannot be terminated unless it can be proven that execution would reach the part with undefined behaviour – M.M Sep 05 '17 at 22:05
  • @M.M Why do you think so? It's quite hard to prove such thing in general case. Any reference to the standard? – algrid Sep 05 '17 at 22:10
  • It's impossible to prove in the general case, which is why the whole concept of "undefined behaviour" exists in the first place. By your interpretation the compiler could refuse to compile the code `if ( p == NULL ) { bar(); } else { p->foo(); }` because `p->foo()` is undefined behaviour when `p == NULL`. The only self-consistent way to apply the UB concept is if it is triggered by the flow of execution reaching an expression that causes UB. (But if the flow does reach such an expression the effect can "time travel" back). – M.M Sep 05 '17 at 22:26
  • I think the intention of the Standard may have been to allow a compiler to refuse execution of a program that contains a function call which passes an argument of incomplete type, but I'm not sure a conforming compiler could do so in cases where the function call isn't actually executed. – supercat Sep 05 '17 at 23:24
  • @M.M If a compiler refuses to compile `if ( p == NULL ) { bar(); } else { p->foo(); }` then it refuses to implement the `->` semantics as required by the standard. But for `p = NULL; p->foo();` I can imagine a compiler that will throw a compile time error because it won't break any standard requirement - it's not possible for p to point at any struct or union (I don't consider now the possibility of changing p from another thread). – algrid Sep 05 '17 at 23:30
  • The "semantics required by the standard" is also that in the case `if ( 0 ) { 1 / 0; }` then the `if` body is never entered – M.M Sep 05 '17 at 23:41
  • @algrid: If a program contains `void test(void) { struct THING *p=0; p->foo();}` but the program would only call `test` for certain inputs, the existence of the `test()` function above shouldn't affect behavior except when it's actually called. – supercat Sep 06 '17 at 14:24
  • @supercat Do you think that a compiler refusing to compile this code won't be compliant with C89? – algrid Sep 06 '17 at 14:39
  • @algrid: Under the "One Program Rule", a conforming compiler is only required to be capable of processing one (possibly contrived and useless) program that tests each Translation Limit. A conforming implementation could, under that rule, choke on almost any program which contains more than two tokens in an expression and doesn't precisely match a contrived exemplar program. Almost anything an implementation might do would be justifiable under that provision. On the other hand, I don't think there would be any other way of justifying refusal to process a program... – supercat Sep 06 '17 at 14:58
  • ...that contains UB in a context that does not require a constant expression and is never executed. – supercat Sep 06 '17 at 14:59
  • @supercat Why do you think that Translation limits define a complete set of requirements for an implementation? The standard defines other requirements too. Take a look at the 1.7 COMPLIANCE section. – algrid Sep 06 '17 at 15:14
  • @algrid: The published rationale recognizes that some implementations operate under severe resource constraints and may be unable to handle e.g. the maximum number of macros, all of which have the maximum length, etc. Rather than try to come up with some formulae saying that it must accommodate X macros of length Y and P macros of length Q, etc. the authors of the Standard basically punted and figured that anyone trying to write a useful implementation would do so in a way that can process programs besides the One Program, but regarded the ability to do so as a Quality of Implementation issue. – supercat Sep 06 '17 at 15:23
  • @algrid: From a practical matter, the range of tasks that could be performed on all platforms where C could be useful is extremely limited. I thus don't see much value in trying to define a class of programs that all conforming implementations must be able to translate. A bigger issue is that the Standard allows arbitrary behavior on programs that exceed possibly-unspecified limits. That IMHO undermines the Standard far more than allowing implementations to arbitrarily reject programs (with notice of rejection given in Implementation-Defined fashion) – supercat Sep 06 '17 at 15:38

Sure, array types can be that:

extern double A[];
...
A[0] = 1;           // lvalue conversion of A

This has well defined behavior, even if the definition of A is not visible to the compiler. So inside this TU the array type is never completed.

Jens Gustedt
  • It doesn't show on the title, but what I want to ask is an lvalue that "has an incomplete type and does not have array type", which, according to the standard, leads to undefined behavior. – lazyplayer Sep 04 '17 at 17:38
  • Array is a nice example of "create an lvalue with incomplete type". – lazyplayer Sep 04 '17 at 17:54