Undefined behavior when working with partially initialized struct in C90

Question

Let's consider the following code:

struct M {
  unsigned char a;
  unsigned char b;
};

void pass_by_value(struct M);

int main() {
  struct M m;
  m.a = 0;
  pass_by_value(m);
  return 0;
}

In the function pass_by_value, m.b is initialized before it is used. However, since m is passed by value the compiler copies it to the stack already. No variable has storage class register here. a and b are of type unsigned char.

Does that have to be considered UB in C90? (Please note: I am specifically asking for C90)

This question is very similar to Returning a local partially initialized struct from a function and undefined behavior, but actually the other way around.

It seems to me that the other question also answers this question: copying a struct by value. Whether the copy is in the function parameter or its return value is immaterial — M.M, Jul 14 '21 at 05:57
OT: Since you have tagged this with "language-lawyer", it is worth mentioning that the quoted statement *"However, since m is passed by value the compiler copies it to the stack already."* **is wrong**. You cannot make any such assumption. The C standard doesn't even require an implementation to have a stack. — Support Ukraine, Jul 14 '21 at 06:27
If you are working with `void pass_by_value(struct M);` and expect to be able to initialize `m.b` in `pass_by_value()`, that shows you still do not understand pass by value. The parameter passed is local to the function, and since you return no value, any changes to it will be lost on function return. (period). You can pass a pointer as a parameter (e.g. `&m`) to the function and then you would be free to update the values at that memory address and have the changes preserved for use in `main()`. You would want to rename your function `void pass_by_reference (struct M*)`... — David C. Rankin, Jul 14 '21 at 06:28
@DavidC.Rankin I am well aware of the semantics of pass by value and reference. The example is taken out of context. In context it is a valid use-case. — Jonas Wolf, Jul 14 '21 at 06:49
hmm... even if this is about "passing to a function" and the linked question is "returning from a function", the answer given in the linked question also applies here. I see this a dupe. — Support Ukraine, Jul 14 '21 at 06:49
@fpiette: `m.b` is initialized before using it in `pass_by_value`. The question is rather: does passing a partially initialized struct by value already count as a use? — Jonas Wolf, Jul 14 '21 at 06:50
The only way to initialize `m.b` in your `pass_by_value()` is to change the function type from `void` to `struct M` and return the fully initialized struct overwriting the values in the original with it. — David C. Rankin, Jul 14 '21 at 06:56
@JonasWolf Sorry, but the first use of `m.b` is when `m` is passed to `pass_by_value`. Inside `pass_by_value` the value of `m.b` is indeterminate; using `m.b` in `pass_by_value` before assigning it a value produces UB. Thinking `pass_by_value` will make `m.b` initialized after return is **wrong**. — fpiette, Jul 14 '21 at 06:56
@DavidC.Rankin `m` is only used in `pass_by_value`. It is never returned. The original program is way longer, but has the same semantics. `m` is partially initialized (here) in `main` and further used only in `pass_by_value`. That is a strange, but IMO valid use case. — Jonas Wolf, Jul 14 '21 at 07:00
@fpiette I think perhaps you are misunderstanding that part. I don't think OP expects `m.b` to have a value back in `main`. Rather, I think OP says that `pass_by_value` will initialize/assign it before using it **inside** `pass_by_value`. That's perfectly legal (but not the most common design) — Support Ukraine, Jul 14 '21 at 07:00
@436427: yes, thanks. The question is: is passing by value already UB if partially initialized in C90? — Jonas Wolf, Jul 14 '21 at 07:01
@JonasWolf Why is it that you don't think the answer in the linked question is an answer to your question? — Support Ukraine, Jul 14 '21 at 07:02
@4386427 The linked answer does talk about return values not parameters. I was not sure if that is equivalent. And, secondly, the DRs were discussed in 2000. I was specifically asking about C90. — Jonas Wolf, Jul 14 '21 at 07:32
Re “The linked answer does talk about return values not parameters”: The linked **question** asks about a return value, but the [answer by Rici](https://stackoverflow.com/a/35571273/298225) discusses values generally and applies to arguments, return values, and other uses of structures as values. — Eric Postpischil, Jul 14 '21 at 11:31
@JonasWolf if all you care about is using `m.b` in `pass_by_value()` after it is initialized -- that's fine, that's not even an issue. The only issue arises if you are thinking `m.b` will retain its value after function return -- it won't. — David C. Rankin, Jul 14 '21 at 17:54

Eric Postpischil · Accepted Answer · 2021-07-14T13:18:05.523

The C 1990 standard (and the C 1999 standard) does not contain the sentence that first appears in C 2011 that makes the behavior of using some uninitialized values undefined.

C 2011 6.3.2.1 2 includes:

… If the lvalue has an incomplete type and does not have array type, the behavior is undefined. If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.

The whole of the corresponding paragraph in C 1990, clause 6.2.2.1, second paragraph, is:

Except when it is the operand of the sizeof operator, the unary & operator, the ++ operator, the -- operator, or the left operand of the . operator or an assignment operator, an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue). If the lvalue has qualified type, the value has the unqualified version of the type of the lvalue; otherwise, the value has the type of the lvalue. If the lvalue has an incomplete type and does not have array type, the behavior is undefined.

Therefore, the behavior of the code in the question would seem to be defined, inasmuch as it passes the value stored in the structure.
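The pattern under discussion can be made concrete with a minimal sketch. The body of `pass_by_value`, its `int` return type (the question declares it `void`), and the helper `run_example` are assumptions added for illustration only. The callee assigns `m.b` before reading it, so no unset member is ever read on its own; whether the by-value copy is defined is the interpretive point argued here.

```c
#include <assert.h>

/* Sketch only: the function body and run_example() are hypothetical. */
struct M {
  unsigned char a;
  unsigned char b;
};

/* Receives its own copy of the structure; b is written before it is read. */
int pass_by_value(struct M m)
{
  m.b = 7;            /* first access to b in the callee is a write */
  return m.a + m.b;   /* both members now hold known values */
}

int run_example(void)
{
  struct M m;         /* automatic storage, no initializer */
  int sum;
  m.a = 0;            /* only a is set before the call */
  sum = pass_by_value(m);
  assert(sum == 7);   /* 0 + 7: no unset member was read individually */
  return sum;
}
```

Changes made to the copy inside `pass_by_value` are not visible in the caller, which matches the use case described in the comments.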

In the absence of explicit statements in the standard, common practice helps guide interpretation. It is perfectly normal not to initialize all members of a structure yet to expect the structure to represent useful data, and therefore the behavior of using the structure as a value must be defined if at least one of its members is initialized. One of the answers to the equivalent question for C 2011 mentions (citing a C defect report) the standard struct tm. A struct tm may be used to represent a specific date by filling in all of the date fields (year, month, day of month) and possibly the time fields (hour, minute, second, even the Daylight Saving Time indication) while leaving the day-of-week and day-of-year fields uninitialized.
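The struct tm case might look like the following sketch. The helper names `normalized` and `weekday_of` are hypothetical, and the code assumes, in line with the reading argued here, that handing over a struct tm by value with `tm_wday` and `tm_yday` unset is acceptable. It relies on the documented behavior of `mktime`, which ignores those two members on input and fills them in on success.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: takes a struct tm BY VALUE whose tm_wday and
   tm_yday may be unset, and returns the copy as completed by mktime(). */
static struct tm normalized(struct tm t)
{
  if (mktime(&t) == (time_t)-1) {   /* treated as failure in this sketch */
    t.tm_wday = -1;
    t.tm_yday = -1;
  }
  return t;
}

/* Returns 0 (Sunday) through 6 (Saturday), or -1 on failure. */
int weekday_of(int year, int month, int day)
{
  struct tm t;                /* no initializer: members start indeterminate */
  assert(month >= 1 && month <= 12);
  assert(day >= 1 && day <= 31);
  t.tm_year = year - 1900;
  t.tm_mon = month - 1;
  t.tm_mday = day;
  t.tm_hour = 12;             /* noon, away from DST transitions */
  t.tm_min = 0;
  t.tm_sec = 0;
  t.tm_isdst = -1;            /* let the library decide about DST */
  /* tm_wday and tm_yday are deliberately left unset here */
  return normalized(t).tm_wday;
}
```

For example, `weekday_of(2021, 7, 14)` yields 3, since 14 July 2021 was a Wednesday.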

In defining undefined behavior in 3.16, the 1990 standard does say it is “Behavior, upon use … of indeterminately valued objects, for which this International Standard imposes no requirements.” And 6.5.7 says “… If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate…” However, a structure with automatic storage duration in which one member, but not another, has been initialized is neither fully initialized nor not initialized. Given the intended uses of structures, I would say we should not consider use of the value of a partially initialized structure to be subject to being made undefined by 3.16.

_In the absence of explicit statements in the standard_ But C90 is explicit that the parameter is assigned the value of the argument. — Language Lawyer, Jul 14 '21 at 13:09
When the C89 Standard was written, the authors expected that if all existing compilers for commonplace platforms would process some action consistently, the only compiler writers that would care if the Standard defined the behavior were those for obscure platforms, and the only programmers that should need to care would be those whose code might be called to run upon obscure platforms. Note that, by definition, any source text that at least one conforming C compiler somewhere in the universe will accept and translate into executable code is a Conforming C Program, so... — supercat, Jul 15 '21 at 15:16
...the authors of the C89 saw no need to be precise about what precise actions do or don't have defined behavior. Quality compilers will uphold the Spirit of C, even if Garbage C Compilers regard it with contempt. — supercat, Jul 15 '21 at 15:20

score 0 · Answer 2 · answered Jul 14 '21 at 22:55

Under C90, if an object held Indeterminate Value, each and every bit could independently be zero or one, regardless of whether or not they would in combination represent a valid bit pattern for the object's type. If an implementation specified the behavior of attempting to read each and every one of the 2ⁿ individual possible bit patterns an object could hold, the behavior of reading an object with Indeterminate Value would be equivalent to reading the value of an arbitrarily chosen bit pattern. If there were any bit patterns for which an implementation did not specify the effect of an attempted read, then the effects of trying to read an object that might hold such bit patterns would be likewise unspecified.
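For the question's struct M this matters because its members are unsigned char. The sketch below is an illustration under the assumption that every byte pattern is an ordinary unsigned char value and that `UCHAR_MAX` is smaller than `UINT_MAX`; the function name is hypothetical. It stores each possible pattern into an unsigned char object and reads it back.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Stores every possible byte pattern into an unsigned char object and
   reads it back; returns how many patterns were tried. */
int count_readable_uchar_patterns(void)
{
  int count = 0;
  unsigned int i;             /* assumes UCHAR_MAX < UINT_MAX */
  for (i = 0; i <= UCHAR_MAX; ++i) {
    unsigned char pattern = (unsigned char)i;
    unsigned char object;
    memcpy(&object, &pattern, 1);   /* store the raw pattern */
    assert(object == pattern);      /* the read yields an ordinary value */
    ++count;
  }
  return count;
}
```

Under that assumption, reading an unsigned char that holds an arbitrary bit pattern simply produces some value in the range 0 to `UCHAR_MAX`.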

Code generation efficiency could be improved in some cases by specifying the behavior of uninitialized objects more loosely, in a way which would not otherwise be consistent with sequential program execution as specified but would nonetheless meet program requirements. For example, given something like:

struct foo { short dat[16]; } x,y,z;
void test1(int a, int b, int c, int d)
{
  struct foo temp;
  temp.dat[a] = 1;
  temp.dat[b] = 2;
  temp.dat[c] = 3;
  temp.dat[d] = 4;
  x=temp;
  y=temp;
}
void test2(int a, int b, int c, int d)
{
  test1(a,b,c,d);
  z=x;
}

If client code only cares about the values of x and y that correspond to values of temp that were written, efficiency might be improved, while still meeting requirements, if the code were rewritten as:

void test1(int a, int b, int c, int d)
{
  x.dat[a] = 1;
  y.dat[a] = 1;
  x.dat[b] = 2;
  y.dat[b] = 2;
  x.dat[c] = 3;
  y.dat[c] = 3;
  x.dat[d] = 4;
  y.dat[d] = 4;
}

The fact that the original function test1 doesn't do anything to initialize temp suggests that it won't care about what is yielded by any individual attempt to read it. On the other hand, nothing within the code for test2 would imply that client code wouldn't care about whether all members of x held the same values as corresponding values of y. Thus, such an inference would more likely be dangerous there.
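The claim that only the written elements matter to the caller can be checked with a small harness. This is a sketch with hypothetical names (`via_temp`, `direct`, `written_elements_agree`, and the four globals), and it assumes, as is typical in practice, that copying a structure whose other elements were never written is harmless.

```c
#include <assert.h>

struct foo { short dat[16]; };

static struct foo x_copy, y_copy;       /* results of the copy-based version */
static struct foo x_direct, y_direct;   /* results of the direct-write version */

static void via_temp(int a, int b, int c, int d)
{
  struct foo temp;            /* at least 12 elements are never written */
  temp.dat[a] = 1;
  temp.dat[b] = 2;
  temp.dat[c] = 3;
  temp.dat[d] = 4;
  x_copy = temp;
  y_copy = temp;
}

static void direct(int a, int b, int c, int d)
{
  x_direct.dat[a] = 1;  y_direct.dat[a] = 1;
  x_direct.dat[b] = 2;  y_direct.dat[b] = 2;
  x_direct.dat[c] = 3;  y_direct.dat[c] = 3;
  x_direct.dat[d] = 4;  y_direct.dat[d] = 4;
}

/* Returns 1 if both versions agree on every element that was written. */
int written_elements_agree(int a, int b, int c, int d)
{
  int idx[4];
  int i;
  assert(a >= 0 && a < 16 && b >= 0 && b < 16);
  assert(c >= 0 && c < 16 && d >= 0 && d < 16);
  idx[0] = a; idx[1] = b; idx[2] = c; idx[3] = d;
  via_temp(a, b, c, d);
  direct(a, b, c, d);
  for (i = 0; i < 4; ++i) {
    if (x_copy.dat[idx[i]] != x_direct.dat[idx[i]] ||
        y_copy.dat[idx[i]] != y_direct.dat[idx[i]])
      return 0;
  }
  return 1;
}
```

The harness compares only the indices that were assigned; the remaining elements may differ between the two versions, which is precisely the latitude being discussed.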

The C Standard makes no attempt to define behavior in situations where an optimization might yield program behavior which, although useful, would be inconsistent with sequential processing of non-optimized code. Instead, the principle that optimizations must never affect any defined behavior is taken to imply that the Standard must characterize as Undefined all actions whose behavior would be visibly affected by optimization, leaving to implementor discretion the question of what aspects of behavior should or should not be defined in what circumstances. Ironically, the only time the Standard's laxity with regard to this behavior would allow more efficient code generation outside contrived scenarios would be in cases where implementations treat the behavior as at least loosely defined, and programmers are able to exploit that. If a programmer had to explicitly initialize all elements of temp to avoid having the compiler behave in completely nonsensical fashion, that would eliminate any possibility of optimizing out the unnecessary writes to unused elements of x and y.
