Returning a local partially initialized struct from a function and undefined behavior

Question

(By partially initialized I mean defined without an initializer, with at least one but not all of its members then assigned a valid value. And by local I mean defined with automatic storage duration. This question only talks about those.)

Using an uninitialized automatic variable that could have been declared with register as an rvalue is undefined behavior. Structs can be declared with the register storage class specifier.

6.3.2.1

If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.

Note that it specifically says "and no assignment to it has been performed".

Additionally, we know that the value of a struct cannot be a trap representation:

6.2.6.1.

The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation

Thus returning an uninitialized struct is clearly undefined behavior.

Statement: Returning an uninitialized struct that has had one of its members assigned a valid value is defined.

Example for easier comprehension:

struct test
{
    int a;
    int b;
};

struct test Get( void )
{
    struct test g;
    g.a = 123;
    return g;
}

int main( void )
{
    struct test t = Get();
}

I just happened to focus on returning, but I believe this should apply to a simple assignment as well, without any difference.

Is my statement correct?

"I just happened to focus on returning, but I believe this should apply to a simple assignment as well, without any difference." Well, your code contains simple assignment in the caller. Copying two compatible structs is an allowed case for simple assignment. — Lundin, Feb 22 '16 at 11:00
For a dissenting opinion, I believe the _statement_ to be unconditionally correct. Consider that in pre-ANSI C the only (portable) way to copy/assign a struct was `memcpy`, which is obviously impervious to contents or trap representations. I believe that `memcpy` is still a legit way to copy/assign (fixed size) structs, though I can't point to the exact reference in the C standard that spells it out. Curiously enough, current C++ standards do in fact have the notion of a _trivial copy assignment operator_ which essentially amounts to a memcpy/memmove. — dxiv, Feb 23 '16 at 17:37
@dxiv: For some reason, the Standards Committee decided that memcpy shouldn't be a type-agnostic way of copying storage. From N1570 6.5 paragraph 6 "If a value is copied into an object having no declared type using memcpy or memmove , or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one." — supercat, Feb 26 '16 at 05:14
@supercat My reading of that is that memcpy'ing a partially initialized (or uninitialized) object results in a similarly partially initialized (or uninitialized) copy. `subsequent accesses that *do not modify* the value` - meaning reads or further memcpy - would of course still be subject to the same ground rules as for the original partially (or uninitialized) object. I am genuinely curious what your interpretation is which would contradict this, or my previous comment. That said, IANALL - I am not a language-lawyer. — dxiv, Feb 26 '16 at 05:29
@dxiv: Under C89, if one wanted to access the bits of a `float` as a same-sized `long`, one could safely memcpy from a `float*` to the `long*`, and then dereference the `long*` even if the latter were allocated storage. Nothing in the Standard would even hint at that being forbidden. Under C99, using `memcpy` to copy a float to a location in allocated storage will set the effective type of the destination to `float`; I see no reason to believe it won't, and no reason to believe that the storage may then be read using an integer type. – supercat, Feb 26 '16 at 05:46
@supercat Structs don't have trap representations (unlike floats). Besides, I am _not_ talking about *de*referencing an uninitialized member of the struct, but merely about memcpy'ing the whole object to another object of the same type. Assuming the source object contained an uninitialized float, for example, that would still be an uninitialized float after being copied, so one could not _read_ it as a float after copying - just as one could not do it in the original object. But the copy operation itself would be valid, and one could then assign a valid value to the float in both cases. – dxiv, Feb 26 '16 at 05:53
@dxiv: Under C89, using `memcpy` on an uninitialized `float` would cause the destination to be filled with Unspecified (not Indeterminate) values; if the destination could be read with a pointer of some type before the operation, it may be just as legitimately read with a pointer of that type after. The quoted language from C99 explicitly changes that. — supercat, Feb 26 '16 at 06:33
@supercat I'll reiterate that the question, as well as my comment, referred to `struct` objects as a whole, not individual `C` basic types, and will just leave it at that. — dxiv, Feb 26 '16 at 06:47
@dxiv: I used "float" and "long" merely because those types exist without having to be defined. The same issues exist even more insidiously with structures (even if two structures have the same sequence of member types, using memcpy from one to the other in allocated storage may cause the space that had been occupied by the second to take on the Effective Type of the first), making it illegal to access it as the second type. — supercat, Feb 26 '16 at 17:00

score 12 · Accepted Answer · edited Feb 25 '16 at 08:40


Aside from the detail of returning the value from a function, this is precisely the subject of Defect Report 222, submitted in 2000 by Clive Feather, and the resolution of that DR seems to pretty clearly answer the question: returning a partially-uninitialized struct is well-defined (although the values of the uninitialized members may not be used.)

The resolution to the DR clarified that struct and union objects do not have trap representations (which was explicitly added to §6.2.6.1/6). Consequently member-by-member copying cannot be used on an architecture in which the individual members might trap. Although, presumably for parsimony, no explicit statement to this effect was added to the standard, footnote 42 (now footnote 51) which previously mentioned the possibility of member-by-member copying was replaced by a much weaker statement indicating that padding bits need not be copied.

The minutes of the WG14 meeting (Toronto, October 2000) are clear (emphasis added):

DR222 - Partially-initialized structures

This DR asks the question of whether or not struct assignment is well defined when the source of the assignment is a struct, some of whose members have not been given a value. There was consensus that this should be well defined because of common usage, including the standard-specified structure struct tm. There was also consensus that if assignment with some members uninitialized (and thus possibly having a trap value) was being made well defined, there was little value in requiring that at least one member had been properly given a value.
Therefore the notion that the value of a struct or union as a whole can have a trap value is being removed.

It's interesting to note that in the above minutes, the committee held that it was not even necessary that a single member of the struct had been given a value. However, that requirement was later reinstated in some cases, with the resolution to DR338 (see below).

In summary:

If an automatic aggregate object has been at least partially initialized or if its address has been taken (thereby rendering it not suitable for a register declaration as per §6.3.2.1/2), then lvalue-to-rvalue conversion of that object is well-defined.
Such an object can be assigned to another aggregate object of the same type, possibly after having been returned from a function, without invoking undefined behaviour.
Reading the uninitialized members in the copy is either undefined or indeterminate, depending on whether trap representations are possible. (A read through a pointer to an unsigned narrow character type cannot trap, for example.) But if you write the member before reading it, you're fine.
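The summary above can be sketched in code. This is a minimal illustration, not part of the DR text: `get_partial` and `sum_after_fill` are hypothetical names, and the struct mirrors the one in the question. The struct is copied as a whole while `b` is still indeterminate, and `b` is written before it is ever read.

```c
struct test {
    int a;
    int b;
};

/* Only member a is assigned; b is left indeterminate. */
static struct test get_partial(void)
{
    struct test g;
    g.a = 123;
    return g;   /* whole-struct copy; b is not read as an int here */
}

/* Hypothetical caller: gives b a value before any read of it. */
static int sum_after_fill(void)
{
    struct test t = get_partial();
    t.b = 77;   /* write before read, so no indeterminate value is used */
    return t.a + t.b;
}
```

Reading `t.b` before the assignment would be the problematic case described in the last point of the summary. Some compilers may still warn about the uninitialized member when the struct is returned.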

I don't believe there is any theoretical difference between assignment of union and struct objects. Obviously unions cannot be copied member by member (what would that even mean), and the fact that some inactive member happens to have a trap representation is irrelevant, even if that member is not aliased by any other element. There's no obvious reason why a struct should be any different.

Finally, with respect to the exception in §6.3.2.1/2: this was added as a result of the resolution to DR 338. The gist of that DR is that some hardware (IA64) can trap the use of an uninitialized value in a register. C99 does not permit trap representations for unsigned chars. So on such hardware, it might not be possible to maintain an automatic variable in a register without "unnecessarily" initializing the register.

The resolution to DR 338 specifically marks as undefined behaviour the use of uninitialized values in automatic variables which could conceivably be stored in registers (i.e., those whose address has never been taken, as though declared register), thus permitting the compiler to keep an automatic unsigned char in a register without worrying about the previous contents of that register.

As a side effect of DR 338, it appears that completely uninitialized automatic structs whose address has never been taken cannot undergo lvalue-to-rvalue conversion. I don't know if that side-effect was fully contemplated in the resolution to DR 338, but it does not apply in the case of a partially initialized struct, as in this question.
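If the goal is simply to stay clear of both the DR 338 clause and indeterminate members, one option is to give the struct an initializer. The sketch below assumes the `struct test` from the question; `get_initialized` is a hypothetical name. Members not named in an initializer are implicitly zero-initialized, so the object is fully initialized even though only `a` is mentioned.

```c
struct test {
    int a;
    int b;
};

/* Hypothetical variant of Get(): the initializer names only a, but b is
   implicitly initialized to zero, so no member is indeterminate. */
static struct test get_initialized(void)
{
    struct test g = { .a = 123 };
    return g;
}
```

This trades the (usually negligible) cost of zeroing the remaining members for not having to reason about §6.3.2.1/2 at all.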

edited Feb 25 '16 at 08:40

Ven


answered Feb 23 '16 at 07:26

rici


Is there any efficient standard means via which code that receives a structure whose member types lack trap representations can convert any Indeterminate values to Unspecified values? – supercat Feb 25 '16 at 17:02
@supercat: if you mean "make an indeterminate value into some unspecified but fixed value", I don't know of such a mechanism and I agree that it would occasionally be useful. In practice, I believe that malloc actually returns unspecified rather than indeterminate bytes, but the standard does not require that (and I personally think it should). – rici Feb 25 '16 at 21:50
While I don't know of any existing implementations where the data returned by `malloc` could ever change after it is observed, compiler writers responding to a defect report argued that indeterminate bytes should remain indeterminate after having been read, so code which doesn't use some means to convert Indeterminate values to Unspecified may be broken by "creative" compiler writers. Another problematic situation occurs in cases where code uses allocated storage for one purpose and then another. If `struct foo *p` points to storage that has previously held something other than a `struct foo`... – supercat Feb 25 '16 at 22:17
...then saying e.g. `p->x = 23;` and `struct foo q = *p;` would invoke Undefined Behavior if `struct foo` has any fields besides `x`, even if code never examines any of those fields within struct `q`. It's possible to write strictly-conforming code which will convert Indeterminate values to Unspecified values while erasing the Effective Type, but such code will be horrendously slow despite the fact that it should be a no-op. – supercat Feb 25 '16 at 22:20
@supercat: why do you claim that `q = *p` is UB? – rici Feb 25 '16 at 23:21
Assume `void *m=malloc(8); float *fp=m; struct foo {int32_t w,x;} *p,q; *fp=12.0f;` The effective type of the first four bytes at `m` will be `float` until the next access that "modifies the stored value" (see N1570 6.5.6). After `p=m; p->x=23;`, even if nothing still cares about the `float` that was stored at `*fp`, the effective type of the first four bytes at `m` will still be `float` (nothing has modified its value). The code `q=*p;` clearly accesses that storage using a pointer type which is in no way compatible with `float` (see N1570 6.5.7), thus invoking Undefined Behavior. – supercat Feb 26 '16 at 05:02
The reason I claim that's Undefined Behavior is that the Standard does not merely say that reading a stored object with an incompatible pointer yields Indeterminate Value, and writing an object with an incompatible pointer sets it to Indeterminate Value--it prohibits any such reads and writes entirely. As to the reason it's Undefined Behavior, it's because the rules are poorly written. [correct the refs to 6.5 para 6-7; http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf page 77] – supercat Feb 26 '16 at 05:09
@supercat: arguable. (It could also be argued that `p->x=23` does in fact change the effective type of the entire allocation, since there is no portable way to know what part of it is being written, so it might well modify the stored value.) But it is clear that the intention of DR 222 was to make `q = *p` well-defined, so if your interpretation is correct, that does indeed indicate a redaction error which should be reported (if you care about it enough.) – rici Feb 26 '16 at 05:21
One could have an implementation where `p->x` overlapped `*fp`, but only if `sizeof (float)` is larger than `sizeof (int32_t)`. Otherwise, is there any way in which `p->x` could be deemed to modify the value stored in the space occupied by `p->w` without rendering expressions like `p->w++ + p->x++` Undefined? – supercat Feb 26 '16 at 06:05
Fundamentally, the only way C can remain competitive with other languages will be if parts of the Standard are substantially rewritten to match reality. Unfortunately, some of the worst parts have been around for so long they've become untouchable, since nobody wants to admit that the rules never really said what they thought they said. IMHO, the best way around that would be to add a directive which establishes aliasing rules which give programmers more freedom (at the cost of performance), and one which gives compilers more freedom (at the expense of semantic flexibility), and then... – supercat Feb 26 '16 at 06:12
@supercat: `p->x = ` cannot modify `p->w`. But that all takes place within a `struct foo`. Once you write to (a part of) a `struct foo`, you have changed the effective type of the entire struct. (At least, that's a potential interpretation.) – rici Feb 26 '16 at 06:12
...indicate that programs should when practical use directives to indicate what they actually need, and compilers should when practical add configuration options to control the default behavior. Then there'd be no need to argue about what the C89 or C99 rules meant, since code's meaning would be set by either directives or configuration options. – supercat Feb 26 '16 at 06:13
@supercat: I'm not on the C standards committee, nor have I ever been. You can vent all you like in comments to my answers, but my prediction is that it will have zero effect on the standards committee, because I don't believe they read comments here. If you want to participate, there are mailing lists. (And I don't necessarily disagree. I have a draft DR about malloc returning unspecified but not indeterminate values.) – rici Feb 26 '16 at 06:14
Sorry if I got into "vent" mode. You asked why I believed the Standard regarded reading a structure from re-used memory could invoke Undefined Behavior if the data wasn't written first as the structure type. Do you believe that anything would *require* that the value stored in `*fp` be deemed to have been altered by a write to a disjoint field of `struct foo` [note that if some action would allow a compiler to choose in Unspecified fashion among some behaviors, one of which invokes UB, the action giving the compiler the choice would itself invoke UB]. – supercat Feb 26 '16 at 16:45
If a compiler knows that two pointers can't alias at all, then it can arbitrarily reorder operations involving the pointers; such reordering is often very beneficial to performance on today's processors [and is probably why malloc() would return Indeterminate values, and why compiler writers would want to keep it that way]. Such optimizations would probably be useful most of the time they could be applied, but only if there's a way to block them in cases where they would sacrifice correctness. – supercat Feb 26 '16 at 16:56
@supercat: I don't see how that relates to whether malloc returns indeterminate or unspecified values. Aliasing is not affected by that, afaics. What is affected is the necessity to actually load a value; afaik, all compilers will actually perform the load because they don't bother to track every byte in the malloc'd storage to know whether it has been assigned to or not. (Indeed, they don't track any of the bytes, not even as a total.) And that would be a silly optimization, because it would mostly only speed up bugs. There are applications for uninitialized storage, though. – rici Feb 26 '16 at 19:43
Now, your question: "Do you believe that anything would require that the value stored in *fp be deemed to have been altered...". Obviously, nothing requires the value to be deemed to have been altered. But I claim that its effective type has changed, anyway. When you execute `p->x=...`, you are writing through a pointer to a `struct foo` (p). And your 6.5/6 says, (emphasis added) "If a value is **stored into** an object having no declared type...". So I claim that `p->x=...` stores a value *into* `*p`, and consequently the effective type of (the entirety of) `*p` has been set to `struct foo`... – rici Feb 26 '16 at 19:46
...Any other reading (yours, for example) would deny the existence of aggregate types. Your interpretation views an aggregate type as being reducible to a collection of disaggregated objects, and I do not believe that is C's object model. The intent of the C object model is to preclude basilisk types (your `*p` is partly `float[]`, partly `struct foo`, although you cannot portably know which part is which), but the C object model recognizes the integrity of aggregate types. IMHO. And under my interpretation, `q = *p` is well-defined. – rici Feb 26 '16 at 19:49
@supercat: Having said all that, I don't believe it is strictly relevant to this question. Why don't you ask a relevant question, which I will then answer, and we can delete this comment thread and throw the whole issue to the community where a C standards member might even possibly see it. – rici Feb 26 '16 at 19:50

Really a great answer! – 2501 Feb 29 '16 at 00:35
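The `malloc` example debated in the comments above hinges on whether `p->x = 23` changes the effective type of the whole allocation. The sketch below does not settle that debate; it only shows one way to sidestep it, by storing a complete `struct foo` through a `struct foo` lvalue before reading the struct back. The function name `reuse_as_foo` is hypothetical, and the code assumes `sizeof(float) <= sizeof(struct foo)`, which holds on common platforms.

```c
#include <stdint.h>
#include <stdlib.h>

struct foo {
    int32_t w, x;
};

/* Reuses storage that previously held a float, then reads it back as a
   struct foo. Returns x from the copy, or -1 if allocation fails. */
static int32_t reuse_as_foo(void)
{
    void *m = malloc(sizeof(struct foo));
    if (m == NULL)
        return -1;

    float *fp = m;
    *fp = 12.0f;                    /* effective type of these bytes: float */

    struct foo *p = m;
    *p = (struct foo){ .x = 23 };   /* whole-struct store through a struct foo lvalue */

    struct foo q = *p;              /* read through the same type */
    free(m);
    return q.x;
}
```

Because the store covers the entire object, both readings of 6.5 paragraph 6 discussed above agree that the subsequent `q = *p` reads a `struct foo`. The price is that `w` is also written (to zero), which the member-only store in the original example avoided.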

Lundin · Answer 2 · 2016-02-22T11:01:48.383

Your statement about 6.3.2.1 is correct: if the object designated by the lvalue is uninitialized, then the behavior is undefined.

So the question then is if your struct is to be regarded as uninitialized or not. You do assign a value to one of the members, so there has been an assignment to the object. As per the cited 6.3.2.1, that would mean that you cannot regard the struct as whole as uninitialized. That particular member is clearly initialized, even though the other members are not.

There is however another case of undefined behavior, and that is when storing a trap representation into the lvalue:

6.2.6.1/5
Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.

The text you cited in 6.2.6.1/6 says that the struct itself cannot be a trap representation, even though its individual members may be trap representations. If they are, then the assignment would be undefined behavior as per the above.

But note the "may be trap". It is not certain that they are trap representations, because they have indeterminate values. Take a look at the basics:

6.7.9/10
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.

and

3.19.2/1
indeterminate value
either an unspecified value or a trap representation

Using a variable with indeterminate value is only undefined behavior in case the value is a trap representation.

Whether the uninitialized member variables of your struct will contain unspecified values or trap representations is implementation-defined behavior.

If the variable with indeterminate value simply has an unspecified value, then 6.2.6.1/5 does not apply and there is no undefined behavior.

Conclusion: if the implementation states that any indeterminate value for any of the struct members is a trap representation, the behavior is undefined. Otherwise, the behavior is merely implementation-defined/unspecified, the uninitialized members will hold unspecified values.

"Whether the uninitialized member variables of your struct will contain unspecified values or trap representations is implementation-defined behavior." - I disagree, can you cite where in the standard it says that this is implementation-defined? — M.M, Feb 28 '16 at 20:44
Even if individual members have a trap representation, none of the code in the question reads an individual member. The only assignment is values of type `struct test` and 6.2.6.1/6 says that there must not be a trap representation for those values. – M.M, Feb 28 '16 at 20:48
@M.M Regarding your first comment, that's what I did above. Read the three cited parts. Whether it is unspecified or implementation-defined isn't exactly clear, but that's not really important, as the question is if this is UB or not — Lundin, Feb 29 '16 at 07:14
@M.M: Even though `unsigned char` has no trap representations, the Standard expressly says that the behavior of reading an uninitialized object of that type is only defined if the address of the object is taken. I see nothing that would imply that the same principle would not apply to other objects whose types have no trap representations, but whose address is not taken. If one recognizes that the purpose of UB is to allow implementations to process code in the most useful fashion, rather than to invite implementations to behave gratuitously nonsensically, then it would make sense... — supercat, Jul 16 '21 at 18:59
...to allow implementations to trap on attempts to copy partially-initialized objects since such trapping may help programmers find bugs *in code which does not deliberately refrain from initializing portions of structures whose values won't be used*. The authors of the Standard made no attempt to judge whether it would be more useful to allow programmers to skip initialization of unused fields, or to trap failure to initialize them; characterizing the action as UB allows implementations to use whichever approach would better serve their customers. — supercat, Jul 16 '21 at 19:01
