18

Take the following:

#include <stdio.h>

int a(void) {
    puts("a");
    return 0;
}

int b(void) {
    puts("b");
    return 1;
}

int c(void) {
    puts("c");
    return 2;
}

int d(void) {
    puts("d");
    return 3;
}

Will the following have predictable behavior?

int arr[4][4][4][4];
arr[a()][b()][c()][d()] = 1;

Is it guaranteed to print in this order:

a
b
c
d

I am aware that constructs such as the following are invalid:

int i;
i = i++;

This is because the two modifications of i (the increment from i++ and the store performed by =) are unsequenced relative to each other. It is undefined behavior to modify an object twice, or to modify it and read its value, when the two operations are unsequenced.

Put another way, is the following valid:

int i = 0, arr[4][4][4][4];
arr[i++][i++][i++][i++] = 1;

Or does it invoke undefined behavior due to unsequenced modification and access to i?

According to the C standard, is there a defined sequence point between each successive [] while indexing a multidimensional array?

To be clear, neither of these examples has to do with precedence, that is, the placement of implicit parentheses, or the order in which the operators operate on their operands. The question is about sequencing: the order in which the operands themselves are evaluated.

chqrlie
user16217248
  • 2
    `x[i]` is defined as `(*((x)+(i)))`, so `x[i][j]` becomes `(*((x[i])+(j)))` which becomes `(*(((*((x)+(i))))+(j)))`. There are no sequence points in that expression, so the order of evaluation is unspecified. (If you remove extraneous parentheses, it simplifies to `*(*(x+i)+j)`.) – Raymond Chen Feb 19 '23 at 20:11
  • 2
    As for what determines the order in which functions are called: The standard contains rules regarding order of evaluation that determine that. Similarly, the standard contains rules describing where sequence points are placed. They are related but distinct: One does not determine the other -- the standard determines them both. – Jason C Feb 19 '23 at 21:25
  • 3
    @user16217248 AFAIK, C does not place sequence points between the evaluations of the individual indices of an array. There is a list [here](https://learn.microsoft.com/en-us/cpp/c-language/c-sequence-points?view=msvc-170). – Jason C Feb 19 '23 at 21:27
  • 3
    Related: [What is the difference between operator precedence and order of evaluation?](https://software.codidact.com/posts/278172) – Lundin Feb 20 '23 at 11:45

3 Answers

25

Will the following have predictable behavior?

int arr[4][4][4][4];
arr[a()][b()][c()][d()] = 1;

No.

While the nested subscript expressions are evaluated from left to right, since each one is an operand of the next, there is no guarantee that the index expressions themselves will be evaluated from left to right.

To be more specific, arr[a()] is evaluated before arr[a()][b()], which is evaluated before arr[a()][b()][c()], which is evaluated before arr[a()][b()][c()][d()]. However, a(), b(), c(), and d() may be evaluated in any order.

Section 6.5p3 of the C standard regarding expressions states:

The grouping of operators and operands is indicated by the syntax. Except as specified later, side effects and value computations of subexpressions are unsequenced.

Section 6.5.2.1 regarding array subscripting makes no mention of sequencing of the operands of E1[E2], although it does state that this expression is exactly equivalent to (*((E1)+(E2))). Then, looking at section 6.5.3.2 regarding the indirection operator * and section 6.5.6 regarding the additive operator +, neither makes any mention of the evaluation of its operands being sequenced in any way. So 6.5p3 applies, and the functions a, b, c, and d may be called in any order.
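
If a particular call order is actually required, one option is to evaluate each index in its own declaration, since the end of each full declarator is a sequence point. The following is only a sketch: the `traced` helper, the `call_log` buffer, and `store_in_order` are hypothetical names introduced here so the order can be inspected, and the functions record their names instead of printing them.

```c
/* Hypothetical trace buffer (not in the question) recording call order. */
char call_log[5];   /* four names plus the terminating NUL */
int call_count;

int traced(char name, int value) {
    if (call_count < 4) {
        call_log[call_count++] = name;
    }
    return value;
}

/* Same return values as the question's functions, but logging
   the name instead of calling puts(). */
int a(void) { return traced('a', 0); }
int b(void) { return traced('b', 1); }
int c(void) { return traced('c', 2); }
int d(void) { return traced('d', 3); }

/* Calls a, b, c, d in that order, then stores 1 at arr[0][1][2][3]. */
void store_in_order(int arr[4][4][4][4]) {
    int i = a();   /* each declaration completes before the next begins */
    int j = b();
    int k = c();
    int l = d();
    arr[i][j][k][l] = 1;   /* the indices are now plain reads */
}
```

After a single call to `store_in_order`, `call_log` holds the names in call order, so it can be passed to `puts` to check the sequence.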

For the same reasons, this:

arr[i++][i++][i++][i++] = 1;

triggers undefined behavior, since the evaluations of the array indices are unsequenced relative to each other, so there are multiple unsequenced side effects on the same object.
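
A well-defined rewrite has to state the intended indices explicitly. The sketch below assumes the intent was the indices i, i+1, i+2, i+3 with i advanced by 4 afterwards; that reading is an assumption, since the original expression has no defined meaning. `store_consecutive` is a hypothetical name.

```c
/* Stores 1 at arr[i][i+1][i+2][i+3] and returns the advanced index.
   Assumed intent only. With dimensions of 4, i must be 0 to stay
   within bounds. */
int store_consecutive(int arr[4][4][4][4], int i) {
    arr[i][i + 1][i + 2][i + 3] = 1;   /* i is only read, never modified */
    return i + 4;                      /* the single update happens here */
}
```

A caller would write `i = store_consecutive(arr, i);`, which modifies i exactly once.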

dbush
  • However, `arr[a()][b()][c()][d()]` might be OK if `a()`, `b()`, `c()`, and `d()` were independent of each other. – Spencer Feb 20 '23 at 14:41
  • @Spencer It's OK either way, as the function calls won't interleave with each other. So there's no undefined behavior in that case, just unspecified behavior. – dbush Feb 20 '23 at 14:48
  • Depends on your meaning of OK. Since OP's functions have common side effects they're not really "independent". I just think adding the detail in my first comment would avoid the implication of a categorical "don't do this". – Spencer Feb 20 '23 at 14:53
  • @Spencer They are dependent because they modify the same object, specifically the stream position indicator of `stdout`, in unspecified order. – user16217248 Mar 10 '23 at 18:11
14

The indices of a multidimensional array access are not guaranteed to evaluate in any particular order. A demonstration shows the functions being called from left to right, but selecting a different compiler in Godbolt evaluates them right to left.

Upon further investigation, the second code example causes a warning with Clang:

warning: multiple unsequenced modifications to 'i' [-Wunsequenced]
        arr[i++][i++][i++][i++] = 1;
             ^    ~~

The multidimensional array access can be broken down into a series of (array)[index] where array is a higher dimension of the multidimensional array (arr itself is the highest dimension) and index is the index expression, such as a function call or i++ expression.

The Standard holds that lhs[rhs] is equivalent to *((lhs)+(rhs)), hence it is unspecified whether any given index, or the array it is indexing, is evaluated first, since the operands of the + operator are unsequenced relative to each other. In all cases where array is not arr itself, evaluating array involves evaluating its index in an even higher dimension.
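
The equivalence can be checked directly. This sketch uses a two-dimensional array for brevity and a hypothetical helper, `same_element`, that compares the addresses produced by three spellings of the same access.

```c
#include <stdbool.h>

/* Returns true if all three spellings designate the same element. */
bool same_element(int arr[4][4], int i, int j) {
    int *by_subscript = &arr[i][j];
    int *by_pointer   = *(arr + i) + j;   /* address form of *(*(arr+i)+j) */
    int *by_commuted  = &j[i[arr]];       /* + commutes, so E2[E1] is valid too */
    return by_subscript == by_pointer && by_pointer == by_commuted;
}
```

None of the three forms introduces any sequencing between `i` and `j`; they differ only in spelling.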

Therefore, the order in which the indices of a multidimensional array access are evaluated is unspecified.

user16217248
  • 1
    This is an interesting point "...but selecting a different compiler in Godbolt evaluates them right to left." I was wondering about the definition of predictable behaviour. Is it predictable at the compilation level or runtime level? I.e. would the behaviour change post compilation, say at runtime? or be consistent? – Emile Feb 21 '23 at 12:20
  • Just read up on this, fascinating. https://pvs-studio.com/en/docs/warnings/v567/ – Emile Feb 21 '23 at 12:30
  • 1
    @Emile The C standard makes no distinction to my knowledge between unpredictable compile time behavior but consistent at runtime or unpredictable behavior period. While in practice, some constructs, while having unspecified behavior, will behave the same between executions once compiled, as far as the C Standard is concerned, unspecified behavior is unspecified behavior. The clause is along the lines of behavior to which two or more possibilities are provided and no further requirements are placed upon which is chosen at any time. – user16217248 Feb 21 '23 at 16:59
  • @Emile The behavior of constructs, as outlined by the Standard, seems to be one of: portable (unambiguous, well-defined behavior), implementation defined (implementations get to choose but must document behavior and adhere to it), unspecified (two or more possibilities, no further requirements), or undefined (nasal demons). – user16217248 Feb 21 '23 at 17:03
  • 1
    @user16217248: Note that the Standard recognizes three situations where it characterizes a program as invoking Undefined Behavior (a program executes a non-portable construct that might be correct, a program executes an erroneous construct, or a correct and portable program receives erroneous data), and while the Standard waives jurisdiction in all three situations, that is not intended to invite nasal demons in all three. – supercat Feb 21 '23 at 21:16
  • What would be an example of the third one? – user16217248 Feb 21 '23 at 23:53
  • 1
    @user16217248: Attempting to use binary mode to access a file that was created in text mode, or vice versa. There is no portable means via which a program can determine whether any particular file was created in binary mode or text mode, and on operating systems that have different file modes weird things may happen if a file is written with one mode and read with the other. Some systems, for example, may precede each line of a text file with a couple of bytes indicating the length, and might behave oddly if the number of bytes preceding the text of the last line, plus its length, ... – supercat Feb 24 '23 at 21:57
  • ...would yield a file offset which falls outside the file. Most systems will in fact fully specify how they would behave in such cases (e.g. in MS-DOS and Windows, text mode causes LF to be replaced with CR+LF when writing, causes a CR that is immediately followed by an LF to be ignored on reading, and causes a SUB character (0x1A) to be appended to a file when writing, treated as an end-of-file indication when reading) but classifying the action as "Implementation Defined" would make the language unimplementable on platforms where it could cause weird and unpredictable behavior. – supercat Feb 24 '23 at 22:03
  • @supercat Wait. Simply opening a file that happened to be created in the other mode is undefined behavior? – user16217248 Feb 24 '23 at 23:13
  • 1
    @user16217248: Yup. Of course, on many platforms nothing weird would happen, and on most of the rest the attempt to open the file would simply fail, but the Standard imposes no requirements upon what happens because it was obvious that implementations should process such actions "in a documented manner characteristic of the environment" when targeting environments that had a documented characteristic behavior. The notion that the Standard only characterized as UB actions which were intended to invite nasal demons is a lie used to justify obtuseness by compiler writers. – supercat Feb 24 '23 at 23:18
  • @supercat So if I just `fopen()` some random file, chances are nasal demons, unless I guess the mode right? – user16217248 Feb 24 '23 at 23:19
  • 1
    @user16217248: The Standard makes no attempt to anticipate and forbid all of the ways an obtuse compiler writer might break programs that would be processed meaningfully by other compilers, because the authors recognized that nobody wanting to sell compilers would engage in such destructive behavior. Unfortunately, they failed to anticipate that a compiler that was exempt from market forces could achieve dominance. – supercat Feb 24 '23 at 23:26
3

Given the constructs

int x; // At file scope
... and then within some function
arr[f()][x] += 1;
arr[x][f()] += 2;

it would not be unusual for a compiler to process each line by performing the call to f() prior to the read of x, regardless of whether the function call was the first or second index. While the actual addition of the inner index to the sub-array pointer may not be possible until after the sub-array address is computed, a compiler may compute any or all of the array subscripts before starting work on address calculations.
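
This only matters when f() and x interact. As a sketch, assume a hypothetical f() that writes x as a side effect (the text above does not say what f() does). Because a function call is indeterminately sequenced with respect to the other evaluations in the caller, the element that gets updated is then unspecified rather than undefined. Naming the indices in separate statements pins the order either way; `bump_after_call` and `bump_before_call` are hypothetical names.

```c
int x;   /* file scope, as in the text */

/* Hypothetical f(): writes x as a side effect and returns an index. */
int f(void) {
    x = 2;
    return 1;
}

/* arr[f()][x] += 1, with the call to f() forced to happen first. */
void bump_after_call(int arr[4][4]) {
    int row = f();   /* x is 2 once this declaration completes */
    int col = x;     /* reads the value f() stored */
    arr[row][col] += 1;
}

/* arr[x][f()] += 2, with the read of x forced to happen first. */
void bump_before_call(int arr[4][4]) {
    int row = x;     /* reads x before f() can change it */
    int col = f();
    arr[row][col] += 2;
}
```

With x starting at 0, the first function updates arr[1][2] and the second updates arr[0][1], regardless of how the compiler would have ordered the original one-line forms.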

supercat