
The C Standard has this language:

6.5.3.4 The sizeof and _Alignof operators

Semantics

  1. The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.

It is unclear to me what the Standard means by "If the type of the operand is a variable length array type, the operand is evaluated".

  • If the type of the operand is a variable length array type, it does not seem to serve any purpose to evaluate the argument, as the size can be determined from the definition of the type: 6.7.6.2 Array declarators stipulates that "The size of each instance of a variable length array type does not change during its lifetime."
  • On the other hand, if the operand is a parenthesized name of a variable length array type, such as in sizeof(char[foo()]), the size expression must be evaluated at runtime to compute the size, but the language of the Standard does not seem to cover this case (what is the type of a type name?).

Should the language of the C Standard be amended for clarification?

Here is a test program to illustrate the behavior on some specific cases of VLAs:

#include <stdio.h>

static int N = 0;
int foo(void) { return ++N; }

int main() {
    typedef char S[foo()];      // foo() is called
    printf("typedef char S[foo()];\t");                             printf("N=%d\n", N);
    printf("sizeof(S)=%d\t\t", (int)sizeof(S));                     printf("N=%d\n", N);

    typedef char U[foo()];      // foo() is called
    printf("typedef char U[foo()];\t");                             printf("N=%d\n", N);
    printf("sizeof(U)=%d\t\t", (int)sizeof(U));                     printf("N=%d\n", N);

    S s1;
    printf("S s1;\t\t\t");                                          printf("N=%d\n", N);
    printf("sizeof(s1)=%d\t\t", (int)sizeof(s1));                   printf("N=%d\n", N);

    S s2;
    printf("S s2;\t\t\t");                                          printf("N=%d\n", N);
    printf("sizeof(s2)=%d\t\t", (int)sizeof(s2));                   printf("N=%d\n", N);

    U u1;
    printf("U u1;\t\t\t");                                          printf("N=%d\n", N);
    printf("sizeof(u1)=%d\t\t", (int)sizeof(u1));                   printf("N=%d\n", N);

    U *pu1 = &u1;
    printf("U *pu1 = &u1;\t\t");                                    printf("N=%d\n", N);
    printf("sizeof(*pu1)=%d\t\t", (int)sizeof(*pu1));               printf("N=%d\n", N);

    U *pu2 = NULL;
    printf("U *pu2 = NULL;\t\t");                                   printf("N=%d\n", N);
    // sizeof(*pu2) does not evaluate *pu2, contrary to the Standard specification
    printf("sizeof(*pu2)=%d\t\t", (int)sizeof(*pu2));               printf("N=%d\n", N);

    char x2[foo()][foo()];      // foo() is called twice
    printf("char x2[foo()][foo()];\t");                             printf("N=%d\n", N);
    printf("sizeof(x2)=%d\t\t", (int)sizeof(x2));                   printf("N=%d\n", N);
    printf("sizeof(x2[0])=%d\t\t", (int)sizeof(x2[0]));             printf("N=%d\n", N);

    // sizeof(char[foo()]) evaluates foo()
    printf("sizeof(char[foo()])=%d\t", (int)sizeof(char[foo()]));   printf("N=%d\n", N);
    return 0;
}

Output (both clang and gcc):

typedef char S[foo()];  N=1
sizeof(S)=1             N=1
typedef char U[foo()];  N=2
sizeof(U)=2             N=2
S s1;                   N=2
sizeof(s1)=1            N=2
S s2;                   N=2
sizeof(s2)=1            N=2
U u1;                   N=2
sizeof(u1)=2            N=2
U *pu1 = &u1;           N=2
sizeof(*pu1)=2          N=2
U *pu2 = NULL;          N=2
sizeof(*pu2)=2          N=2
char x2[foo()][foo()];  N=4
sizeof(x2)=12           N=4
sizeof(x2[0])=4         N=4
sizeof(char[foo()])=5   N=5
chqrlie
  • `sizeof(*pu2) does not evaluate *pu2, contrary to the Standard specification` Do you mean, that if `sizeof(*pu2)` would be evaluated, that you would expect `foo()` to be called? – KamilCuk Jul 21 '20 at 19:33
  • I like to use `int i = 0; char a[rand()%2 + 1]; printf("%zu\n", sizeof a[i++]); printf("%d\n", i);` increments `i`. `int i = 0; char a[42]; printf("%zu\n", sizeof a[i++]); printf("%d\n", i);` does not increment `i` as `i` in the argument of `sizeof` is not evaluated. @chqrlie Is this close to what you seek? – chux - Reinstate Monica Jul 21 '20 at 19:45
  • _`sizeof(*pu2)` does not evaluate `*pu2`, contrary to the Standard specification_ Maybe it is evaluated and you get undefined behavior which looks like it is not evaluated. – Language Lawyer Jul 21 '20 at 20:36
  • @LanguageLawyer: given that `pu2` is explicitly initialized to `NULL`, evaluating `*pu2` should have undefined behavior, which indeed could go unnoticed, for example if the code is elided, which is quite likely since the size of type `U` can be determined without even looking at `pu2`. – chqrlie Jul 21 '20 at 22:09
  • @KamilCuk: I don't expect `foo()` to be evaluated, but dereferencing a null pointer should have visible side effects although not required by the Standard. As a matter of fact, even if `pu2` is `volatile` qualified, the pointer is not dereferenced by clang. – chqrlie Jul 21 '20 at 22:13
  • @chux-ReinstateMonica: in your example, evaluating `i++` is unnecessary to determine the type of `sizeof a[i++]` and as a matter of fact the type of `a[i++]` is **not** a variable length array type, so no evaluation should take place. Also note **6.7.6.2 Array declarators** p5 *Where a size expression is part of the operand of a `sizeof` operator and changing the value of the size expression would not affect the result of the operator, it is unspecified whether or not the size expression is evaluated.* `a` should be defined as `char a[10][rand() % 2 + 1];` for `sizeof a[i++]` to increment `i`. – chqrlie Jul 21 '20 at 22:18
  • @chqrlie Agreed, that example was not good. – chux - Reinstate Monica Jul 21 '20 at 22:31
  • @chqrlie: But on what basis are you concluding that "evaluation" necessarily includes *lvalue conversion*? `*pu2` is clearly evaluated in the expression `*pu2 = 42;` but since it is "the left operand of an assignment operator", it does not undergo lvalue conversion (section 6.3.2.1/p2). That same clause also excludes "the operand of the *sizeof* operator" from lvalue conversion, so there is no reason to believe that `sizeof *pu2` would dereference NULL any more than `*pu2 = 42` would. – rici Jul 22 '20 at 02:56
  • With `char x2[foo()][foo()];`, it isn't clear whether you have `char x2[3][4]` or `char x2[4][3]`. The order of evaluation is probably compiler dependent. You have to print `sizeof(x2[0])` to find out which applies. – Jonathan Leffler Jul 22 '20 at 06:34
  • @JonathanLeffler: indeed probably at least compiler dependent: clang on OS/X prints `sizeof(x2[0])=4` but gcc on linux prints `sizeof(x2[0])=3`. I did not find anything in the Standard to lift this ambiguity. – chqrlie Jul 22 '20 at 07:54

2 Answers


Every variably-modified type has a size which, for each dimension, is either a multiple of that dimension or is independent of it. There is no reason why evaluating the size of a variably-modified object should require evaluating the value of any dimension which will not affect the object's size, but some compilers may evaluate the values of such dimensions because the original rules for variably-modified types implied that they should be evaluated. In cases where different implementations process a construct differently, the authors of the Standard tend to avoid having the Standard suggest that either behavior is better. Thus, the Standard is deliberately ambiguous about corner cases involving variably-modified types, so as to avoid having to characterize any existing implementations' behavior as "wrong" or inferior.

supercat
  • *Thus, the Standard is deliberately ambiguous about corner cases involving variably-modified types*... I think it should be even more ambiguous and state that *If the type of the operand is a variable length array type, the operand **may** be evaluated for the purpose of determining the size of the type*. This would make the even worse corner cases involving the comma operator undefined, such as `int n=1, a[n]; (void)sizeof *(printf("Gotcha!"), &a);` – chqrlie Jul 21 '20 at 22:48
  • @chqrlie: It would be helpful if the Standard were to define terminology for the different ways expressions can be used including "resolved" [what is done with an lvalue on the left side of an assignment operator or the right side of `&`, or as a result of implicit array decay], and "sized" [what is done via sizeof]. One could then specify the effects of evaluating, resolving, and sizing expressions in terms of which parts are evaluated, resolved, and sized. – supercat Jul 21 '20 at 22:54
  • Does it mean that the standard introduced a ridiculous requirement, forced compiler developers to implement it, and now uses backward compatibility as an argument for preserving the requirement? Looks like a catch-22. – tstanisl Sep 08 '21 at 22:14

"If the type of the operand is a variable length array type, it does not seem to serve any purpose to evaluate the argument, as the size can be determined from the definition of the type, as it is stipulated in 6.7.6.2 Array declarators that 'The size of each instance of a variable length array type does not change during its lifetime.'"

But that size is not known until the array is instantiated at runtime. An evaluation of some sort has to be performed at runtime. What exactly that evaluation needs to be is not specified.

"Should the language of the C Standard be amended for clarification?"

I think so, yes. I consider the following idiom to be incredibly useful for dynamically allocating 2D arrays where the number of rows and columns isn't known until runtime:

int rows, cols;
...
T (*arr)[cols] = malloc( sizeof *arr * rows );

However, as the Standard is currently worded, this (most likely) invokes undefined behavior because I'm evaluating *arr at runtime, but arr is uninitialized (and most likely invalid) at that point. You shouldn't need to dereference arr to get the size of the array type, but unfortunately the language in the standard isn't that granular. I'd like to see language similar to "If the type of the operand is a variable length array type, the operand is evaluated for the purpose of obtaining the array size alone".

John Bode
  • This is but one of many cases where a sufficiently pedantic reading of parts of the Standard would characterize as Undefined Behavior actions whose meaning could otherwise be inferred by reading other parts of the Standard and a platform's documentation. The authors of the Standard expected that compiler writers would give priority to defined behavior in cases where they would have no reason to do otherwise, and thus only thought it necessary to ensure that the rules defined commonplace behaviors in cases where implementations might have reason to deviate from them. – supercat Jul 21 '20 at 21:10
  • @supercat it should be enough to say that only "size expressions may be evaluated". The "may" part is even already stated in https://port70.net/~nsz/c/c11/n1570.html#6.7.6.2p5 – tstanisl Sep 08 '21 at 22:19