
I declared a function with a C99 VLA parameter:

void create_polygon(int n, int faces[][n]);

I want to call this function in another function where I would allocate my two-dimensional array :

void parse_faces()
{
    int faces[3][6];

    create_polygon(6, faces);
}

When I pass a two-dimensional array as an argument, it decays to a pointer to an array of 6 ints, which refers to stack memory in the calling function.

The VLA parameter only acts as a type declaration (it allocates no memory): it tells the compiler to address the data in row-major order, so that faces[i][j] refers to the same element as ((int*)faces)[i * 6 + j].
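
A minimal sketch of that addressing rule, assuming a compiler with VLA support (C99, or C11 without __STDC_NO_VLA__). The helper names offset_via_vla and offset_via_formula are made up for illustration. It compares byte offsets rather than reading through a flattened int pointer, so it never indexes past the end of a row:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte offset of faces[i][j] from the start of the
   array, as the compiler computes it from the VLA parameter type. */
size_t offset_via_vla(int n, int faces[][n], int i, int j)
{
    return (size_t)((char *)&faces[i][j] - (char *)faces);
}

/* Hypothetical helper: the same offset from hand-written row-major
   arithmetic, (i * n + j) * sizeof(int). */
size_t offset_via_formula(int n, int i, int j)
{
    assert(n > 0 && i >= 0 && j >= 0);
    return ((size_t)i * (size_t)n + (size_t)j) * sizeof(int);
}
```

For an int faces[3][6], offset_via_vla(6, faces, i, j) and offset_via_formula(6, i, j) should agree for every valid i and j. No memory is read or allocated by either helper.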

What is the difference between declaring the function with a VLA parameter and declaring it with a fixed size?

explogx
  • Why not pass *both* sizes to the function? – Some programmer dude Dec 15 '18 at 16:39
  • Passing the first dimension size is irrelevant, the first level of indirection is passed as a pointer to an array in a case of a multidimensional-array. – explogx Dec 15 '18 at 16:41
  • Yes, but you still *need* the first size in the function. Otherwise you will not know how long to iterate over it. – Some programmer dude Dec 15 '18 at 16:53
  • Oh. I forgot to say the first dimension is 3 (triangle). But the question is not what you are answering me. My question is about the difference between the VLA used in prototype and the difference with fixed size. – explogx Dec 15 '18 at 16:58
  • Simpler example: `void f(int* p); void g(void) { int a[7]; f(a); }` - as a decays to pointer, how would f know that p (aka a) has 7 elements? Exception: if you have a sentinel value at the end (e. g. terminating 0 in strings or a null pointer in an array of pointers). – Aconcagua Dec 15 '18 at 16:59
  • If you truly want to know the difference, look at the assembly code the compiler generates for your VLA function, and for a function without a VLA. If there are any practical differences then that will show them. – Some programmer dude Dec 15 '18 at 17:04

2 Answers


faces[i][j] always is equivalent to *(*(faces + i) + j), no matter if VLA or not.

Now let's compare two variants (not considering that you actually need the outer dimension as well to prevent exceeding array bounds on iterating):

void create_polygon1(int faces[][6]);
void create_polygon2(int n, int faces[][n]);

It doesn't matter whether the array passed in was originally created as a classic array or as a VLA: the first function accepts only arrays whose rows have exactly 6 elements, while the second accepts rows of arbitrary length (assuming this is clear so far...).
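
A small sketch of that difference, assuming VLA support. The names sum_fixed and sum_vla are hypothetical, and the outer dimension is passed explicitly so the loops know where to stop:

```c
#include <assert.h>

/* Fixed inner dimension: only arrays with rows of exactly 6 ints match
   the parameter type. */
int sum_fixed(int rows, int faces[][6])
{
    int total = 0;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < 6; j++)
            total += faces[i][j];
    return total;
}

/* VLA inner dimension: the row length n is supplied by the caller at run
   time, so arrays with any row length can be passed. */
int sum_vla(int rows, int n, int faces[][n])
{
    assert(rows > 0 && n > 0);
    int total = 0;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < n; j++)
            total += faces[i][j];
    return total;
}
```

An int a[2][6] can be passed to both functions, whereas an int b[2][3] only fits sum_vla(2, 3, b). Passing b to sum_fixed should draw an incompatible pointer type diagnostic from the compiler.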

faces[i][j] will now be translated to:

*((int*)faces + (i * 6 + j)) // (1)
*((int*)faces + (i * n + j)) // (2)

The difference still looks marginal, but it becomes more obvious at the assembly level (assuming all variables are stored on the stack and sizeof(int) == 4):

LD     R1, i;
LD     R2, j;
MUL    R1, R1, 24; // using a constant! 24: 6 * sizeof(int)!
MUL    R2, R2, 4;  // sizeof(int)
ADD    R1, R1, R2; // byte offset stored in R1 register

LD     R1, i;
LD     R2, j;
LD     R3, n;      // need to load from stack
MUL    R3, R3, 4;  // still need to multiply by sizeof(int)
MUL    R1, R1, R3; // can now use n * sizeof(int) from register R3
MUL    R2, R2, 4;  // ...
ADD    R1, R1, R2; // ...

Real assembly code might vary, of course, especially if you use a calling convention that passes some parameters in registers (then loading n into R3 might be unnecessary).


For completeness (added due to comments, unrelated to original question):
There's yet the int* array[] case: Representation by array of pointers to arrays.

*((int*)faces + (i * ??? + j))

doesn't work any more, as faces in this case is no contiguous memory (well, the pointers themselves are in contiguous memory, of course, but not all the faces[i][j]). We are forced to do:

*(*(faces + i) + j)

as we need to dereference the pointer stored in the array before we can apply the next index. Assembler code for comparison (we first need a more complete variant of the pointer to 2D-array case):

LD     R1, faces;
LD     R2, i;
LD     R3, j;
LD     R4, n;      // or skip, if no VLA
MUL    R4, R4, 4;  // or skip, if no VLA
MUL    R2, R2, R4; // constant instead of R4, if no VLA
MUL    R3, R3, 4;
ADD    R2, R2, R3; // byte offset stored in R2 register
ADD    R1, R1, R2; // offset from base pointer
LD     R1, [R1];   // loading value of faces[i][j] into register

LD     R1, faces;
LD     R2, i;
LD     R3, j;
MUL    R2, R2, 8;  // sizeof(void*) (any pointer)
MUL    R3, R3, 4;  // sizeof(int)
ADD    R1, R1, R2; // address of faces[i]
LD     R1, [R1];   // now need to load address - i. e. de-referencing faces[i]
ADD    R1, R1, R3; // offset within array
LD     R1, [R1];   // loading value of faces[i][j] into register
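
The two access patterns above can be sketched in C as follows, assuming VLA support. The function names get_indirect and get_contiguous are hypothetical. The array-of-pointers version performs an extra load of rows[i] before indexing with j, and its rows need neither be contiguous nor have the same length:

```c
#include <assert.h>
#include <stddef.h>

/* Array of pointers: rows[i] is loaded from memory first, then indexed
   by j, i.e. *(*(rows + i) + j) with two loads. */
int get_indirect(int *rows[], int i, int j)
{
    assert(rows != NULL && rows[i] != NULL);
    return rows[i][j];
}

/* Contiguous 2D array: the element address is computed directly from the
   base pointer, using n * sizeof(int) as the row stride. */
int get_contiguous(int n, int faces[][n], int i, int j)
{
    assert(n > 0);
    return faces[i][j];
}
```

Both functions are written with the same faces[i][j]-style syntax in the body, but only the parameter type tells the compiler which of the two address computations to emit.
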
Aconcagua
  • What do you mean by `faces[i][j]` always is equivalent to `*(*(faces + i) + j)` ? In the case of an array type, isn't it equal to `*((int*)faces + (i * n + j))` where `n` the size of the inner dimension ? – explogx Dec 15 '18 at 18:06
  • But there is a big difference when passing multidimensional array types, no ? Here `*(*(faces + i) + j)` is equivalent to `*(*((int (*)[6])faces + i) + j)` and because array types decay into pointers when used as **lvalue**, `*(faces + i)` is a pointer to int. But if I would have passed `int **faces` it would have crashed because `*(*(faces + i) + j)` would have been equivalent to `*(*((int**)faces + i) + j)`. One actually dereferences 4*6 bytes and gets an array type (which decay into a pointer when used) and the other dereferences 8 bytes and gets a garbage pointer. Am I right ? – explogx Dec 16 '18 at 09:15
  • @Prion Caution: `int**` is something totally different!!! Each element in this type of array is a pointer itself, *not* an array. Each pointer *can* point to an array, of course, but is *not* itself. Sure, access in C still is faces[i][j], but you cannot calculate the offset directly as with `int(*)[whatever]`: you need to calculate offset `i` first, load the pointer address and *then* add `j` to it. Be aware that with real 2dim array you have contiguous memory and each row necessarily has same size. Both does not necessarily apply (the former in general rather unlikely) for `int**`. – Aconcagua Dec 16 '18 at 12:11
  • @Prion question does not consider `int**` at all, though, so I do not refer to in my answer either... – Aconcagua Dec 16 '18 at 12:15
  • But array types, when used as **lvalue** decay into pointers to their first element, so `faces[0]` for instance is of the type array of 6 integer but when used as **lvalue** it decays into a pointer to int. So I could do `int *p = faces[0]`. – explogx Dec 16 '18 at 14:35
  • @Prion This is true, of course, but is little related to the original question. Consider any arbitrary pointer `X* x`. Then `x[y]` is `*(X*)((char*)x + y * sizeof(X))`. Now comparing `int**` and `int(*)[whatever]`, the decisive difference is what `sizeof(X)` results in - size of pointer in former case and (possibly dynamic) size of (undecayed) array! Another aspect is the type of the result - in the latter case you still have an array and e. g. can get number of elements via `sizeof(x)/sizeof(*x)` whereas that would fail in the former case. – Aconcagua Dec 17 '18 at 08:56
  • Yes because `int (*)[whatever]` is the same thing as passing a two dimensional array as a function argument, and I prefer this form `int [][whatever]` which makes it clearer I want to pass a two dimensional array, even though it will decay into a pointer to array. – explogx Dec 17 '18 at 09:07
  • @Prion `int[][whatever]` is certainly easier to read... In general (with exception to C strings...), I use the array-like variant to denote that the parameter is intended to accept a pointer to array (and thus we either need an additional size/length parameter or a sentinel value in the array) and the pointer variant to denote the parameter is intended to accept one single value, such as: `void f(int[] values); void g(int* value); int a[7]; int n; f(a); g(&n);` In short: `*` in parameter is paired with `&` in argument... – Aconcagua Dec 17 '18 at 10:03
  • In the end, only a matter of taste, though (or the coding conventions you are mandated to follow...). – Aconcagua Dec 17 '18 at 10:04
  • Yes. But for instance, Torvalds explicitly forbid the use of empty braces for the first dimension and only accepts the pointer equivalent syntax. I think if empty braces exists they are here for a reason... – explogx Dec 17 '18 at 10:18
  • Well, that's only a convention again. That would then, though, require to have sizes in front of arrays: `f(size_t num, int array[num]).` On VLA, you need to have all but one sizes in front anyway; not having all of them then would be strange enough (`void f(r, c, array[r][c]);` vs. `void f(c, array[][c], r);`) - repeating r as dimension clearly shows how the function is intended to be used, so there's some reasoning behind... – Aconcagua Dec 17 '18 at 10:30
  • By the way: `int** p` vs. `int(*a)[whatever]` in array variant: `int* a[]` vs. `int a[][whatever]` - it's quite unfortunate that the second case in the former variant resembles quite a bit the first case in the latter variant! – Aconcagua Dec 17 '18 at 10:42

I disassembled this code (shown here with the fixed-size parameter; the VLA variant declares int faces[][n] instead):

void    create_polygon(int n, int faces[][6])
{
    int a = sizeof(faces[0]);
    (void)a;
}

With VLA argument :

movl    %edi, -4(%rbp)   # n (6 at run time)
movq    %rsi, -16(%rbp)  # faces
movl    %edi, %esi
shlq    $2, %rsi         # n << 2 = n * sizeof(int) (24 for n = 6)
movl    %esi, %edi

With fixed size :

movl    %edi, -4(%rbp)
movq    %rsi, -16(%rbp)
movl    $24, %edi        # 24

As Aconcagua pointed out, in the first example using a VLA, the size is computed at run time: n, which is passed in edi, is copied into rsi, shifted left by 2 (that is, multiplied by sizeof(int), here 4), and the result is moved back into edi.

In the second example, the size is computed at compile time and loaded into edi as the constant 24. The main advantage of the fixed size is that the compiler can diagnose an incompatible pointer type when an array with a different inner dimension is passed, instead of letting the mismatch turn into a crash at run time.
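
The same observation can be reproduced without a disassembler. This is a minimal sketch assuming VLA support, with hypothetical function names row_size_fixed and row_size_vla:

```c
#include <assert.h>
#include <stddef.h>

/* Fixed inner dimension: sizeof(faces[0]) is a compile-time constant,
   6 * sizeof(int). */
size_t row_size_fixed(int faces[][6])
{
    return sizeof(faces[0]);
}

/* VLA inner dimension: sizeof(faces[0]) is computed at run time as
   n * sizeof(int). */
size_t row_size_vla(int n, int faces[][n])
{
    assert(n > 0);
    return sizeof(faces[0]);
}
```

With int a[3][6], both functions should return 6 * sizeof(int). With int b[2][4], row_size_vla(4, b) returns 4 * sizeof(int), while passing b to row_size_fixed is the kind of mismatch the compiler can flag.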

explogx