30

Code 1

#include <stdio.h>
int f(int *a, int b) 
{
  b = b - 1;
  if(b == 0) return 1;
  else {
    *a = *a+1;

    return *a + f(a, b);
  }
}

int main() {
  int X = 5;
  printf("%d\n",f(&X, X));
}

Consider this C code. The question here is to predict the output. Logically, I get 31 as output. (Output on machine)

When I change the return statement to

return f(a, b) + *a;

I logically get 37. (Output on machine)

One of my friends said that while computing the return statement in

return *a + f(a, b);

we compute the value of *a while descending the recursion tree, i.e. *a is computed first and then f(a, b) is called, whereas in

return f(a,b) + *a;

it is resolved while returning back up, i.e. f(a, b) is computed first and then *a is read.

With this approach, I tried predicting the output of the following code myself:

Code 2

#include <stdio.h>
int foo(int n) 
{
    static int r;
    if(n <= 1)
        return 1;

    r = n + r;
    return r + foo(n - 2);
} 

int main () {
   printf("value : %d",foo(5));
}

For return(r+foo(n-2));


I'm getting 14 as output logically. (Output on machine)

For return(foo(n-2)+r);


I get 17 as output. (Output on machine)

However, when I run the code on my system I get 17 in both cases.

My Questions:

  • Is the approach given by my friend correct?
  • If so, why do I get the same output in Code 2 when I run in a machine?
  • If not, what is the correct way to interpret Code 1 and Code 2?
  • Is there any undefined behaviour because C does not support pass by reference? It is effectively being used in Code 1, though it is implemented using pointers.

In a nutshell, I simply wanted to know the correct way to predict the output in the 4 cases mentioned above.

idmean
  • 14,540
  • 9
  • 54
  • 83
  • 1
    For 10K users: closely related to, but different from [How the given C code works?](http://stackoverflow.com/questions/41775508/how-the-given-c-code-works), now deleted — and asked by a different user. – Jonathan Leffler Jan 21 '17 at 05:54
  • 5
    Since the order of evaluation of the terms in `return *a + f(a, b);` (and in `return f(a, b) + *a;`) is undefined and the function modifies the value that `a` is pointing at, your code has undefined behaviour and any answer is possible. – Jonathan Leffler Jan 21 '17 at 05:59
  • Shall I conclude " if the operation is (a+b) then it depends upon the language whether to compute 'a' first or 'b' first. " –  Jan 21 '17 at 06:20
  • The undefined behavior comes simply from the fact that you have two expressions whose values depend on the order in which they are evaluated and nothing in the standard requires them to be evaluated in some particular order. – David Schwartz Jan 21 '17 at 06:25
  • 2
    If the operation is `(a + b)`, it depends on the compiler (not the language) whether `a` or `b` is evaluated first; the language makes no requirements about the order of evaluation of those terms. – Jonathan Leffler Jan 21 '17 at 06:26
  • 1
    @DavidBowling: No, because you can't tell whether `*a` is evaluated before or after the function is called, so you can't tell what value will be added to the result of calling the function. – Jonathan Leffler Jan 21 '17 at 06:28
  • See also [Evaluation in return statement](http://stackoverflow.com/questions/41776084/evaluation-in-return-statement). – Jonathan Leffler Jan 21 '17 at 06:45
  • 2
    @JonathanLeffler: A compiler is required to either evaluate `*a` and then call function `f()`, or call `f()` and then evaluate `*a`. It is not required to select among those choices in any consistent or predictable fashion, but it is not allowed to behave in a completely arbitrary fashion, as would be permissible if the code invoked Undefined Behavior. – supercat Jan 21 '17 at 07:40
  • @supercat : you're probably correct but I've gone to bed and will check my answer tomorrow. See also the discussion associated with the "evaluation in return statement" question (X-ref above); the same comments apply (it's bedtime). – Jonathan Leffler Jan 21 '17 at 07:45
  • 1
    @DavidSchwartz They are indeterminately sequenced. The behavior is defined. – 2501 Jan 21 '17 at 09:31

3 Answers

18

Code 1

For Code 1, because the order of evaluation of the terms in return *a + f(a, b); (and in return f(a, b) + *a;) is not specified by the standard and the function modifies the value that a is pointing at, your code has unspecified behaviour and various answers are possible.

As you can tell from the furor in the comments, the terms 'undefined behaviour', 'unspecified behaviour' and so on have technical meanings in the C standard, and earlier versions of this answer misused 'undefined behaviour' where it should have used 'unspecified'.

The title of the question is "Is this undefined behaviour in C?", and the answer is "No; it is unspecified behaviour, not undefined behaviour".

Code 2 — as revised

For Code 2 as fixed, the function also has unspecified behaviour: the value of the static variable r is changed by the recursive call, so changes to the evaluation order could change the result.

Code 2 — pre-revision

For Code 2, as originally shown with int foo(static int n) { … }, the code does not (or, at least, should not) compile. The only storage class permitted in the definition of an argument to a function is register, so the presence of static should be giving you compilation errors.

ISO/IEC 9899:2011 §6.7.6.3 Function declarators (including prototypes) ¶2 The only storage-class specifier that shall occur in a parameter declaration is register.

Compiling with GCC 6.3.0 on macOS Sierra 10.12.2, like this (note, no extra warnings requested):

$ gcc -O ub17.c -o ub17
ub17.c:3:27: error: storage class specified for parameter ‘n’
 int foo(static int n)
                    ^

No; it doesn't compile at all as shown — at least, not for me using a modern version of GCC.

However, assuming that is fixed, the function also has unspecified behaviour: the value of the static variable r is changed by the recursive call, so changes to the evaluation order could change the result.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
  • 2
    First example is defined. Please see my answer. http://stackoverflow.com/a/41777679/4082723 – 2501 Jan 21 '17 at 09:31
  • I have just Edited the question to take out the error in **Code 2** to remove the error due to storage class –  Jan 21 '17 at 09:46
  • 3
    Are you sure it is undefined? I would expect unspecified (with gcc evaluating right-to-left and MSVC evaluating left-to-right) but not undefined because I would think the function call introduces sequence points between evaluations of the left and right operands (no matter in which order). – Matthieu M. Jan 21 '17 at 13:07
  • 1
    @MatthieuM. It is unspecified. I think he will correct it soon. – haccks Jan 21 '17 at 13:31
  • @MatthieuM.; *with gcc evaluating right-to-left and MSVC evaluating left-to-right*: Where did you get this? – haccks Jan 22 '17 at 07:49
  • @haccks: Experience. Dated experience. Hopefully valid. In 2008/2009 I was compiling code with both gcc 3.4.2 and Visual Studio 2003, I seem to remember realizing a few times that they differed in their order of evaluation. I never experienced interleaved evaluation issues with either compiler, but then I've been quite wary since then and tend to avoid relying on the order of evaluation so... – Matthieu M. Jan 22 '17 at 12:15
13

C standard states that

6.5.2.2/10 Function calls:

There is a sequence point after the evaluations of the function designator and the actual arguments but before the actual call. Every evaluation in the calling function (including other function calls) that is not otherwise specifically sequenced before or after the execution of the body of the called function is indeterminately sequenced [1] with respect to the execution of the called function. [footnote 94]

And footnote 86 (section 6.5/3) says:

In an expression that is evaluated more than once during the execution of a program, unsequenced and indeterminately sequenced evaluations of its subexpressions need not be performed consistently in different evaluations.

In expressions return f(a,b) + *a; and return *a + f(a,b); evaluation of the subexpression *a is indeterminately sequenced. In this case different results can be seen for the same program.
Note that the side effect on *a is sequenced relative to the read of *a in the above expressions, but it is unspecified in which order.


1. Evaluations A and B are indeterminately sequenced when A is sequenced either before or after B, but it is unspecified which. (C11- 5.1.2.3/3)

Community
  • 1
  • 1
haccks
  • 104,019
  • 25
  • 176
  • 264
  • Shall I conclude " if the operation is (a+b) then it depends upon the language whether to compute 'a' first or 'b' first. ". –  Jan 21 '17 at 06:21
  • @pC_; There is no guarantee that which expression will be evaluated first. – haccks Jan 21 '17 at 06:23
  • Yeah, that is what my conclusion is. It depends on the language, or more precisely the **compiler**. Am I correct? –  Jan 21 '17 at 06:24
  • 2
    @pC_ - Not really. It is very possible that neither `a` nor `b` will be computed first. It is possible that the computations are interleaved. – Robᵩ Jan 21 '17 at 06:25
  • @Robᵩ but the evaluation order is set when the compiler is build , right ? –  Jan 21 '17 at 06:26
  • @pC_ - I would not count on it unless the compiler documentation guarantees it, because the language doesn't. I can imagine a compiler in which the order of function-call evaluation depends upon, for example, what register allocation has occurred in other lines in this function. – Robᵩ Jan 21 '17 at 06:31
  • 4
    @pC_: the order could be changed by the optimization settings, the version of the compiler, the phase of the moon, and whether it likes your user ID or not. Yes, in practice, a given version of a compiler under a given set of optimization settings will usually evaluate things in the same order, but it might matter how complex an expression `a` or `b` was (that is, if the terms were `*c[i++] + *d[--j]`, for example, you'd have less idea about what goes on — the compiler might order things differently from a simple `a + b`). – Jonathan Leffler Jan 21 '17 at 06:32
  • @JonathanLeffler , Thanks for the points . You made my day :) –  Jan 21 '17 at 06:43
  • 3
    In C99 language there is a sequence point before and after a function call, so it is not the case that evaluation of `*a` and modification of the same object occurs without a sequence point in between. The C11 uses different language but it is logically equivalent. – M.M Jan 21 '17 at 07:38
  • @M.M; C11 says: *There is a sequence point after the evaluations of the function designator and the actual arguments but before the actual call.*, but how this is going to be applied here in case of `return *a + f(a,b);`? – haccks Jan 21 '17 at 07:52
  • 4
    @haccks after `f` and the parameters of `f` were evaluated, there is a sequence point . Then the statements in the body of `f` are executed, and those statements are indeterminately sequenced with the rest of the `return` expression. (in other words, the `*a` is either before all of the body statements, or after all of them) – M.M Jan 21 '17 at 07:54
  • @M.M; I changed my mind. You were right. I was thinking in terms of operator `+` in which case order of evaluation of its operand are unsequenced, but with function call its different. Thanks for picking it up. – haccks Jan 21 '17 at 09:07
8

I will focus on the definition of the first example.

The first example is defined with unspecified behavior. This means that there are multiple possible results, but the behavior is not undefined. (And if the code can handle those results, the behavior is defined.)

A trivial example of unspecified behavior is:

int a = 0;
int c = a + a;

It is unspecified whether the left a or the right a is evaluated first, as they are unsequenced. The + operator doesn't specify any sequence points [1]. There are two possible orderings: either the left a is evaluated first and then the right a, or vice versa. Since neither side is modified [2], the behavior is defined.


Had the left a or the right a been modified without a sequence point, i.e. unsequenced, the behavior would be undefined [2]:

int a = 0;
int c = ++a + a;


Had the left a or the right a been modified with a sequence point in between, then the left and the right side would be indeterminately sequenced [3]. This means that they are sequenced, but it is unspecified which one is evaluated first. The behavior would be defined. Mind that the comma operator introduces a sequence point [4]:

int a = 0;
int c = a + ((void)0,++a,0);

There are two possible orderings.

If left side is evaluated first, then a evaluates to 0. Then the right side is evaluated. First (void)0 is evaluated followed by a sequence point. Then a is incremented, followed by a sequence point. Then 0 is evaluated as 0 and is added to the left side. The result is 0.

If the right side is evaluated first, (void)0 is evaluated followed by a sequence point. Then a is incremented, followed by a sequence point. Then 0 is evaluated as 0. Then the left side is evaluated, and a evaluates to 1. The result is 1.


Your example falls into the latter category, as the operands are indeterminately sequenced. The function call serves the same purpose [5] as the comma operators in the above example. Your example is complicated, so I will use mine, which also applies to yours. The only difference is that there are many more possible results in your example than in mine, but the reasoning is the same:

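#include <assert.h>

/* Increments *a as a side effect and always returns 0. */
int Function(int *a)
{
    ++(*a);
    return 0;
}

int main(void)
{
    int a = 0;
    int c = a + Function(&a);
    assert(c == 0 || c == 1);
    return 0;
}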

There are two possible orderings.

If the left side is evaluated first, a evaluates to 0. Then the right side is evaluated, there is a sequence point and the function is called. Then a is incremented, followed by another sequence point introduced by the end of the full expression [6], the end of which is indicated by the semicolon. Then 0 is returned and added to 0. The result is 0.

If the right side is evaluated first, there is a sequence point and the function is called. Then a is incremented, followed by another sequence point introduced by the end of the full expression. Then 0 is returned. Then the left side is evaluated, and a evaluates to 1 and is added to 0. The result is 1.


(Quoted from: ISO/IEC 9899:201x)

1 (6.5 Expressions 3)
Except as specified later, side effects and value computations of subexpressions are unsequenced.

2 (6.5 Expressions 2)
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.

3 (5.1.2.3 Program execution)
Evaluations A and B are indeterminately sequenced when A is sequenced either before or after B, but it is unspecified which.

4 (6.5.17 Comma operator 2)
The left operand of a comma operator is evaluated as a void expression; there is a sequence point between its evaluation and that of the right operand.

5 (6.5.2.2 Function calls 10)
There is a sequence point after the evaluations of the function designator and the actual arguments but before the actual call.

6 (6.8 Statements and blocks 4)
There is a sequence point between the evaluation of a full expression and the evaluation of the next full expression to be evaluated.

2501
  • 25,460
  • 4
  • 47
  • 87
  • 1
    So, the expressions `a` and `((void)0,++a,0)` are unsequenced, but the presence of the sequence points in the second expression guarantees that, while the evaluation of `(void)0` may be interleaved with the evaluation of `a`, the evaluations of `++a` and `0` must be completed _after_ the evaluation of `(void)0`. If instead the second expression were `((void)0,++a)`, this would be undefined behavior, since if this expression is evaluated before `a` there is no sequence point between `++a` and `a`. Am I understanding this correctly? – ad absurdum Jan 21 '17 at 13:44
  • 1
    @DavidBowling Yes, that is exactly correct. One of the possible orderings would cause undefined behavior (6.5 §2), the other wouldn't, and the entire expression would be undefined due to: *If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.* – 2501 Jan 21 '17 at 13:46
  • 1
    Thank you. This was a very helpful explanation. – ad absurdum Jan 21 '17 at 13:50
  • 1
    @haccks You're misreading what David wrote. He is technically wrong, but he meant that they would be unsequenced without the comma. The comma operator introduces a sequence point, and yes, with it they are indeterminately sequenced. – 2501 Jan 21 '17 at 14:01
  • @2501; Oops! My bad. Deleting my comments. – haccks Jan 21 '17 at 14:04
  • 1
    I should have written, "...`a` and `((void)0, ++a, 0)` are _indeterminately_ sequenced,...". – ad absurdum Jan 21 '17 at 14:20