Is indirection of null pointer to array type UB?

Question

Consider this code:

#include <stdio.h>

int *f(int (*p)[2])
{
    return *p; // Possible UB here?
}

int main(void)
{
    // %p expects a void *, so the int * result is cast explicitly
    printf("%p\n", (void *)f(NULL));
}

Does applying indirection to a null pointer create UB here?

Maybe it doesn't, because the lvalue of array type is converted back to a pointer and no object value is actually accessed. Which one is true in this case?

EDIT: I know exactly what UB is. I just want a proof, or some kind of explanation based on the standard, of why the above code is or is not UB.

Yes, any dereference of a NULL pointer gives undefined behaviour. That said, the definition of undefined behaviour is that there is no constraint on what happens as a result. So a particular compiler may do what you describe, or it may not. — Peter, Nov 17 '16 at 10:59
I know what UB is - I was just lawyering about how the standard defines this situation. — AnArrayOfFunctions, Nov 17 '16 at 11:04
From the fact you asked the question, I don't believe you do understand what UB is. — Peter, Nov 17 '16 at 11:05
Sourav Ghosh makes a good point. There is no such thing as an lvalue of array type in C. — Peter, Nov 17 '16 at 11:08
@Peter Of course it is. *p is an array and not a pointer. You can check that by trying to assign it and reading the produced error, or use sizeof. It is also an lvalue. — 2501, Nov 17 '16 at 11:09
@2501 Sorry sir, I still don't get you. In some scenarios the decay happens, but not always... and even then, how is there no access _here_? — Sourav Ghosh, Nov 17 '16 at 11:13
@SouravGhosh I was responding to the quote you made. I'm surprised you don't know what an array or lvalue is. — 2501, Nov 17 '16 at 11:14
@2501 - an array is not an lvalue. This code compiles because it converts `*p` to a pointer (even ignoring the fact that evaluating `*p` gives undefined behaviour). — Peter, Nov 17 '16 at 11:16
@Peter What happens at runtime is irrelevant for determining types and lvalues. These things must be known at compile time. So your last comment doesn't matter. The expression `*p` is of type `int[2]`. It may decay under certain circumstances, but those weren't mentioned in your comment to which I responded. — 2501, Nov 17 '16 at 11:18
@Peter ok, but I believe, apart from `void`, any object type designator is an lvalue, right? — Sourav Ghosh, Nov 17 '16 at 11:19
@2501 I guess there's a misunderstanding there, I meant to put the `tick` after the ... s. I was trying to ask for the reason behind the whole statement, not the "lvalue" part. — Sourav Ghosh, Nov 17 '16 at 11:21
@2501 - what happens at run time is pertinent to whether the behaviour is undefined. And it is not possible to convert `*p` to any other type without evaluating `*p`. The fact that the type of `*p` can be determined without evaluating `*p`, and that it is possible to determine if that type can be converted to an `int *`, does not change the presence of undefined behaviour if the expression is evaluated. — Peter, Nov 17 '16 at 11:23
I haven't deleted a comment - or, if I did, it was unintentional. I did edit a comment, but only added more to it (rather than removing text from it). But an array is still not an lvalue. — Peter, Nov 17 '16 at 11:32
@Peter Array is an lvalue. It just cannot be modified. C has a term for that called *modifiable lvalue*. A modifiable lvalue is an lvalue that can be modified. — 2501, Nov 17 '16 at 11:41
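
The `sizeof` check mentioned in the comments above can be sketched as follows. The function name `pointee_size` is hypothetical. Because `sizeof` does not evaluate its operand (unless the operand is a variable length array, which `int[2]` is not), no indirection takes place, so the call is fine even with a null argument. It only shows that the expression `*p` has type `int[2]`; it says nothing about whether evaluating `*p` is defined.

```c
#include <stddef.h>

/* Hypothetical helper: reports the size of the expression *p.
 * sizeof does not evaluate its operand here (int[2] is not a VLA),
 * so p is never dereferenced, even when it is null. */
size_t pointee_size(int (*p)[2])
{
    return sizeof *p;   /* size of the array type int[2] */
}
```

Calling `pointee_size(NULL)` yields `2 * sizeof(int)`, the size of the array type.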

score 3 · Accepted Answer · answered Nov 17 '16 at 11:04

As I said in a comment, yes. Any dereference of a NULL pointer gives undefined behaviour.

What you have to realise is that undefined behaviour means the standard articulates no requirements or constraints whatsoever on what happens as a result.

This means an implementation is free to behave as you describe - or not - when the behaviour of code is undefined. It is not required to behave - or not - in such a manner.

The behaviour of a compiler is not relevant in deciding what is undefined and what is not.

answered Nov 17 '16 at 11:04

Peter

Well, thanks for the answer, but I certainly knew what UB was. I was just asking for an interpretation of this behavior using the standard. Why doesn't the fact that "an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’" make this well defined instead (for example)? Why is there no such thing as an "lvalue of array type"? – AnArrayOfFunctions Nov 17 '16 at 11:09
It is the dereferencing that causes undefined behaviour. Converting the result to another type does not change the fact that undefined behaviour has occurred. – Peter Nov 17 '16 at 11:12
@Peter: Please have a look at my answer, and note that a comment in the standard explicitly allows dereferencing a null pointer provided it is only to take its address. – Serge Ballesta Nov 17 '16 at 23:09

Sourav Ghosh · Answer 2 · 2016-11-17T11:16:11.693

NULL is a null pointer constant, and any attempt to dereference a null pointer (invalid memory) leads to UB.

So, theoretically, we cannot dereference any pointer containing NULL.

Here, p being a pointer, and p == NULL, *p is an attempt to dereference. So, it invokes undefined behavior.

FWIW, one of the major use cases of NULL is to provide a valid value that can be checked in order to avoid dereferencing a pointer holding NULL.
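
A minimal sketch of such a check applied to the function from the question. The name `f_checked` is hypothetical, and returning NULL for a null argument is an assumption about what the caller wants:

```c
#include <stddef.h>

/* Hypothetical variant of f: the indirection happens only
 * after p has been checked against NULL. */
int *f_checked(int (*p)[2])
{
    if (p == NULL)
        return NULL;    /* nothing is dereferenced on this path */
    return *p;          /* p points to an array: *p decays to &(*p)[0] */
}
```

With the check in place, the question of whether `*p` is defined for a null `p` never arises.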

Serge Ballesta · Answer 3 · score 0 · 2016-11-23T09:52:00.370

The original answer is left below, because I think it has interesting references to the standard.

First a short answer: many others think that it is clearly UB, and even though I think the intent is clear, I could not find a reference in the standard showing that the expression is allowed. So the behaviour is undefined per the standard.

But as explained below, dereferencing a pointer to an array is equivalent to casting it to a pointer to the first element of the array. The cast is perfectly defined by the standard: if the pointer points to a true array, what lies at the address of the array is its first element, and if the pointer is null, converting a null pointer of one type to a null pointer of another type is explicitly allowed. So, because the standard does not explicitly specify what should happen with the line

return *p;

just replace it with:

return (int *) p; // no UB here even if p is null!

This can be used for a pointer to an array of any type, including multidimensional arrays: the dereference can be safely replaced by a cast to a pointer to the immediately underlying sub-array.
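
A sketch of the cast-based replacement for both a one-dimensional and a two-dimensional array. The function names and the 3 by 4 dimensions are hypothetical. Like the answer, it assumes that a pointer to an array and a pointer to its first element (or first row) refer to the same address:

```c
#include <stddef.h>

/* Hypothetical names. Pointer conversion instead of indirection. */
int *first_element(int (*p)[2])
{
    return (int *)p;            /* a null pointer converts to a null pointer */
}

/* Pointer to a 3x4 array -> pointer to its first row (int[4]). */
int (*first_row(int (*p)[3][4]))[4]
{
    return (int (*)[4])p;
}
```

Neither function applies the unary `*` operator, so passing NULL only involves a pointer conversion.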

It is an interesting corner case. IMHO the standard is unclear on whether or not it is Undefined Behaviour. Here are some hints that it is, from draft n1256 for C99 or n1570 for C11, 6.5.3.2 Address and indirection operators (all emphasis is mine):

§4 The unary * operator denotes indirection... If the operand has type ‘‘pointer to type’’, the result has type ‘‘type’’. If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.

And a note about that part insists that:

Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer...

But it is not that clear, because an array is a derived type that is a non-modifiable lvalue and can only be used in two contexts:

it can be converted (decay) to a pointer of its underlying type
it can be used with the [] postfix operator to build a lvalue to one of its elements

Using *p[i] would certainly be UB, because we start by doing arithmetic on a null pointer and then dereference the result. No doubt here.
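
Because `[]` binds more tightly than unary `*`, `*p[i]` parses as `*(p[i])` and not as `(*p)[i]`. A sketch with hypothetical helper names, meant to be called only with a pointer into a valid array:

```c
/* Both helpers assume p points into a valid array of int[2] rows. */

/* *p[i] is *(p[i]): pointer arithmetic on p first, then the
 * first element of row i. */
int first_of_row(int (*p)[2], int i)
{
    return *p[i];
}

/* (*p)[i] is element i of the row that p points to. */
int element_of_first_row(int (*p)[2], int i)
{
    return (*p)[i];
}
```

For `int a[2][2] = {{1, 2}, {3, 4}}`, `first_of_row(a, 1)` reads `a[1][0]` while `element_of_first_row(a, 1)` reads `a[0][1]`.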

But in the shown code (return *p;), we are in the first context, meaning that we only convert the array to a pointer. And the same note (on same paragraph) says:

Thus, &*E is equivalent to E (even if E is a null pointer)...
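
A minimal sketch of that equivalence for a pointer to an array, with a hypothetical function name. It relies on 6.5.3.2 paragraph 3 (C99 and later), which says that when the operand of `&` is the result of a unary `*`, neither operator is evaluated and the result is as if both were omitted:

```c
#include <stddef.h>

/* Hypothetical helper: &*p is treated as if both operators were
 * omitted, so this returns p unchanged, including a null p. */
int (*same_pointer(int (*p)[2]))[2]
{
    return &*p;
}
```

This covers only the `&*p` form; it does not by itself settle what happens when `*p` decays to a pointer to the first element.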

As p is a pointer to an array, the semantics of multidimensional arrays apply to it. And paragraph 6.5.2.1 Array subscripting of the same standard is explicit about what happens for multidimensional arrays:

§3 Successive subscript operators designate an element of a multidimensional array object. If E is an n-dimensional array (n ≥ 2) with dimensions i × j × ... × k, then E (used as other than an lvalue) is converted to a pointer to an (n - 1)-dimensional array with dimensions j × ... × k. If the unary * operator is applied to this pointer explicitly, or implicitly as a result of subscripting, the result is the pointed-to (n - 1)-dimensional array

IMHO this clearly states that *p is (int *) p so the function f is required to return a null pointer when it receives a null pointer.

But the first comment cited here suggests that any * operator applied to a null pointer leads to UB. The second part of the same comment proves that this is false, but comments are not normative. So, to avoid being burned by a future version of an optimizing compiler actively chasing possible UB, I would treat this as UB and never use it in real code, even though I really think that it is allowed.
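
The concern about optimizers can be illustrated with a sketch (hypothetical function name). If `*p` on a null pointer is undefined, a compiler is allowed to assume that `p` is non-null once `*p` has been evaluated, and may then drop a later check. Whether any particular compiler actually does so for this array case is not something this example claims.

```c
#include <stddef.h>

/* Hypothetical example: the null check comes too late. */
int *late_check(int (*p)[2])
{
    int *q = *p;        /* under the UB reading, a null p is already undefined here */
    if (p == NULL)      /* a compiler may treat this branch as unreachable */
        return NULL;
    return q;
}
```

Only calls with a valid pointer have a result everyone agrees on; moving the check before the indirection removes the doubt.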

NOTE: I know that comments are not normative, but they are there to help in understanding the standard. So when one comment explicitly says that &*E is equivalent to E (even if E is a null pointer), it really means that, provided the result is only used for its address, applying the operator * to a null pointer is not necessarily UB.

edited Nov 23 '16 at 09:52

answered Nov 17 '16 at 13:32

Serge Ballesta

Thanks for digging -- this is an interesting case, in particular because 6.5.3.2/4 could not be any clearer, and the footnote could not be any clearer, and yet both are diametrically contradicting each other. One could probably read the footnote as "the * operator is not really applied if it is immediately preceded by a & operator" (and vice versa). In practice the issue can and should be fixed easily with a null pointer check in `f()` though. – Peter - Reinstate Monica Nov 17 '16 at 14:05
`p` is a **pointer to** an array, not an array, and when you pass `NULL` to `f`, evaluating `*p` is a null-pointer dereference. – Virgile Nov 17 '16 at 17:56
@Virgile: Please see my edit, and note that a comment in the standard explicitly allows dereferencing a null pointer provided it is only to take its address. – Serge Ballesta Nov 17 '16 at 23:12
IMHO, the Standard should include a rule that says that in case the Standard is even remotely ambiguous as to whether something is defined, but there is no ambiguity as to what it would mean if defined, any quality compiler should regard the behavior as defined unless or until the Standard is changed, or unless a programmer uses an option flag, directive, or similar means to indicate that the program does not rely upon the behavior in question. Such a principle *should* be common sense, but common sense hardly seems common these days. – supercat Nov 17 '16 at 23:40
@SergeBallesta but when an array `a` decays to a pointer, you're not talking about `&a` but about `&a[0]` (6.3.2.1§3: "an expression that has type array of type is converted to an expression with type pointer to type that points **to the initial element** of the array object" - emphasis mine), which is equivalent to `&(*(a+0))` (6.5.2.1§2 of C11). Substituting `*p` for `a`, we end up with `&(*((*p)+0))`. I agree that the `&` and the first `*` cancel each other, but you're still left with `*p` where p is NULL. – Virgile Nov 18 '16 at 08:07
The clause about `&*E` being valid for `E` a `NULL` pointer is because the `&` (address of) and unary `*` (pointer dereference) are inverses of each other, so `&*E` is `E`. This rule allows the compiler to simplify an expression of the form `&*E` in all cases, including if `E` is NULL, without undefined behaviour (and without a need to evaluate `*E`). This doesn't allow `*E` to be evaluated. – Peter Nov 18 '16 at 08:57
@supercat: thanks for your comment, I feel a little less alone :-) – Serge Ballesta Nov 18 '16 at 09:51
@Peter: When p is a pointer to an array, `*p` is only a pointer conversion, which is valid even for null pointers. But I understand your remark and Virgile's as *a compiler could see that as UB* – Serge Ballesta Nov 18 '16 at 09:53
A compiler doesn't see anything as UB - the standard specifies what is undefined behaviour, not any compiler. `*p` gives undefined behaviour if `p` is a NULL pointer, regardless of what the type of that pointer is. No ifs, no buts. An expression of the form `&*E` in a single statement can be transformed by the compiler into `E` (so `*E` need not be evaluated as an intermediate step). The problem comes if the compiler is required to first evaluate `*E` (which causes UB) then compute the address of the result. – Peter Nov 18 '16 at 10:04
`p` being a pointer to an array does not change that. Evaluating `*p` gives undefined behaviour. – Peter Nov 18 '16 at 10:06
@Peter: even if something is undefined per standard, an implementation can choose to support it. GCC is known to support many extensions. And when the standard is not very explicit on one point, each implementation has to decide what it does. C has no reference implementation so the standard is sometimes *vague*... – Serge Ballesta Nov 18 '16 at 11:37
@SergeBallesta: Questions should perhaps be split into subparts: is there any reason a sane "normal" compiler might treat something as UB, would there be any justification for an obtuse compiler to treat something as UB, and is there enough of a danger that a compiler might treat something as UB that programmers should avoid it and sanitizing compilers should detect it. This answer would seem to answer the first question. – supercat Nov 18 '16 at 14:54
@SergeBallesta: It might be worth noting that the authors of the C89 Standard saw no reason to avoid situations where, for some behavior, the answers to the first two questions would be "no" and "yes", since they didn't expect that compilers would go out of their way to exploit UB. Later versions of the Standard have then built upon the notion that certain things "were" UB even if programmers and compiler writers alike had thought that they were only "technically" UB because of sloppiness in the Standard. – supercat Nov 18 '16 at 15:01