How to index arrays using pointers safely

Question

Edit: If you fundamentally disagree with the Fedora guide here, please explain why this approach would be worse in an objective way than classic loops. As far as I know even the CERT standard doesn't make any statement on using index variables over pointers.

I'm currently reading the Fedora Defensive Coding Guide and it suggests the following:

Always keep track of the size of the array you are working with. Often, code is more obviously correct when you keep a pointer past the last element of the array, and calculate the number of remaining elements by substracting the current position from that pointer. The alternative, updating a separate variable every time when the position is advanced, is usually less obviously correct.

This means for a given array

int numbers[] = {1, 2, 3, 4, 5};

I should not use the classic

size_t length = 5;
for (size_t i = 0; i < length; ++i) {
    printf("%d ", numbers[i]);
}

but instead this:

int *end = numbers + 5;
for (int *start = numbers; start < end; ++start) {
    printf("%d ", *start);
}

or this:

int *start = numbers;
int *end = numbers + 5;
while (start < end) {
    printf("%d ", *start++);
}

Is my understanding the recommendation correct?
Is my implementation correct?
Which of the last 2 is safer?

All examples are flawed because they hardcode the element size of the array. Apart from that, it is opinion based which one you use. — 2501, Nov 10 '16 at 13:05
I think recommendation is to use values similar to c++ iterators `array.end()` like `numbers_end` rather than `numbers+5`... — Alexei Levenkov, Nov 10 '16 at 13:07
As Alexei points out, we don't see any **past** pointer in your examples. Those examples do not add any security. — Djee, Nov 10 '16 at 13:11
@AdHominem This statement "Often, code is more obviously correct when you keep a pointer past the last element of the array, and calculate the number of remaining elements by substracting the current position from that pointer." is just wrong. Do not trust everything that is written in books.:) — Vlad from Moscow, Nov 10 '16 at 13:13
@VladfromMoscow This is just from the official docs, I also looked at CERT but they didn't specify this... why is the approach using pointers worse? — AdHominem, Nov 10 '16 at 13:17
@Quentin The quoted text argues: *it's obvious*. This is not an argument. — 2501, Nov 10 '16 at 13:17
@VladfromMoscow By leaving such questions open and upvoted, we're implicitly giving such quotes relevance. — 2501, Nov 10 '16 at 13:19
@AdHominem I have not said that it is worse. I doubt that it is "more obviously correct".:) I do not see anything that does that "more obviously correct".:) For example as the quote says in any case you need to get the number of elements in the array using substraction of pointers. Is it obviously correct":) It is error prone.:) — Vlad from Moscow, Nov 10 '16 at 13:20
@VladfromMoscow I don't understand what's wrong about it. `end - current` does seem more robust and easier to check for correctness than a separate `remaining` variable that you have to update manually. — Quentin, Nov 10 '16 at 13:21
whether or not it's more correct to do one thing rather than the other is always debatable. If a project uses a single approach throughout, people who work on the code are more likely to be used to spotting mistakes with out-of-bounds access. I think the quote you provided is basically saying that fedora uses this approach, and recommends you do the same, because it's working for them — Elias Van Ootegem, Nov 10 '16 at 13:23
The quoted text basically says: my version is correct, period. Leaving nothing to debate about. — 2501, Nov 10 '16 at 13:24
@Quentin When you need to use one more operation to get a value then there is more chances to make an error.:) — Vlad from Moscow, Nov 10 '16 at 13:24
@Quentin There is nothing obvious.:) For example using pointers you can dereference the last point that points to beyond the array and get undefined behavior.:) — Vlad from Moscow, Nov 10 '16 at 13:25
@AdHominem In your provided link there is a problem with the `return` in the first code: `return outp - out;` one is `int` another one is `long int`. — Michi, Nov 10 '16 at 13:27
@2501 I think you have misread that "obviously" -- The quote says that using this method makes it obvious whether the code is correct or not, not that it is obviously the "best" method. — Quentin, Nov 10 '16 at 13:30
I don't understand why you would choose to use any of those methods. — RoadRunner, Nov 10 '16 at 13:31
@VladfromMoscow that's less operations: you'd need to manually update that `remaining` variable everywhere. With `end - cur` where `end` doesn't change and `cur` is the very variable I use to iterate, there simply can't be an oversight. — Quentin, Nov 10 '16 at 13:31
@Quentin You misread my comments. I never said that the quoted text says my version is obviously correct. But it does say my version is correct, and used the word obvious to argue for it. — 2501, Nov 10 '16 at 13:33
Is anyone reading this debate still convinced that this questions is not **primarily-opinion based**? Let me know. Otherwise please hit that close button. — 2501, Nov 10 '16 at 13:34
@2501 Strange, by not including `#include ` [I get this](http://pastebin.com/raw/q0e9Ujjb)! — Michi, Nov 10 '16 at 13:38
@Michi That's because you're misreading the output of the compiler. — 2501, Nov 10 '16 at 13:41
@Quentin I do not understand what "remaining variable" you are saying. The quote you showed does not make any sense. Maybe there is some context and relative to the context the quote has some sense. — Vlad from Moscow, Nov 10 '16 at 13:41
@2501 I had a lot and I just saw that conversion thing there. I know that I supposed to fix warnings/errors it from the beginning and not from the middle :)). — Michi, Nov 10 '16 at 13:42
@2501 Maybe I did then. But that "correct code looks obviously correct" seems like a valid argument to me. In any case, I personally think that "by which rationale is this guideline backed up" can very well be answered objectively, whether it's right or wrong. — Quentin, Nov 10 '16 at 13:43
`Always keep track of the size of the array you are working with.` What's wrong with the classical way : `size_t size = sizeof numbers / sizeof numbers[0];` ? — Michi, Nov 10 '16 at 13:46
@Quentin Consider this declaration. int *end = numbers + 5; First of all it uses magic number 5. Secondly it is enough to use the number of elements in the array instead of building a new pointer. In C++ there are used iterators but the idea is not that using iterators is "obviously more correct".:) — Vlad from Moscow, Nov 10 '16 at 13:47
@VladfromMoscow yep, not arguing for these magic numbers. That should depend on `sizeof numbers`. But that's not part of the quoted guideline, so I assume the pointers *are* correctly initialized :) — Quentin, Nov 10 '16 at 13:55
@Quentin So apart from knowing the actual size of the array you need to initialize the pointers correctly do you ?:) Is it "obviously more correct"?:) — Vlad from Moscow, Nov 10 '16 at 13:58
I would use C++ idioms for create obj with destructor / constructor, C idioms with malloc and micro instruction if not assembly for the remain low level — RosLuP, Nov 10 '16 at 14:19

score 2 · Accepted Answer · answered Nov 10 '16 at 17:55

Your understanding of what the text recommends is correct, as is your implementation. But regarding the basis of the recommendation, I think you are confusing safe with correct.

It's not that using a pointer is safer than using an index. The argument is that, in reasoning about the code, it is easier to decide that the logic is correct when using pointers. Safety is about failure modes: what happens if the code is incorrect (references a location outside the array). Correctness is more fundamental: that the algorithm provably does what it sets out to do. We might say that correct code doesn't need safety.

The recommendation might have been influenced by Andrew Koenig's series in Dr. Dobbs a couple of years ago. How C Makes It Hard To Check Array Bounds. Koenig says,

In addition to being faster in many cases, pointers have another big advantage over arrays: A pointer to an array element is a single value that is enough to identify that element uniquely. [...] Without pointers, we need three parameters to identify the range: the array and two indices. By using pointers, we can get by with only two parameters.

In C, referencing a location outside the array, whether via pointer or index, is equally unsafe. The compiler will not catch you out (absent use of extensions to the standard). Koenig is arguing that with fewer balls in the air, you have a better shot at getting the logic right.

The more complicated the construction, the more obvious it is that he's right. If you want a better illustration of the difference, write strcat(3) both ways. Using indexes, you have two names and two indexes inside the loop. It's possible to use the index for one with the name for the other. Using pointers, that's impossible. All you have are two pointers.

Lundin · Answer 2 · 2016-11-10T15:44:55.987

Is my understanding the recommendation correct?
Is my implementation correct?

Yes, so it seems.

The method for(type_t start = &array; start != end; start++) is sometimes used when you have arrays of more complex items. It is mostly a matter of style.

This style is sometimes used when you already have the start and end pointers available for some reason. Or in cases where you aren't really interested in the size, but just want to repeatedly compare against the end of the array. For example, suppose you have a ring buffer ADT with a start pointer and an end pointer and want to iterate through all items.

This way of doing loops is actually the very reason why C explicitly allows pointers to point 1 item out-of-bounds of an array, you can set an end pointer to one item past the array without invoking undefined behavior (as long as that item isn't de-referenced).

(It is the very same method as used by STL iterators in C++, although there's more of a rationale in C++, since it has operator overload. For example iterator++ in C++ doesn't necessarily give an item adjacently allocated in the next memory cell. For example, iterators could be used for iterating through a linked list ADT, where the ++ would translate to node->next behind the lines.)

However, to claim that this form is always the preferred one is just subjective nonsense. Particularly when you have an array of integers and know the size. Your first example is the most readable form of a loop in C and therefore always preferred whenever possible.

On some compilers/systems, the first form could also give faster code than the second form. Pointer arithmetic might give slower code on some systems. (And I suppose that the first form might give faster data cache access on some systems, though I'd have to verify that assumption with some compiler guru.)

Which of the last 2 is safer?

Neither form is safer than the other. To claim otherwise would be subjective opinions. The statement "...is usually less obviously correct" is nonsense.

Which style to pick vary on case-to-case basis.

Overall, those "Fedora" guidelines you link seem to contain lots of questionable code, questionable rules and blatant opinions. Seems more like someone wanted to show off various C tricks than a serious attempt to write a coding standard. Overall, it smells like the "Linux kernel guidelines", which I would not recommended to read either.

If you want a serious coding standard for/by professionals, use CERT-C or MISRA-C.

Thanks for the info. I was referring to the Fedora guide exactly because the CERT didn't mention array indexing, so i guess safety wise both ways are fine — AdHominem, Nov 10 '16 at 18:50
Is there any free download of the MISRA C ? It seems to be paperback only and like 8 years old. Is it still relevant? — AdHominem, Nov 10 '16 at 18:58
@AdHominem The C language is over 40 years old :) Anyway, the latest version of MISRA-C is from 2012 with 2 addendums that were added later. It is an actively updated standard and very much relevant, although it does not support C11 yet (neither do most compilers). You can purchase it as pdf format. — Lundin, Nov 11 '16 at 07:18

How to index arrays using pointers safely

2 Answers2