What code is more CPU expensive: while(*p) or while(i--)?

Question

What C code is more CPU expensive:

while(*pointer){ 
    pointer++; 
}

or

while(counter > 0){ 
    pointer++; 
    counter--; 
}

?

Only one way to find out. Test it with your compiler on your cpu. — klutt, Jan 12 '19 at 00:21
Looks like both snippets have no side effects, so they can be optimized into a no-op. Anyway, generally speaking, follow the [rules of optimization](http://wiki.c2.com/?RulesOfOptimization). — KamilCuk, Jan 12 '19 at 00:25
Do you mind adding some context (a [mcve] would be nice) to those unrelated snippets? Is there any relation between whatever is pointed by `pointer` and `count`? — Bob__, Jan 12 '19 at 00:29
I don't know how to do it. The execution time difference is so small that other factors are stronger. — legale, Jan 12 '19 at 00:31
`char const unsigned *p = string.val; /* string cursor pointer */ for (size_t n = string.len; n > 0; --n) { /* cycle each byte with callback function */ (*filter->filter_function)(*p++, filter); }` — legale, Jan 12 '19 at 00:36
`char const unsigned *p = string.val; /* string cursor pointer */ while (*p) { /* cycle each byte with callback function */ (*filter->filter_function)(*p++, filter); }` — legale, Jan 12 '19 at 00:36
@legale If other factors are stronger, then why bother? I mean, if you cannot even measure the difference it cannot be a problem? — klutt, Jan 12 '19 at 00:36
My theory is that dereferencing is more expensive, but this is only a theory. — legale, Jan 12 '19 at 00:39
Well, it's impossible to answer. The compiler may optimize it in whatever way it wants. — klutt, Jan 12 '19 at 00:40
I would guess that without optimization flags you are right. For most cases. For most compilers. For most architectures. Etc. — klutt, Jan 12 '19 at 00:44
@Broman: Testing is not the only way to find out. One can learn how compilers behave, read documentation about processor performance, ask experts, and so on. These methods are ultimately more useful as they provide information about theory of operation that may be generalized and applied to new circumstances. — Eric Postpischil, Jan 12 '19 at 00:49
Testing is also a bad idea because you might learn something that is a quirk of your platform, compiler, optimization settings, or even the specific way you encountered the problem. Learning general principles of how to think about writing code in the first instance is valuable. — David Schwartz, Jan 12 '19 at 00:51
I realize I was a bit unclear. I did not mean that OP should test it and then draw general conclusions from it. — klutt, Jan 12 '19 at 00:57
Is there a good reason for not replacing the last loop with `pointer += counter;` (optionally with `counter = 0;` after that if you really want `counter` zeroed by the end of the loop)? — Jonathan Leffler, Jan 12 '19 at 01:14
Use your compiler to output an assembly-language file for each version. Then consult the documentation that lists each CPU instruction (and its number of CPU cycles) and you can determine which is more efficient. However, since most modern CPUs are pipelined and perform a lot of operations in parallel, you still will not have a totally accurate measurement. — user3629249, Jan 12 '19 at 02:14
It doesn't matter which is more expensive, since the two code snippets have completely different net effects. The first increments `pointer` until `*pointer` is zero. The second increments `pointer` and decrements `counter` until `counter` is zero. This means the first doesn't affect `counter` in any way, and the second never dereferences `pointer` (i.e. never examines the data pointed to by `pointer`). This question is like asking "Is an apple better than a pear?", for which the answer is "it depends". — Peter, Jan 12 '19 at 02:46
@Peter: I think the question is supposed to be: which is more efficient: looping over implicit length strings / arrays (searching for a terminator) or explicit length (known count). The question neglected to say that other code in the loop would read the array, though! — Peter Cordes, Jan 12 '19 at 07:19
GCC and clang can't auto-vectorize loops when the trip count isn't known before entry into the loop, so they can never auto-vectorize loops over implicit-length data, e.g. a `strlen` function. ICC can, BTW. So depending on your use case and target architecture, implicit length can be vastly more expensive. It's also easier for compilers to unroll with explicit-length data, only checking for loop termination every 4 source iterations, for example. — Peter Cordes, Jan 12 '19 at 07:20

Eric Postpischil · Accepted Answer · 2022-09-25T21:44:29.743

`*pointer` nominally requires a fetch from memory, and that is generally the most expensive of the operations shown in your code.

If we assume your code is compiled directly to the obvious assembly corresponding to the operations as they are described in C’s abstract machine, with no optimization, modern CPUs for desktop computers are typically capable of executing one loop iteration per cycle, except for the memory access. That is, they can increment a pointer or counter, test its value, and branch, with a throughput of one set of those per cycle.

When these operations are used in real programs, they will usually be dwarfed by the other operations being performed. Compilers are generally so good at optimization that the method used to express the loop iteration and termination has little effect on the performance—optimization will likely produce equivalent code regardless of variations in expression for differences like incrementing a counter versus iterating a pointer to some end value. (This excludes using a pointer to fetch a value from memory for testing. That does raise complications.)
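As a rough illustration of that last point (a sketch, not code from the question), the two functions below express the same iteration once with a counter and once with an end pointer. The names `sum_counted` and `sum_ptr_end` are hypothetical, and summing the bytes is just a stand-in for real work in the loop body. An optimizing compiler will likely emit very similar machine code for both, but that is an expectation to check in the generated assembly, not a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: iterate with an explicit counter. */
static unsigned sum_counted(const unsigned char *p, size_t n)
{
    assert(p != NULL || n == 0);
    unsigned sum = 0;
    while (n > 0) {
        sum += *p++;   /* loop body reads the data */
        n--;           /* exit test depends only on the counter */
    }
    return sum;
}

/* Hypothetical helper: iterate until the pointer reaches an end pointer. */
static unsigned sum_ptr_end(const unsigned char *p, size_t n)
{
    assert(p != NULL || n == 0);
    unsigned sum = 0;
    if (n == 0) {
        return sum;    /* avoid pointer arithmetic on a possibly null p */
    }
    const unsigned char *end = p + n;
    while (p != end) { /* exit test depends only on the pointer value */
        sum += *p++;
    }
    return sum;
}
```

In both versions the exit condition is independent of the bytes being read, which is what lets the compiler treat the two forms as interchangeable.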

Implicit-length data like a C string defeats gcc and clang's auto-vectorizer, and also mostly defeats loop unrolling. With optimization enabled, there's a big difference between these (if we assume there's something inside the loop that also accesses `*pointer`). It also makes the loop-exit branch harder for the CPU to execute ahead of time (out-of-order execution) to resolve a possible mispredict while still crunching the data. With a counter, especially in an unrolled loop, it can run ahead of the data processing and hide most of the cost of the branch miss on the last iteration. — Peter Cordes, Jan 12 '19 at 07:27
There's an interesting question here, if it was asked properly. :/ — Peter Cordes, Jan 12 '19 at 07:29
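
To make the implicit-length versus explicit-length distinction in these comments concrete, here is a minimal sketch with hypothetical function names (`count_spaces_implicit`, `count_spaces_explicit`). Counting spaces is an arbitrary stand-in for a loop body that reads the data. In the first function the loop cannot know it is finished until it has loaded each byte; in the second the trip count is fixed before the loop starts, which is the form the comments describe as friendlier to auto-vectorization and unrolling.

```c
#include <assert.h>
#include <stddef.h>

/* Implicit length: the exit test depends on the byte just loaded,
 * so the trip count is unknown on loop entry. */
static size_t count_spaces_implicit(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    while (*s) {
        count += (*s == ' ');
        s++;
    }
    return count;
}

/* Explicit length: the trip count is known before the loop starts,
 * so the exit test does not depend on the data. */
static size_t count_spaces_explicit(const char *s, size_t len)
{
    assert(s != NULL || len == 0);
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        count += (s[i] == ' ');
    }
    return count;
}
```

The two functions agree on ordinary strings, but not on data with an embedded zero byte: the implicit-length version stops at the first terminator, while the explicit-length version processes exactly `len` bytes. Whether a given compiler actually vectorizes the second form depends on the compiler version, flags, and target, so inspect the assembly before relying on it.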

score 1 · Answer 2 · answered Jan 12 '19 at 00:35

If you already happen to know the size, I'd expect it to be faster to iterate for some known number of times rather than having to test a pointer each iteration to know whether or not to loop again.

David Schwartz
