Which C code is more CPU-expensive:

while(*pointer){ 
    pointer++; 
} 

or

while(counter > 0){ 
    pointer++; 
    counter--; 
}

?

legale
  • Only one way to find out. Test it with your compiler on your cpu. – klutt Jan 12 '19 at 00:21
  • Looks like both snippets have no side effects, and thus can be optimized into a no-operation. Anyway, generally speaking, follow [rules of optimization](http://wiki.c2.com/?RulesOfOptimization). – KamilCuk Jan 12 '19 at 00:25
  • Do you mind adding some context (a [mcve] would be nice) to those unrelated snippets? Is there any relation between whatever is pointed to by `pointer` and `counter`? – Bob__ Jan 12 '19 at 00:29
  • Looks like they both do different things? – Neil Jan 12 '19 at 00:31
  • I don't know how to do it. The execution time difference is so small that other factors dominate. – legale Jan 12 '19 at 00:31
  • `char const unsigned *p = string.val; /* string cursor pointer */ for (size_t n = string.len; n > 0; --n) { /* cycle each byte with callback function */ (*filter->filter_function)(*p++, filter); }` – legale Jan 12 '19 at 00:36
  • `char const unsigned *p = string.val; /* string cursor pointer */ while (*p) { /* cycle each byte with callback function */ (*filter->filter_function)(*p++, filter); }` – legale Jan 12 '19 at 00:36
  • This is the context. – legale Jan 12 '19 at 00:36
  • @legale If other factors are stronger, then why bother? I mean, if you cannot even measure the difference it cannot be a problem? – klutt Jan 12 '19 at 00:36
  • There may be no practical use, but I need to know. – legale Jan 12 '19 at 00:37
  • Why do you need to know? – klutt Jan 12 '19 at 00:38
  • My theory is that dereferencing is more expensive, but this is only a theory. – legale Jan 12 '19 at 00:39
  • Well, it's impossible to answer. The compiler may optimize it in whatever way it wants. – klutt Jan 12 '19 at 00:40
  • I would guess that without optimization flags you are right. For most cases. For most compilers. For most architectures. Etc. – klutt Jan 12 '19 at 00:44
  • @Broman: Testing is not the only way to find out. One can learn how compilers behave, read documentation about processor performance, ask experts, and so on. These methods are ultimately more useful as they provide information about theory of operation that may be generalized and applied to new circumstances. – Eric Postpischil Jan 12 '19 at 00:49
  • Testing is also a bad idea because you might learn something that is a quirk of your platform, compiler, optimization settings, or even the specific way you encountered the problem. Learning general principles of how to think about writing code in the first instance is valuable. – David Schwartz Jan 12 '19 at 00:51
  • I realize I was a bit unclear. I did not mean that OP should test it and then draw general conclusions from it. – klutt Jan 12 '19 at 00:57
  • Is there a good reason for not replacing the last loop with `pointer += counter;` (optionally with `counter = 0;` after that if you really want `counter` zeroed by the end of the loop)? – Jonathan Leffler Jan 12 '19 at 01:14
  • Use your compiler to output an assembly language file for each condition. Then use the documentation that describes each of the CPU instructions (and their cycle counts) to estimate which is more efficient. However, since most modern CPUs are pipelined and perform a lot of operations in parallel, you still will not have a totally accurate measurement. – user3629249 Jan 12 '19 at 02:14
  • It doesn't matter which is more expensive, since the two code snippets have completely different net effects. The first increments `pointer` until `*pointer` is zero. The second increments `pointer` and decrements `counter` until `counter` is zero. Which means the first doesn't affect `counter` in any way, and the second never dereferences `pointer` (i.e. never examines data pointed to by `pointer`). This question is like asking "Is an apple better than a pear?" - for which the answer is "it depends". – Peter Jan 12 '19 at 02:46
  • @Peter: I think the question is supposed to be: which is more efficient: looping over implicit length strings / arrays (searching for a terminator) or explicit length (known count). The question neglected to say that other code in the loop would read the array, though! – Peter Cordes Jan 12 '19 at 07:19
  • GCC and clang can't auto-vectorize loops when the trip-count isn't known before entry into the loop, so they can never auto-vectorize loops over implicit-length data, e.g. a `strlen` function. ICC can, BTW. So depending on your use-case and target architecture, implicit length can be vastly more expensive. It's also easier for compilers to unroll with explicit-length data, only checking for loop termination every 4 source iterations, for example. – Peter Cordes Jan 12 '19 at 07:20
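
For reference, the two snippets from the question can be wrapped as functions to make the difference in their effects visible, as several comments point out. This is only a minimal sketch: the function names `scan_to_nul` and `skip_by_count` are hypothetical and do not appear in the question, and the second function assumes that at least `counter` elements exist past `pointer`.

```c
#include <stddef.h>

/* First snippet (hypothetical wrapper): advance until a zero byte is found.
   It reads memory on every iteration and requires a terminator to exist. */
const char *scan_to_nul(const char *pointer)
{
    while (*pointer) {
        pointer++;
    }
    return pointer;
}

/* Second snippet (hypothetical wrapper): advance a fixed number of steps.
   It never dereferences the pointer, so the result equals pointer + counter. */
const char *skip_by_count(const char *pointer, size_t counter)
{
    while (counter > 0) {
        pointer++;
        counter--;
    }
    return pointer;
}
```

With optimization enabled, a compiler may well reduce `skip_by_count` to a single addition, which is the replacement suggested in the comments. The first loop cannot be reduced that way because its trip count depends on the data.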

2 Answers

`*pointer` nominally requires a fetch from memory, and that is generally the most expensive of the operations shown in your code.

If we assume your code is compiled directly to the obvious assembly corresponding to the operations as they are described in C’s abstract machine, with no optimization, modern CPUs for desktop computers are typically capable of executing one loop iteration per cycle, except for the memory access. That is, they can increment a pointer or counter, test its value, and branch, with a throughput of one set of those per cycle.

When these operations are used in real programs, they will usually be dwarfed by the other operations being performed. Compilers are generally so good at optimization that the method used to express the loop iteration and termination has little effect on the performance—optimization will likely produce equivalent code regardless of variations in expression for differences like incrementing a counter versus iterating a pointer to some end value. (This excludes using a pointer to fetch a value from memory for testing. That does raise complications.)
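
As an illustration of the "counter versus pointer to some end value" variation mentioned above, here is a minimal sketch of the same reduction written both ways. The function names `sum_by_index` and `sum_by_end_pointer` are hypothetical, and the claim that they compile to similar machine code is an expectation for typical optimizing compilers, not a guarantee.

```c
#include <stddef.h>

/* Index-based iteration with an explicit counter. */
unsigned long sum_by_index(const unsigned char *data, size_t len)
{
    unsigned long total = 0;
    for (size_t i = 0; i < len; i++) {
        total += data[i];
    }
    return total;
}

/* Pointer-based iteration up to an end pointer computed before the loop. */
unsigned long sum_by_end_pointer(const unsigned char *data, size_t len)
{
    unsigned long total = 0;
    const unsigned char *end = data + len;
    for (const unsigned char *p = data; p != end; p++) {
        total += *p;
    }
    return total;
}
```

In both versions the trip count is known before the loop starts, so the exit test does not depend on the bytes being read. Comparing the assembly your own compiler emits for the two (for example with `-O2 -S`) is a reasonable way to check how close they are on your target.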

Eric Postpischil
  • Implicit-length data like a C string defeats gcc and clang's auto-vectorizer, and also mostly defeats loop unrolling. With optimization enabled, there's a big difference between these (if we assume there's something inside the loop that also accesses `*pointer`). It also makes the loop-exit branch harder for the CPU to execute ahead of time (out of order execution) to resolve a possible mispredict while still crunching the data. With a counter, especially in an unrolled loop, it can run ahead of the data processing and hide most of the cost of the branch miss on the last iteration. – Peter Cordes Jan 12 '19 at 07:27
  • There's an interesting question here, if it was asked properly. :/ – Peter Cordes Jan 12 '19 at 07:29

If you already happen to know the size, I'd expect it to be faster to iterate for some known number of times rather than having to test a pointer each iteration to know whether or not to loop again.
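
A minimal sketch of the two loop shapes, loosely modeled on the callback loop the asker posted in the comments. The names `byte_callback`, `for_each_counted`, `for_each_terminated`, and `count_bytes` are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical per-byte callback type, standing in for the
   filter_function mentioned in the question's comments. */
typedef void (*byte_callback)(unsigned char byte, void *state);

/* Known length: the exit test only looks at the counter,
   so it does not depend on the data being processed. */
void for_each_counted(const unsigned char *p, size_t n,
                      byte_callback fn, void *state)
{
    for (; n > 0; --n) {
        fn(*p++, state);
    }
}

/* Unknown length: each iteration must load *p before it can
   decide whether to continue. */
void for_each_terminated(const unsigned char *p,
                         byte_callback fn, void *state)
{
    while (*p) {
        fn(*p++, state);
    }
}

/* Example callback: counts the bytes it receives. */
void count_bytes(unsigned char byte, void *state)
{
    (void)byte;
    ++*(size_t *)state;
}
```

Note that the two loops are interchangeable only when the data contains no zero byte before its end: the counted version processes embedded zero bytes, while the terminated version stops at the first one.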

David Schwartz