-4

I want to write a function that when called doubles its argument (if it is non-zero) or returns a specific constant (if it is zero). The constant is always a power of 2 if that helps.

Let's say the constant is 8. When called with 0, I want it to return 8. When called with 8, I want it to return 16. And so on.

The trivial approach would be something like:

unsigned foo(unsigned value)
{
    return (value ? value * 2 : 8);
}

Is it possible to do this without branching?

martinkunev
  • What do you mean by branching? –  Oct 09 '15 at 07:48
  • The ones who downvoted, please explain why. – martinkunev Oct 09 '15 at 08:04
  • I didn't down-vote, but typically this happens when the originator of the question has not shown any evidence of prior effort, e.g. research or an attempted solution. – Paul R Oct 09 '15 at 08:05
  • @thebigbo - I mean CPU branching (instructions that may cause jumps). – martinkunev Oct 09 '15 at 08:05
  • I didn't downvote either, but it took me a while, even after reading the answers, to understand your question. You should at least include what you have tried. – phuclv Oct 09 '15 at 08:06
  • As you can see from the answers, there are a couple of ways to do this, but they are all kind of horrible: just about unreadable, very hard to understand without context or comments, and unmaintainable. Don't use hacks like that in any kind of production code. – Some programmer dude Oct 09 '15 at 08:09
  • @JoachimPileborg There is always a tradeoff between performance and something else (e.g. readability or maintainability), and I think that is where comments are needed. My opinion is that you should never be afraid to use efficient code; just remember to write comments for readability. – mkvoya Oct 09 '15 at 12:36

3 Answers

4

This causes no additional memory access.

int f(int a)
{
    const int c = 8;
    return (a*2)+(a==0)*c;
}
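
For the unsigned signature used in the question, the same idea could be sketched as follows. The names `double_or_default` and `DEFAULT_VALUE` are made up for illustration. The comparison `value == 0` evaluates to 0 or 1 in C, so multiplying by it selects the constant without an `if`; whether the generated machine code is actually branch-free still depends on the compiler and target.

```c
#define DEFAULT_VALUE 8u  /* hypothetical name; the constant from the question */

/* Doubles a non-zero value; returns DEFAULT_VALUE for zero. */
unsigned double_or_default(unsigned value)
{
    /* (value == 0) is 1 only for zero, where value * 2 is 0 anyway,
       so exactly one of the two terms contributes to the sum. */
    return value * 2 + (value == 0) * DEFAULT_VALUE;
}
```

With unsigned arithmetic the doubling wraps modulo 2^N instead of overflowing, so the expression is well defined for every input.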
mkvoya
  • True, although - to be honest, the calculation will still cause data dependency that delays the result. The benefit of no branching here is that you can get multiple sequential calls to perform in parallel without fear of flushing the pipe (note that the function call itself is also a form of branch, though easy to predict) – Leeor Oct 09 '15 at 11:44
  • @Leeor Would you please point out where that data dependency is and how it can delay the result? And why would there be a pipeline flush? (As far as I can see, the branch caused by the function call itself cannot be predicted before the call instruction is decoded, and that is not a prediction, since it never causes a misprediction.) – mkvoya Oct 09 '15 at 12:48
  • Obviously the return value depends on a, so the delay caused by the branch resolution is still there; only the chance of flushing the pipe on misprediction is gone, allowing the execution to run ahead with further operations. The call itself will likely be simple (if the function isn't called via call tables somehow), but the return requires prediction (usually done by a return stack buffer). – Leeor Oct 09 '15 at 13:38
  • @Leeor I'm sorry, but if a function's return value does not depend on the parameter... are you sure that you need that parameter? What's more, the delay you mentioned is caused by the function call mechanism itself. If you really care about that, use an inline function or a macro instead. And `call` and `ret` have nothing to do with **prediction**, since the CPU won't try to predict where the next instruction is (that's too hard). `call` and `ret` might cause pipeline stalls, and that's indeed a flaw, but there won't be any pipeline flush. – mkvoya Oct 09 '15 at 13:53
  • Leave aside the function call; let's say it's inlined. I'm saying that avoiding the branch doesn't save the calculation latency, only the misprediction penalty of flushing the pipe. If the branch is unpredictable, then it's a good practice. This is a good answer. – Leeor Oct 09 '15 at 14:01
  • @Leeor Yes, and actually there might be more calculation than using branching. It occurs to me that there are conditional move instructions `cmovxx` which can do the store only if the condition is met. They are single instructions so the cpu will stall the pipeline until the condition flag is available and thus no prediction will be performed. The corresponding C code should be `a==0?c:(a*2)` but I think it also depends on the compiler's behaviour. – mkvoya Oct 09 '15 at 14:13
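
The ternary form mentioned in the last comment can be written out as below. This is only a sketch: `select_double` is a hypothetical name, and whether a compiler turns the conditional expression into a conditional move rather than a jump is an assumption that varies with compiler, optimization level and target. Inspecting the generated assembly (for example with `gcc -O2 -S`) is the only way to know.

```c
/* select_double is a hypothetical name used for illustration. */
int select_double(int a)
{
    const int c = 8;  /* constant returned when a == 0 */

    /* Both operands are cheap and free of side effects, which is the
       situation where compilers may choose a conditional move. */
    return a == 0 ? c : a * 2;
}
```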
3
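/* Lookup table indexed by !!x: entry 0 holds the constant returned for x == 0,
   entry 1 adds nothing for non-zero x. */
static const int myconst[2] = { 8, 0 };
int f(int x)
{
    /* !!x normalizes x to 0 or 1; x + x doubles it (and is 0 when x == 0). */
    return x + x + myconst[!!x];
}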
Matt
3

Using mainly bitwise operators:

int foo(int n)
{
     const int d = 8;            // default return value for n == 0
     int mask = (n == 0) - 1;    // generate mask = 0 if n == 0, otherwise all 1s
     // note: n << 1 is formally undefined in ISO C (C99/C11) when n is negative
     return ((n << 1) & mask) | (d & ~mask);
}

Let's test it:

#include <stdio.h>

static int foo(int n)
{
     const int d = 8;            // default return value for n == 0
     int mask = (n == 0) - 1;    // generate mask = 0 if n == 0, otherwise all 1s
     return ((n << 1) & mask) | (d & ~mask);
}

int main(void)
{
    const int tests[] = { 8, 1, 0, -1, -8 };

    // size_t index avoids a signed/unsigned comparison with sizeof
    for (size_t i = 0; i < sizeof(tests) / sizeof(tests[0]); ++i)
    {
        printf("%4d -> %4d\n", tests[i], foo(tests[i]));
    }
    return 0;
}

Compile and run:

$ gcc -Wall double_fun.c && ./a.out
   8 ->   16
   1 ->    2
   0 ->    8
  -1 ->   -2
  -8 ->  -16
Paul R
  • @LưuVĩnhPhúc: sure - that works too, but note that it inverts the logic, so the following line would need to be changed. – Paul R Oct 09 '15 at 08:12
  • It would also seem that a simple `(n<<1) + !n * d` would suffice? – Lundin Oct 09 '15 at 08:39
  • @Lundin: yes, although I was deliberately avoiding multiplication, in case this is something like a low-end embedded system where multiplies might be expensive. – Paul R Oct 09 '15 at 09:20
  • @PaulR But in such systems, you wouldn't care much about branch prediction, but write the code in plain, non-obfuscated C. It is also quite possible that the compiler is smart enough to realize that `!n * d` can only result in two different values: 0 or `d` and then replace the multiplication with load/store instructions. – Lundin Oct 09 '15 at 09:25
  • @Lundin: true - I think the whole premise of the question is probably bogus - I was just going for what I felt was an efficient lightweight solution using instructions which are typically single cycle on most CPUs. – Paul R Oct 09 '15 at 09:28
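
Since the question states that the constant is always a power of two, the multiplication in `!n * d` can itself be replaced by a shift. A minimal sketch, assuming unsigned arguments as in the question; `DEFAULT_SHIFT` and `double_or_default_shift` are hypothetical names, not taken from any of the answers:

```c
#define DEFAULT_SHIFT 3u  /* log2 of the constant: 1u << 3 == 8 */

unsigned double_or_default_shift(unsigned n)
{
    /* !n is 1 only when n == 0, so shifting it left yields either 0 or 8.
       n << 1 doubles n; for unsigned operands the result wraps modulo 2^N. */
    return (n << 1) | ((unsigned)!n << DEFAULT_SHIFT);
}
```

The two terms can be combined with `|` instead of `+` because one of them is always zero: the shifted flag is zero for non-zero `n`, and `n << 1` is zero for `n == 0`.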