Is it faster in C to use a written jump table or switch statement?

Question

So, I am trying to see whether there is any difference between using a jump table of function pointers and a switch statement for performing many single-command operations like these.

This is the code-to-assembly link I made

Here is my actual code as well

enum code {
    ADD,
    SUB,
    MUL,
    DIV,
    REM
};

typedef struct {
    int val;
} Value;


typedef struct {
    enum code ins;
    int operand;
} Op;


void run(Value* arg, Op* func)
{
   switch(func->ins)
   {
     case ADD: arg->val += func->operand; break;
     case SUB: arg->val -= func->operand; break;
     case MUL: arg->val *= func->operand; break;
     case DIV: arg->val /= func->operand; break;
     case REM: arg->val %= func->operand; break;
   }
}

My question is: based on the generated assembly in that link or on the code, would there be any difference if I instead wrote a small function for each case of the switch statement, put pointers to those functions in an array, and called them using the same enum?

Using gcc x86_64 7.1

void add(Value* arg, Op* func)
{
   arg->val += func->operand;
}

static void (*jmptable[])(Value*, Op*) = {
     &add
};

Assembly code paste:

run(Value*, Op*):
        push    rbp
        mov     rbp, rsp
        mov     QWORD PTR [rbp-8], rdi
        mov     QWORD PTR [rbp-16], rsi
        mov     rax, QWORD PTR [rbp-16]
        mov     eax, DWORD PTR [rax]
        cmp     eax, 4
        ja      .L9
        mov     eax, eax
        mov     rax, QWORD PTR .L4[0+rax*8]
        jmp     rax
.L4:
        .quad   .L3
        .quad   .L5
        .quad   .L6
        .quad   .L7
        .quad   .L8
.L3:
        mov     rax, QWORD PTR [rbp-8]
        mov     edx, DWORD PTR [rax]
        mov     rax, QWORD PTR [rbp-16]
        mov     eax, DWORD PTR [rax+4]
        add     edx, eax
        mov     rax, QWORD PTR [rbp-8]
        mov     DWORD PTR [rax], edx
        jmp     .L2
.L5:
        mov     rax, QWORD PTR [rbp-8]
        mov     edx, DWORD PTR [rax]
        mov     rax, QWORD PTR [rbp-16]
        mov     eax, DWORD PTR [rax+4]
        sub     edx, eax
        mov     rax, QWORD PTR [rbp-8]
        mov     DWORD PTR [rax], edx
        jmp     .L2
.L6:
        mov     rax, QWORD PTR [rbp-8]
        mov     edx, DWORD PTR [rax]
        mov     rax, QWORD PTR [rbp-16]
        mov     eax, DWORD PTR [rax+4]
        imul    edx, eax
        mov     rax, QWORD PTR [rbp-8]
        mov     DWORD PTR [rax], edx
        jmp     .L2
.L7:
        mov     rax, QWORD PTR [rbp-8]
        mov     eax, DWORD PTR [rax]
        mov     rdx, QWORD PTR [rbp-16]
        mov     esi, DWORD PTR [rdx+4]
        cdq
        idiv    esi
        mov     edx, eax
        mov     rax, QWORD PTR [rbp-8]
        mov     DWORD PTR [rax], edx
        jmp     .L2
.L8:
        mov     rax, QWORD PTR [rbp-8]
        mov     eax, DWORD PTR [rax]
        mov     rdx, QWORD PTR [rbp-16]
        mov     ecx, DWORD PTR [rdx+4]
        cdq
        idiv    ecx
        mov     rax, QWORD PTR [rbp-8]
        mov     DWORD PTR [rax], edx
        nop
.L2:
.L9:
        nop
        pop     rbp
        ret

Too broad: you didn't specify the architecture, compiler, or environment. Even with that information it is still too broad, as one set of results from one test on one computer with one compiler is only valid in that scope — old_timer, Jul 12 '17 at 22:39
the question is labeled with gcc and the link to the generated assembly code specifies it as x86_64 — Josh Weinstein, Jul 12 '17 at 22:40
`case ADD: arg->val += func->operand;` you should *at least* add breaks to your switch. (that will make it faster, too ...) — wildplasser, Jul 12 '17 at 22:43
It's already a jump table, you could add function call overhead to that if you want.. — harold, Jul 12 '17 at 22:44
@harold to clarify you are saying my assembly code already has a jump table? This was more of the basis of my question — Josh Weinstein, Jul 12 '17 at 22:46
Yes you see that row of `.quad label`? That's the jump table. The `jmp rax` above it is the instruction that jumps to some label that it got out of the table. — harold, Jul 12 '17 at 22:49
gcc and x86_64 is not remotely enough information and is still very limited in scope as to the answer. It shouldn't be too hard to create a situation with one faster than the other; one wins one time, the other wins another... — old_timer, Jul 12 '17 at 23:16
Note that links have little value here. If you want to use the output, then paste the output in the question. — old_timer, Jul 12 '17 at 23:16

Petr Skocik · Accepted Answer · 2017-07-12T22:53:42.507

A catchall answer to all these questions: you should measure.

Practically though, I'm betting on the switch version. Function calls have overhead (and they can hardly be inlined in this context). You could eliminate that overhead with labels as values, which is a common compiler extension*, but you should really try all your options and measure if the performance of this piece of code matters greatly to you.

Otherwise, use whatever's most convenient to you.

*A switch is likely to generate a jump table equivalent to what you could compose from labels as values, but the compiler could switch between different implementations depending on the particular case values and their number.
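
The dispatch styles discussed above can be put side by side as a minimal sketch, so that they can be compared and measured. The names `run_switch`, `run_table`, `run_goto`, the `op_*` helpers, and the `CODE_COUNT` sentinel are assumptions introduced for illustration and do not appear in the question. `run_goto` relies on the labels-as-values extension offered by GCC and Clang, so it is not standard C.

```c
enum code { ADD, SUB, MUL, DIV, REM,
            CODE_COUNT /* assumption: sentinel used only to size the tables */ };

typedef struct { int val; } Value;
typedef struct { enum code ins; int operand; } Op;

/* Variant 1: the switch from the question. Like the original, none of the
   variants guards against a zero operand for DIV or REM. */
void run_switch(Value *arg, const Op *func)
{
    switch (func->ins) {
    case ADD: arg->val += func->operand; break;
    case SUB: arg->val -= func->operand; break;
    case MUL: arg->val *= func->operand; break;
    case DIV: arg->val /= func->operand; break;
    case REM: arg->val %= func->operand; break;
    default: break;
    }
}

/* Variant 2: a hand-written table of function pointers. */
static void op_add(Value *arg, const Op *func) { arg->val += func->operand; }
static void op_sub(Value *arg, const Op *func) { arg->val -= func->operand; }
static void op_mul(Value *arg, const Op *func) { arg->val *= func->operand; }
static void op_div(Value *arg, const Op *func) { arg->val /= func->operand; }
static void op_rem(Value *arg, const Op *func) { arg->val %= func->operand; }

typedef void (*handler)(Value *, const Op *);

static const handler handlers[CODE_COUNT] = {
    [ADD] = op_add, [SUB] = op_sub, [MUL] = op_mul,
    [DIV] = op_div, [REM] = op_rem
};

void run_table(Value *arg, const Op *func)
{
    /* Bounds check, playing the role of the cmp/ja pair in the assembly. */
    if ((unsigned)func->ins < (unsigned)CODE_COUNT)
        handlers[func->ins](arg, func);
}

/* Variant 3: labels as values (GCC/Clang extension). The table holds
   addresses of labels inside this function, so no call is made. */
void run_goto(Value *arg, const Op *func)
{
    static void *const labels[CODE_COUNT] = {
        [ADD] = &&do_add, [SUB] = &&do_sub, [MUL] = &&do_mul,
        [DIV] = &&do_div, [REM] = &&do_rem
    };

    if ((unsigned)func->ins >= (unsigned)CODE_COUNT)
        return;
    goto *labels[func->ins];

do_add: arg->val += func->operand; return;
do_sub: arg->val -= func->operand; return;
do_mul: arg->val *= func->operand; return;
do_div: arg->val /= func->operand; return;
do_rem: arg->val %= func->operand; return;
}
```

All three variants should leave the same result in `arg->val`. Which one is fastest depends on the compiler, the optimization flags, and the hardware, so it still has to be measured.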

0___________ · Answer 2 · 2017-07-12T23:14:36.463

4

Can you spot the difference? Trust the compiler (it will do such micro-optimisations much better than you) and do not forget break statements. Care about the algorithm, not about such small details.

https://godbolt.org/g/sPxse2

edited Jul 12 '17 at 23:14

answered Jul 12 '17 at 22:52


score 0 · Answer 3 · answered Aug 21 '22 at 19:19

It looks like, because of branch prediction and bounds checking, using the switch labels as jump points may be up to 20% faster on older systems; newer systems have better branch prediction. Basically, this relies on a compiler extension. You still have the switch, but the cases don't return to a single shared dispatcher. Instead, each case ends with its own dispatcher that jumps directly into the next case. A number of popular VMs do this.
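
A minimal sketch of that per-case dispatch, assuming a program stored as an array of `Op` values: the `HALT` opcode, the `execute` function, and the `DISPATCH` macro are hypothetical names added for illustration, and the code depends on the labels-as-values extension of GCC and Clang. There is no bounds check here, so it assumes every opcode in the program is valid and that the program ends with `HALT`.

```c
enum code { ADD, SUB, MUL, DIV, REM,
            HALT /* assumption: terminator opcode, not in the question */ };

typedef struct { int val; } Value;
typedef struct { enum code ins; int operand; } Op;

/* Advance to the next instruction and jump straight to its handler.
   Every handler carries its own copy of this indirect jump, so the branch
   predictor sees one jump per handler instead of a single shared one. */
#define DISPATCH() goto *labels[(++pc)->ins]

void execute(Value *arg, const Op *program)
{
    /* Label addresses indexed by opcode (GCC/Clang extension). */
    static void *const labels[] = {
        [ADD] = &&do_add, [SUB] = &&do_sub, [MUL] = &&do_mul,
        [DIV] = &&do_div, [REM] = &&do_rem, [HALT] = &&do_halt
    };
    const Op *pc = program;

    goto *labels[pc->ins];          /* initial dispatch */

do_add:  arg->val += pc->operand; DISPATCH();
do_sub:  arg->val -= pc->operand; DISPATCH();
do_mul:  arg->val *= pc->operand; DISPATCH();
do_div:  arg->val /= pc->operand; DISPATCH();
do_rem:  arg->val %= pc->operand; DISPATCH();
do_halt: return;
}
```

Whether this beats a plain switch inside a loop depends on the compiler and the CPU, so it is worth benchmarking against the switch version before adopting it.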

See here for more info and examples: https://www.cipht.net/2017/10/03/are-jump-tables-always-fastest.html
