1

I've been trying to learn what exactly jump tables are, and I'm having trouble understanding something. From the many examples I've seen they seem to pretty much boil down to this, or at least this is one version of it:

void func1() {}
void func2() {}
void func3() {}

int main()
{
    void(*jumpTo[3])(void) = { func1, func2, func3 };
    jumpTo[1]();

    return 0;
}

This just appears to be an array of function pointers, indexed by a certain value/position. Is it the case then that a jump table is just indexing into an array of function pointers? I'm really curious about this because I've seen a lot of people saying that switch statements are often compiled into jump tables as a performance measure. From my understanding, jumping to a function in this way involves a pointer dereference and a function call. I thought that neither of these was great for performance.

Another answer on this site said that by doing it this way "you are adding a function call overhead that a switch statement doesn't necessarily have." How would a switch that compiles to a jump table avoid function calls?

Also, a highly voted answer here said "A jump table can be either an array of pointers to functions OR an array of machine code jump instructions." How would you jump to machine code instructions instead of dereferencing a pointer? Is this faster?

Is the difference between the two that in my above example the pointer doesn't have to be dereferenced because it can be statically bound? As opposed to passing in a random number as index at runtime?

Thanks.

Zebrafish
  • 2
    Consider the `goto` and that a compiler can introduce targets for it however it wants. (A jump table that the compiler creates isn't necessarily implementable in the language itself.) – molbdnilo Jul 22 '17 at 11:10
  • 1
    You can take a look at some assembly to understand, see e.g. [this](https://godbolt.org/g/DA32w3) - both `f` and `g` perform the same task, but `f` uses a switch which is converted into a jump table, while `g` uses a sequence of `if`s which is simply converted into a sequence of cmp / jmp instructions. – Holt Jul 22 '17 at 11:19
  • Also, you should add links to the questions / answers you are talking about; this would add context. – Holt Jul 22 '17 at 11:21
  • 1
    In your case, it is a call. What the compiler would probably use is a jump (that is, it goes to another location in the same function). In practice, you rarely need to worry about such things unless that code is called millions of times in a tight loop. – Phil1970 Jul 22 '17 at 11:34
  • I would think one of the differences might be that in your example above it would be possible to define the functions jumped to at run time, which would make it impossible for the compiler to inline the call. By inlining, the compiler could avoid the function pointer dereference? – William Jones Jul 22 '17 at 13:15
  • @William Jones That's kind of what I'm wondering. In the usual case you index into the array with a value that is only known at runtime, so it would always have to dereference the pointer and call, right? – Zebrafish Jul 22 '17 at 14:05

3 Answers

3

A jump table and your function table are basically the same thing: an array of addresses. A jump table contains the addresses of goto targets. The only difference between the two is how the jump is made. When a function is called, the return address is pushed on the stack so that the function can return when it terminates. A plain jump pushes nothing.

Here is an example of a jump table:

#include <stdio.h>
int main(int argc, char *argv[])
{
    switch (argc)
    {
        case 1: 
             printf("You provided no arguments.");
             break;
        case 2: 
             printf("You provided one argument.");
             break;
        case 3: 
             printf("You provided two arguments.");
             break;
        case 4: 
             printf("You provided three arguments.");
             break;
        case 5: 
             printf("You provided four arguments.");
             break;
        case 6: 
             printf("You provided five arguments.");
             break;
        default: 
             printf("You provided %d arguments.", argc-1);
             break;
    }
    return 0;
}

This compiles to something like the following (x86-64, Intel syntax):

    cmp edi, 6 ;Bounds check
    ja  .L2    ;jump to default branch
    mov eax, edi
    jmp [QWORD PTR .L4[0+rax*8]]
.L4:
    .quad   .L2 ;case 0 (same as default!!!)
    .quad   .L3 ;case 1
    .quad   .L5 ;case 2
    .quad   .L6 ;case 3
    .quad   .L7 ;case 4
    .quad   .L8 ;case 5
    .quad   .L9 ;case 6
user5329483
  • Would it make a difference if the instructions in each case were of different length? If they were the same length then the computer could just multiply the index by this known length to increment ahead. If they weren't the same length then it could still avoid a function call, but it would have to look up the address of the appropriate instructions based on the index. In that case the only difference between the pointer array and this faster jump table would be that it's not making a function call? – Zebrafish Jul 22 '17 at 14:44
  • Different length: no, it doesn't matter. In my example .L2 may be at address 0x1000, .L3 at 0x1010 and .L5 at 0x2000. The array contains nothing but addresses, so every array element has the same size. Identical code branch sizes: it is unlikely that this happens. Extra padding to bring each branch to the same size would be more expensive than this address table. Please note: a function call means not only the call overhead, it also includes the change of scope. The caller's local data is not available to the callee unless it is passed as a parameter! – user5329483 Jul 22 '17 at 15:07
  • By the way, a compiler introduces a jump table only if this saves space and/or execution time. I had to convince my compiler to use a jump table by having six `case`s instead of three. The case values of the switch() should be contiguous. Please follow the assembler code when argc is zero! – user5329483 Jul 22 '17 at 15:16
  • Right, so I imagine it's very close to something like an "array" of gotos. The compiler isn't obliged to actually do this, is there a way to implement this yourself out of curiosity? I know it sounds silly to basically have a list of gotos that are indexed by a number. – Zebrafish Jul 22 '17 at 15:30
  • I think it is correct that standard C / C++ doesn't allow something like `goto myArray[i];`. Anyway, using goto is the subject of HEAVY discussions, close to war. I still use goto, but very very rarely. – user5329483 Jul 22 '17 at 15:44
1

The main difference is that for a jump table, you can typically use addressing relative to the program counter, so that the table will not need any relocations and can live in the .text section (or some other section which is non-writable and shared). This is because a typical jump table is used in only a few places within the same object file, and all the offsets are known to the assembler.

If you have an array of function pointers instead, then you somehow need to produce real pointers, and that needs some form of relocation.

The second possibility, the array of jump instructions, is not really restricted to jump instructions per se. The important part is that all the target instruction sequences (except the last one) are of the same length, so that the offset to jump to can be computed easily. This way, no jump table is needed, but it does need exact instruction width (and count) information, which is difficult to guarantee on most targets (RISC architectures can have difficult-to-predict effective instruction counts when it comes to loading constants). This means that in practice, this approach is restricted to a very specific form of jump instruction for the targets.

Florian Weimer
  • So you're saying that a switch statement could only be compiled to a proper jump table if each instruction length in each case is the same length? Couldn't it just keep a list of the index matching to the addresses of where to jump to? Then the only difference would be instead of calling a function it would jump to an address within that section of code. But it would still have to do the equivalent of dereferencing the pointer, when it gets the value of the index it would need to look up the address of where to jump? – Zebrafish Jul 22 '17 at 14:25
  • Right, the index approach is the first approach I described. You still need a lookup table to deal with the non-constant instruction sequence lengths, though. That's why directly computing the target address for the instruction sequence is desirable, but that's not always possible. – Florian Weimer Jul 22 '17 at 14:29
1

Generally the term jump table refers to a technique where there are more than two branch/jump targets, and the target is chosen by a variable that is used to calculate a position in the table, one way or another. In essence, the example you provided:

void(*jumpTo[3])(void) = { func1, func2, func3 };
jumpTo[1]();

as a whole is the use of a jump table - not just the dereference of the function pointer.

C offers other mechanisms too - for example a switch-case is often compiled into a jump table, especially if the case values span a narrow range and have few gaps in between. Another mechanism, provided by GCC as a non-standard extension, is the use of goto labels as pointer values with computed goto.

  • @Antii Haapala So I assume that if a switch statement is compiled to a jump table, it would be different from an array of function pointers? So I'm thinking that the difference is that the faster jump table jumps to a particular position within the same section of code instead of calling a function, which is probably located farther away. I think it would still have to do the equivalent of the pointer dereference, that is, once it has a number or index, it has to match that to the corresponding address of where to jump to. – Zebrafish Jul 22 '17 at 14:19
  • 1
    yes, and another thing is that it doesn't need to setup the call frame, etc. It is more like the computed goto. – Antti Haapala -- Слава Україні Jul 22 '17 at 14:22