
I read a little about hoisting and reordering, and it seems that the Java VM may choose to hoist some expressions. I also read about the hoisting of function declarations in JavaScript.

First question: Can someone confirm whether hoisting generally exists in C, C++ and Java, or is it entirely compiler/optimization dependent?

I have read a lot of example C code that always puts variable declarations at the top, before any asserts or boundary conditions. I thought it would be a little faster to do all the asserts and boundary checks before the variable declarations, since the function might just terminate early.

Main question: Must variable declarations always be at the top of a block? (Is hoisting at work here?) Or does the compiler automatically optimize the code by checking these independent asserts and boundary cases first, before the irrelevant variable declarations?

Here's a related example:

void MergeSort(struct node** headRef) {
    struct node* a;
    struct node* b;
    if ((*headRef == NULL) || ((*headRef)->next == NULL)) {
        return;
    }
    FrontBackSplit(*headRef, &a, &b);
    MergeSort(&a);
    MergeSort(&b);
    *headRef = SortedMerge(a, b);
}

As shown above, the boundary case does not depend on variables "a" and "b". Would putting the boundary case above the variable declarations therefore make the function slightly faster?


Updates:

The above example isn't as good as I hoped, because variables "a" and "b" are only declared there, not initialized. The compiler ignores the declarations until the variables are actually used.

I checked the GNU GCC assembly for variable declarations with initializations, and the two versions execute in a different order. The compiler did not reorder my independent asserts and boundary cases relative to the initializations. So reordering these asserts and boundary cases does change the generated assembly, and therefore how the machine runs it.

I suppose the difference is so minuscule that most people never cared about it.

Night0
  • Hoisting does not exist in C and C++. Declaration must precede use. Initialization occurs when the program reaches the line on which a variable is declared: no sooner and no later. – Brian Bi Mar 21 '14 at 00:52
  • Thanks Brian. So, should I put independent *assert* and *boundary cases* at the top (above the declarations)? I figured this would be a little faster. – Night0 Mar 21 '14 at 00:54
  • @Night0 is it really faster? Have you checked that? If you enable compiler optimization (which you should always do), I doubt that the compiler will generate different code for the two versions of your function. – ciamej Mar 21 '14 at 01:15
  • @ciamej: I checked the GNU GCC assembly. For *variable declarations with initializations*, the assembly is different, so putting *assert* and *boundary cases* above the initializations does change the assembly and thus affects how the machine runs it. However, for *variable declarations* only (without initialization), the assembly is the same, because a variable is only assigned when it is actually used. – Night0 Mar 21 '14 at 05:07
  • That's interesting. Did you compile with -O2? And what values did you initialize the variables with? Were they constants like `= 0;` or something more complex? – ciamej Mar 21 '14 at 12:36
  • I checked that myself, and it seems that the compiler has no interest in moving the initializations around, though it is allowed to do so. In this example the compiler can't predict how often the condition in the if statement will be true, so it just sticks to the order of instructions defined by the programmer. This might be different with just-in-time compilation, as in Java HotSpot, or with C++ and LLVM. – ciamej Mar 21 '14 at 13:12
  • @ciamej: Yeah, with or without optimization, C compiler never moves these declarations with initialization (I set them to NULL). Perhaps the difference of reordering is small enough that people didn't care about this. – Night0 Mar 21 '14 at 18:05

3 Answers


The compiler may reorder or modify your code as it wishes, as long as the observable behavior of the modified code is the same as that of the original when executed sequentially (the "as-if" rule). So hoisting is allowed, but not required. It is an optimization, and it is entirely compiler specific.

Variable declarations in C++ can go wherever you wish. In C89 they had to be at the top of a block, but the C99 standard relaxed that rule, so now they can go anywhere in a block, as in C++. Still, many C programmers stick to putting them at the top of a block.
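Applied to the MergeSort from the question, a C99 version can do the boundary check first and declare `a` and `b` right where they are needed. This is only a sketch: the `data` field and the bodies of `FrontBackSplit` and `SortedMerge` are assumptions, since the question does not show them (here they use the common slow/fast-pointer split and an iterative merge).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed node layout; the question only shows the `next` field. */
struct node {
    int data;
    struct node *next;
};

/* Assumed helper: splits `source` (at least 2 nodes) into a front and a
 * back half using a slow pointer and a fast pointer. */
static void FrontBackSplit(struct node *source,
                           struct node **frontRef, struct node **backRef) {
    struct node *slow = source;
    struct node *fast = source->next;
    while (fast != NULL) {
        fast = fast->next;
        if (fast != NULL) {
            slow = slow->next;
            fast = fast->next;
        }
    }
    *frontRef = source;
    *backRef = slow->next;
    slow->next = NULL; /* terminate the front half */
}

/* Assumed helper: merges two sorted lists into one sorted list. */
static struct node *SortedMerge(struct node *a, struct node *b) {
    struct node dummy = {0, NULL};
    struct node *tail = &dummy;
    while (a != NULL && b != NULL) {
        if (a->data <= b->data) {
            tail->next = a;
            a = a->next;
        } else {
            tail->next = b;
            b = b->next;
        }
        tail = tail->next;
    }
    tail->next = (a != NULL) ? a : b; /* append whatever is left */
    return dummy.next;
}

void MergeSort(struct node **headRef) {
    assert(headRef != NULL);
    /* Boundary case first: lists of length 0 or 1 are already sorted. */
    if (*headRef == NULL || (*headRef)->next == NULL)
        return;

    /* C99: a and b are declared only after the early return. */
    struct node *a;
    struct node *b;
    FrontBackSplit(*headRef, &a, &b);
    MergeSort(&a);
    MergeSort(&b);
    *headRef = SortedMerge(a, b);
}
```

Because `a` and `b` are uninitialized, moving their declarations typically makes no difference to the optimized output; the choice is mostly about readability.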

In your example, the compiler is free to move the if statement to the top, but I don't think it would. These variables are just uninitialized pointers on the stack, so declaring them costs essentially nothing at run time: no instructions are generated for the declaration itself, and the stack space is typically reserved once when the function is entered. Moreover, it might be more efficient to create them at the beginning of the function rather than after the asserts.

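If your declarations involved side effects, for example

struct node *a = some_function();

then the compiler would be limited in what it can reorder: unless it can prove that some_function() has no observable effects, the call must still happen before the early return.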
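To make the side-effect point concrete, here is a small sketch in which `some_function` (hypothetical, as in the answer) increments a global counter. Because that counter is observable, the compiler cannot drop or move the call in `declare_first`; the two functions genuinely behave differently, so the order is the programmer's decision, not the optimizer's. The function and counter names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int data;
    struct node *next;
};

/* Hypothetical stand-in for some_function(): its side effect is
 * incrementing a counter that other code can observe. */
static int calls = 0;
static struct node *some_function(void) {
    ++calls;
    return NULL;
}

/* Declaration with a side-effecting initializer before the check:
 * some_function() runs even when we return early. */
int declare_first(int n) {
    struct node *a = some_function();
    if (n <= 1)
        return 0;
    return a == NULL;
}

/* Check before the declaration (C99): the early return skips the call. */
int check_first(int n) {
    if (n <= 1)
        return 0;
    struct node *a = some_function();
    return a == NULL;
}
```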

Edit:

I checked GCC's loop hoisting in practice with this short program:

#include <stdio.h>
int main(int argc, char **argv) {
    int dummy = 2 * argc;
    int i = 1;
    while (i<=10 && dummy != 4)
        printf("%d\n", i++);
    return 0;
}

I've compiled it with this command:

gcc -std=c99 -pedantic test.c -S -o test.asm

This is the output:

    .file   "test.c"
    .def    ___main;    .scl    2;  .type   32; .endef
    .section .rdata,"dr"
LC0:
    .ascii "%d\12\0"
    .text
    .globl  _main
    .def    _main;  .scl    2;  .type   32; .endef
_main:
LFB7:
    .cfi_startproc
    pushl   %ebp
    .cfi_def_cfa_offset 8
    .cfi_offset 5, -8
    movl    %esp, %ebp
    .cfi_def_cfa_register 5
    andl    $-16, %esp
    subl    $32, %esp
    call    ___main
    movl    8(%ebp), %eax
    addl    %eax, %eax
    movl    %eax, 24(%esp)
    movl    $1, 28(%esp)
    jmp L2
L4:
    movl    28(%esp), %eax
    leal    1(%eax), %edx
    movl    %edx, 28(%esp)
    movl    %eax, 4(%esp)
    movl    $LC0, (%esp)
    call    _printf
L2:
    cmpl    $10, 28(%esp)
    jg  L3
    cmpl    $4, 24(%esp)
    jne L4
L3:
    movl    $0, %eax
    leave
    .cfi_restore 5
    .cfi_def_cfa 4, 4
    ret
    .cfi_endproc
LFE7:
    .ident  "GCC: (GNU) 4.8.2"
    .def    _printf;    .scl    2;  .type   32; .endef

Then I've compiled it with this command:

gcc -std=c99 -pedantic test.c -O3 -S -o test.asm

This is the output:

    .file   "test.c"
    .def    ___main;    .scl    2;  .type   32; .endef
    .section .rdata,"dr"
LC0:
    .ascii "%d\12\0"
    .section    .text.startup,"x"
    .p2align 4,,15
    .globl  _main
    .def    _main;  .scl    2;  .type   32; .endef
_main:
LFB7:
    .cfi_startproc
    pushl   %ebp
    .cfi_def_cfa_offset 8
    .cfi_offset 5, -8
    movl    %esp, %ebp
    .cfi_def_cfa_register 5
    pushl   %ebx
    andl    $-16, %esp
    subl    $16, %esp
    .cfi_offset 3, -12
    call    ___main
    movl    8(%ebp), %eax
    leal    (%eax,%eax), %edx
    movl    $1, %eax
    cmpl    $4, %edx
    jne L8
    jmp L6
    .p2align 4,,7
L12:
    movl    %ebx, %eax
L8:
    leal    1(%eax), %ebx
    movl    %eax, 4(%esp)
    movl    $LC0, (%esp)
    call    _printf
    cmpl    $11, %ebx
    jne L12
L6:
    xorl    %eax, %eax
    movl    -4(%ebp), %ebx
    leave
    .cfi_restore 5
    .cfi_restore 3
    .cfi_def_cfa 4, 4
    ret
    .cfi_endproc
LFE7:
    .ident  "GCC: (GNU) 4.8.2"
    .def    _printf;    .scl    2;  .type   32; .endef

So basically, with optimization turned on, the original code was transformed into something like this:

#include <stdio.h>
int main(int argc, char **argv) {
    int dummy = 2 * argc;
    int i = 1;
    if (dummy != 4)
        while (i<=10)
            printf("%d\n", i++);
    return 0;
}

So, as you can see, there is indeed hoisting in C: here GCC moved the loop-invariant test `dummy != 4` out of the loop.
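The same kind of transformation, moving a computation whose value does not change between iterations out of the loop (usually called loop-invariant code motion), can also be done by hand. In this sketch the function names are illustrative; an optimizing compiler can often hoist `strlen(s)` itself when it can prove the loop never writes to the string, but the hand-hoisted version does not rely on the optimizer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts the spaces in s. As written, strlen(s) is evaluated
 * in the loop condition on every iteration. */
size_t count_spaces_naive(const char *s) {
    size_t count = 0;
    for (size_t i = 0; i < strlen(s); i++)
        if (s[i] == ' ')
            count++;
    return count;
}

/* Same result with the loop-invariant strlen(s) hoisted out by hand.
 * Valid because nothing in the loop modifies the string. */
size_t count_spaces_hoisted(const char *s) {
    size_t count = 0;
    size_t n = strlen(s);
    for (size_t i = 0; i < n; i++)
        if (s[i] == ' ')
            count++;
    return count;
}
```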

ciamej

Actually, the concept of hoisting does exist in Java. This code:

while (!stop)
    i++;

Might be converted into this code:

if (!stop)
    while (true)
        i++;

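The JVM is allowed to do this "optimization" when access to `stop` is not synchronized (no synchronized block, and `stop` is not declared volatile), because without synchronization the looping thread is not guaranteed to ever see another thread's write to `stop`.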

More details can be found in Effective Java, 3rd Edition, Chapter 11 (Concurrency), Item 78, "Synchronize access to shared mutable data".
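C has the same hazard. As a rough C11 analogue of the `stop` loop above, the sketch below assumes POSIX threads are available (older toolchains may need `-pthread`) and declares the flag `atomic_bool`, so the compiler must re-load it on every iteration instead of hoisting the test out of the loop, much as `volatile` or synchronization does in Java. With a plain `bool` the program would contain a data race, and the hoisted, never-terminating form would be permitted. All names are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Atomic, so each loop iteration must really re-read it. */
static atomic_bool stop_requested = false;
static atomic_long spin_count = 0;

static void *worker(void *arg) {
    (void)arg;
    while (!atomic_load(&stop_requested))
        atomic_fetch_add(&spin_count, 1);
    return NULL;
}

/* Starts the worker, lets it spin for about 10 ms, then asks it to stop.
 * Returns 0 once the worker thread has terminated and been joined. */
int run_stop_flag_demo(void) {
    pthread_t t;
    atomic_store(&stop_requested, false);
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    struct timespec delay = {0, 10 * 1000 * 1000}; /* 10 ms */
    nanosleep(&delay, NULL);
    atomic_store(&stop_requested, true);
    return pthread_join(t, NULL);
}
```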

MagGGG

Hoisting, in the JavaScript sense of a declaration taking effect before the line where it appears, does not exist in C, C++ or Java.

A variable declaration can occur at any point within a method or function in C++ and Java, but it must come before the variable is used. In C89 it must be at the top of a block; C99 and later allow it anywhere in a block.

Variable scope in these languages is either global or delimited by curly braces, so you can arbitrarily add a pair of curly braces to a C program to introduce a new variable scope. In pre-ES2015 JavaScript, where `var` has only function scope, you would achieve the same thing with a closure (an immediately invoked function).
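A minimal C sketch of the brace-scoping point (the function name is illustrative): the inner block declares its own `x`, which shadows the outer one and ceases to exist at the closing brace.

```c
#include <assert.h>

int block_scope_demo(void) {
    int x = 1;
    int seen_inside;
    {
        int x = 10;       /* a different object from the outer x */
        seen_inside = x;
    }                     /* the inner x goes out of scope here */
    assert(seen_inside == 10);
    return x;             /* the outer x was never touched */
}
```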

Kaffiene