Do mainstream compilers convert passed-by-reference basic types into pass-by-copy?

Question

Passing an object by reference is an easier, faster and safer way to pass an address to it. But for most compilers, it's all the same: references are really pointers.

Now what about basic types like int? Passing an address to an int and using it inside a function would be slower than passing it by copy, because the pointer needs to be dereferenced before use.

How do modern compilers handle this?

int foo(const int & i)
{
   cout << i; // Do whatever read-only with i.
   return 0;
}

May I trust them to compile this into this?

int foo(const int i)
{
   cout << i;
   return 0;
}

By the way, in some cases it could even be faster to pass both i and &i, then use i for reading and the pointer for writing.

int foo(const int i, int * ptr_i)
{
   cout << i;    // no dereference, therefore faster (?)
   // many more read-only operations with i.
   *ptr_i = 123;
   return 0;
}

At the very best this could happen only for static functions. As soon as you have multiple TUs, you would have no way of controlling that both TUs implement the same interface. — Kerrek SB, Oct 31 '11 at 18:54

Alok Save · Answer 1 · 2011-10-31T19:21:40.833

5

May I trust them to compile this into this?
Yes, you can. [The "yes" here has a specific meaning; please read the Edit section, which clarifies it.]

int foo(const int & i)

Tells the compiler that i is a reference to a constant integer.
The compiler may perform optimizations, but only those permitted by the as-if rule. So you can be assured that, for your program, the behavior of the above will be as good as the following (the const qualifier will be respected):

int foo(const int i)

As-If Rule:

The C++ standard allows a compiler to perform any optimization, as long as the resulting executable exhibits the same observable behaviour as if all the requirements of the standard have been fulfilled.

For Standardese fans:
C++03 1.9 "Program execution":

conforming implementations are required to emulate (only) the observable behavior of the abstract machine.

And the footnote says:

This provision is sometimes called the “as-if” rule, because an implementation is free to disregard any requirement of this International Standard as long as the result is as if the requirement had been obeyed, as far as can be determined from the observable behavior of the program. For instance, an actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no side effects affecting the observable behavior of the program are produced.

EDIT:
Since there is some confusion about the answer let me clarify:
Optimizations cannot be enforced on the compiler, so how the compiler handles this depends on the compiler. The important thing is that the observable behavior of the program will not change.
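As a minimal sketch of what the as-if rule permits here, consider two helpers with internal linkage. The names twice_by_ref, twice_by_value and check_same_result are hypothetical and not part of the question. Nothing can modify the argument while either function runs, so both calls have the same observable behavior and a compiler may (but is not required to) generate the same code for them.

```cpp
#include <cassert>

// Hypothetical helpers for illustration only; static gives them internal
// linkage, so the compiler sees every caller in this translation unit.
static int twice_by_ref(const int& i) { return i + i; }
static int twice_by_value(const int i) { return i + i; }

// Nothing else can modify x during either call, so both calls must agree.
void check_same_result(int x) {
    assert(twice_by_ref(x) == twice_by_value(x));
}
```

Whether the two functions really end up as identical machine code can only be settled by inspecting the compiler's output for a given compiler and set of flags.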

edited Oct 31 '11 at 19:21

answered Oct 31 '11 at 18:54

I was waiting for a quote of the standard. There you are! Thx. – Gabriel Oct 31 '11 at 19:06
@Gabriel: That was not actually a quote from the Standard, but since you are a Standardese fan I added one from the Standard, which implies the same as the simplified version of the **As-If** rule. – Alok Save Oct 31 '11 at 19:12
Hmm... but does this say anything about whether such an optimisation does actually ever happen in practice? – Kerrek SB Oct 31 '11 at 19:13
@KerrekSB: Depends on the compiler. Optimizations cannot be enforced on the compiler. My answer says that the observable behavior of the program will not change. To quote: `(the const qualifier will be respected)` – Alok Save Oct 31 '11 at 19:17
Your answer begins "May I trust them to compile this into this? Yes You can." But your comment admits that you can not! – Porculus Oct 31 '11 at 19:18
@Als: Well, I know the general rules for what a compiler is allowed to do, but the specific situation here is very interesting, and I'd be keen to know if this precise optimization is in fact ever possible, or observed in the wild. As I said in my other comment, I have a feeling that it cannot possibly be done for exported functions... – Kerrek SB Oct 31 '11 at 19:21
@KerrekSB: I would be as interested to know as you. The fact remains that, unless proved otherwise, the compiler is allowed to optimize as it chooses. – Alok Save Oct 31 '11 at 19:23
@Als: I'm trying to rig up an example and compare the assembly, but I'm struggling. Anything too simple and the compiler will just inline the whole thing... also I'm not sure I can tell whether something small is passed by reference or by value from the assembly, shame on me. Something about `8(%esp)` I suppose? – Kerrek SB Oct 31 '11 at 19:25
@Als Good answer, although I'd really like to know if any compiler actually does it. – Gabriel Oct 31 '11 at 20:02

score 4 · Answer 2 · answered Nov 01 '11 at 00:44

It shouldn't compile it into that because it might not be correct. Consider:

int foo(const int &i, int *p)
{
   *p = 42;
   cout << i; // prints 42
   return 0;
}

int main()
{
   int x = 5;
   foo(x, &x);
   return 0;
}

versus

int foo(const int i, int *p)
{
   *p = 42;
   cout << i; // prints 5
   return 0;
}

int main()
{
   int x = 5;
   foo(x, &x);
   return 0;
}

How does the compiler know that this won't happen? It would have to prove that nothing else can modify the variable during the call: for example, (1) someone else may hold a pointer to it, (2) it may be a global variable, or (3) another thread may write to it. Given the unsafe nature of C, with pointer arithmetic and all, even guaranteeing that the function cannot obtain a pointer to the variable might be impossible.
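Case (2) can be sketched without any pointer parameter at all. In the following illustration the names g_value, touch_global, read_by_ref, read_by_value and demo_global_aliasing are all made up for this example. The function touch_global stands in for any call whose effects the compiler cannot rule out.

```cpp
#include <cassert>

int g_value = 5;  // hypothetical global, for illustration only

// Changes the global; stands in for any call the compiler cannot see through.
void touch_global() { g_value = 42; }

int read_by_ref(const int& i) {
    touch_global();
    return i;  // reads the referenced object after the call
}

int read_by_value(const int i) {
    touch_global();
    return i;  // returns the copy made when the function was called
}

// Usage: the two signatures give different results when i refers to g_value.
void demo_global_aliasing() {
    g_value = 5;
    assert(read_by_ref(g_value) == 42);
    g_value = 5;
    assert(read_by_value(g_value) == 5);
}
```

Because the results differ, a compiler cannot silently turn read_by_ref into read_by_value unless it can show that the argument never refers to g_value.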

And when `foo` has only one parameter, a global `int *p` could still point to i. What if there are no pointer dereferences inside `foo`? — Gabriel, Nov 01 '11 at 10:52

score 1 · Accepted Answer · answered Nov 02 '11 at 11:45

Visual Studio 2010 (Express) does, at least in the simple cases I've tested. Could anyone test gcc?

I've tested the following:

1. Passing only i:

#include <iostream>
#include <math.h>   // sqrtl

int vars[] = {1,2,3,12,3,23,1,213,231,1,21,12,213,21321,213,123213,213123};

int ok1(const int i){
    return sqrtl(vars[i]);
}

int ok2(const int & i){
    return sqrtl(vars[i]);
}

int main() {
    int i;
    std::cin >> i;
    //i = ok1(i);
    i = ok2(i);
    std::cout << i;
}

The ASM:

i = ok1(i);
000D1014  mov         ecx,dword ptr [i]  
000D1017  fild        dword ptr vars (0D3018h)[ecx*4]  
000D101E  call        _CIsqrt (0D1830h)  
000D1023  call        _ftol2_sse (0D1840h) 

i = ok2(i);
013A1014  mov         ecx,dword ptr [i]  
013A1017  fild        dword ptr vars (13A3018h)[ecx*4]  
013A101E  call        _CIsqrt (13A1830h)  
013A1023  call        _ftol2_sse (13A1840h)

Well, the two listings are identical, so the optimization was clearly performed (here, with both calls inlined into main).

2. Passing i and &i:

Let's consider @newacct's answer here.

#include <iostream>
#include <math.h>   // sqrtl

int vars[] = {1,2,3,12,3,23,1,213,231,1,21,12,213,21321,213,123213,213123};

int ok1(const int i, int * pi) {
    *pi = 2;
    return sqrtl(vars[i]);
}

int ok2(const int & i, int * pi) {
    *pi = 2;
    return sqrtl(vars[i]);
}

int main() {
    int i;
    int * pi = &i;
    std::cin >> i;
    i = ok1(i, pi);
    //i = ok2(i, pi);
    std::cout << i;
}

The ASM:

i = ok1(i, pi);
00891014  mov         ecx,dword ptr [i]
00891017  fild        dword ptr vars (893018h)[ecx*4] // access vars[i] 
0089101E  call        _CIsqrt (891830h)  
00891023  call        _ftol2_sse (891840h)  

i = ok2(i, pi);
011B1014  fild        dword ptr [vars+8 (11B3020h)]   // access vars[2]
011B101A  call        _CIsqrt (11B1830h)  
011B101F  call        _ftol2_sse (11B1840h)

In ok1 I can't see it writing 2 into *pi. Probably it understands that the memory location will be overwritten by the result of the function anyway, so the write is useless.

With ok2, the compiler is as smart-ass as I expected. It understands that i and *pi refer to the same place, so it uses a hardcoded 2 directly.

Notes:

I compiled each test twice, once with only the ok1 call uncommented and once with only the ok2 call. Compiling both at the same time leads to more complex optimizations between the two functions, which end up inlined and mixed together.
I added a lookup in the array vars because plain calls to sqrtl were simplified into basic ADD- and MUL-like operations without the actual call.
Compiled in Release mode.
Both variants yielded the expected results, of course.
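Since both calls were inlined in the tests above, one way to look at the parameter passing itself is to forbid inlining and compare the generated code for each function. The sketch below assumes the usual compiler-specific attributes (__declspec(noinline) on MSVC, __attribute__((noinline)) on GCC and Clang). The names NOINLINE, table, by_value, by_ref and check_both_agree are made up for this illustration, and the array is a small stand-in, not the one used above.

```cpp
#include <cassert>
#include <cmath>

// NOINLINE is a made-up macro name wrapping the compiler-specific attribute.
#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))  // GCC and Clang
#endif

int table[] = {1, 4, 9, 16, 25};  // small stand-in array of perfect squares

// Index passed by copy.
NOINLINE int by_value(const int i) {
    return static_cast<int>(std::sqrt(static_cast<double>(table[i])));
}

// Index passed by const reference.
NOINLINE int by_ref(const int& i) {
    return static_cast<int>(std::sqrt(static_cast<double>(table[i])));
}

// Usage: both versions must return the same value for every valid index.
void check_both_agree() {
    for (int i = 0; i < 5; ++i) {
        assert(by_value(i) == by_ref(i));
    }
}
```

The attribute only blocks inlining; other interprocedural optimizations may still apply, so the assembly should be read with that in mind.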

score 0 · Answer 4 · answered Aug 12 '13 at 18:08

gcc does not appear to do this optimization with -O3 (gcc version 4.7.2). Using Gabriel's code, note how ok2 loads a dereferenced address before indexing into vars while ok1 does not.

ok1:


    .cfi_startproc
    subq    $40, %rsp
    .cfi_def_cfa_offset 48
    movslq  %edi, %rdi
    fildl   vars(,%rdi,4)
    fld %st(0)
    fsqrt
    fucomi  %st(0), %st
    jp  .L7
    fstp    %st(1)

ok2:


    .cfi_startproc
    subq    $40, %rsp
    .cfi_def_cfa_offset 48
    movslq  (%rdi), %rax
    fildl   vars(,%rax,4)
    fld %st(0)
    fsqrt
    fucomi  %st(0), %st
    jp  .L12
    fstp    %st(1)
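If the extra load through the reference matters, one option is to copy the referenced value into a local variable once and use the copy afterwards. This is only a sketch: the function store_then_sum and its body are hypothetical, and the copy changes the result whenever the pointer aliases the reference, so it is a semantic choice and not a free optimization.

```cpp
#include <cassert>

// Hypothetical function: reads i once, then writes through out.
int store_then_sum(const int& i, int* out) {
    const int local = i;   // read once, before any write through out
    *out = 123;            // may change the object i refers to
    return local + local;  // still uses the value seen at entry
}

// Usage: x is passed both by reference and by pointer, so they alias.
void demo_local_copy() {
    int x = 5;
    assert(store_then_sum(x, &x) == 10);  // computed from the entry value 5
    assert(x == 123);                     // the write still happened
}
```

This keeps a single parameter for reading while making the "read the value at entry" behavior explicit in the source.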
