This was tested on Debian squeeze with g++ 4.4 and g++ 4.7. Consider two C++ source files.

################
foo.cc
#################
#include <string>
using std::string;

int foo(void)
{
  return 0;
}

#################
bar.cc
#################
#include <string>
using std::string;

//int foo(void);
string foo(void);

int main(void)
{
  foo();
  return 0;
}
##################

If I compile and run this, predictably there are problems. I'm using scons.

################################
SConstruct
################################
#!/usr/bin/python


env = Environment(
    CXX="g++-4.7",
    CXXFLAGS="-Wall -Werror",
    #CXX="g++",
    #CXXFLAGS="-Wall -Werror",
    )

env.Program(target='debug', source=["foo.cc", "bar.cc"])
#################################

Compiling and running...

$ scons

g++-4.7 -o bar.o -c -Wall -Werror bar.cc
g++-4.7 -o foo.o -c -Wall -Werror foo.cc
g++-4.7 -o debug foo.o bar.o

$ ./debug 

*** glibc detected *** ./debug: free(): invalid pointer: 0xbff53b8c ***
======= Backtrace: =========
/lib/i686/cmov/libc.so.6(+0x6b381)[0xb7684381]
/lib/i686/cmov/libc.so.6(+0x6cbd8)[0xb7685bd8]
/lib/i686/cmov/libc.so.6(cfree+0x6d)[0xb7688cbd]
/usr/lib/libstdc++.so.6(_ZdlPv+0x1f)[0xb7856c5f]
/lib/i686/cmov/libc.so.6(__libc_start_main+0xe6)[0xb762fca6]
./debug[0x8048461]
======= Memory map: ========
08048000-08049000 r-xp 00000000 fd:10 7602195    /home/faheem/corrmodel/linker/debug
08049000-0804a000 rw-p 00000000 fd:10 7602195    /home/faheem/corrmodel/linker/debug
09ae0000-09b01000 rw-p 00000000 00:00 0          [heap]
b7617000-b7619000 rw-p 00000000 00:00 0 
b7619000-b7759000 r-xp 00000000 fd:00 1180005    /lib/i686/cmov/libc-2.11.3.so
b7759000-b775a000 ---p 00140000 fd:00 1180005    /lib/i686/cmov/libc-2.11.3.so
b775a000-b775c000 r--p 00140000 fd:00 1180005    /lib/i686/cmov/libc-2.11.3.so
b775c000-b775d000 rw-p 00142000 fd:00 1180005    /lib/i686/cmov/libc-2.11.3.so
b775d000-b7760000 rw-p 00000000 00:00 0 
b7760000-b777c000 r-xp 00000000 fd:00 4653173    /lib/libgcc_s.so.1
b777c000-b777d000 rw-p 0001c000 fd:00 4653173    /lib/libgcc_s.so.1
b777d000-b777e000 rw-p 00000000 00:00 0 
b777e000-b77a2000 r-xp 00000000 fd:00 1179967    /lib/i686/cmov/libm-2.11.3.so
b77a2000-b77a3000 r--p 00023000 fd:00 1179967    /lib/i686/cmov/libm-2.11.3.so
b77a3000-b77a4000 rw-p 00024000 fd:00 1179967    /lib/i686/cmov/libm-2.11.3.so
b77a4000-b7889000 r-xp 00000000 fd:00 2484736    /usr/lib/libstdc++.so.6.0.17
b7889000-b788d000 r--p 000e4000 fd:00 2484736    /usr/lib/libstdc++.so.6.0.17
b788d000-b788e000 rw-p 000e8000 fd:00 2484736    /usr/lib/libstdc++.so.6.0.17
b788e000-b7895000 rw-p 00000000 00:00 0 
b78ba000-b78bc000 rw-p 00000000 00:00 0 
b78bc000-b78bd000 r-xp 00000000 00:00 0          [vdso]
b78bd000-b78d8000 r-xp 00000000 fd:00 639026     /lib/ld-2.11.3.so
b78d8000-b78d9000 r--p 0001b000 fd:00 639026     /lib/ld-2.11.3.so
b78d9000-b78da000 rw-p 0001c000 fd:00 639026     /lib/ld-2.11.3.so
bff41000-bff56000 rw-p 00000000 00:00 0          [stack]
Aborted

Eww. This could have been avoided if the linker had warned that foo was being declared in two different ways. Even with -Wall it doesn't. So, is there a reason why it doesn't, and is there some flag that I can turn on to make it warn? Thanks in advance.

EDIT: Thanks for all the answers. The linker does issue a warning when there are conflicting function definitions, as opposed to a conflicting function definition and declaration as in my example above. I don't understand the reason for this different behavior.

Faheem Mitha
  • The reason for the difference is that the linker does not see the declaration; it only sees a reference to the function as it is called in main(), and the full name appearing in that reference (generated by the compiler) is informed by the declaration, but not by its return type. – greggo Dec 14 '12 at 18:17

4 Answers

The C++ linker only identifies functions as far as it needs to for unique identification.

This is from the following in-depth article on the C++ linker.

...the names of the symbols are decorated with additional strings. This is called name mangling.

The decoration before the identifier name is needed because C++ supports namespaces. For example the same function name can occur multiple times in different namespaces while denoting a different entity each time. To enable the linker to differentiate between those entities the name of each identifier is prepended with tokens representing its enclosing namespaces.

The decoration after the identifier name is needed because C++ allows function overloading. Again the same function name can denote different identifiers, which differ only in their parameter list. To enable the linker to differentiate between those, tokens representing the parameter list are appended to the name of the identifier. The return type of a function is disregarded, because two overloaded functions must not differ only in their return type.

So the point is that the name mangling applied to functions disregards the return type, because overloaded functions cannot differ by return type alone. Both declarations of foo therefore produce the same symbol name, and the linker is unable to spot the problem.

Tim Gee
  • That's about C++ linkers in general. Is there any reason why G++ couldn't be more strict? – Ken Bloom Jan 31 '12 at 00:08
  • Well I don't believe there's any standardization of name mangling, but if they were to change it *a lot* of things might break and as pointed out elsewhere, there are easier ways to avoid this problem. – Tim Gee Jan 31 '12 at 00:19
  • @TimGee: Thanks for the explanation and the link. I see the first example in the article (`struct Broken`) is essentially my example. I observe that the linker *does* notice if I write two different inconsistent *definitions* of `foo`. What distinguishes these two cases? – Faheem Mitha Jan 31 '12 at 00:24
  • C++ allows you to overload functions that share a name but take different parameters, so the linker has to identify uniquely which of those functions you are calling. If it can't find one whose parameters match, it will report an error. – Tim Gee Jan 31 '12 at 00:31
  • @TimGee: g++ has changed their ABI a few times in the past, and it's been a big pain for Linux distributions that suddenly have to recompile everything, so I wouldn't recommend doing it again. But is there any technical reason (inherent in how C++ works) why G++ couldn't have introduced (at some time in the past, e.g. when they were breaking the ABI anyway) a name mangling scheme that differentiated these definitions? (Or some documentation proof that they haven't done so already?) – Ken Bloom Jan 31 '12 at 03:06

This is the best example of the reason to have a local project header file (perhaps foobar.h) which declares all such functions and is included by every source file that defines or calls them. That way the compiler can see such problems.

Linkers were never intended to identify such an issue. Gotta leave something for Real Engineers™ to do. :-)

wallyk
  • Good point, re common headers. Still, I think the question is, could linkers identify such issues, or is it simply that they cannot be bothered? :-) – Faheem Mitha Jan 31 '12 at 00:17
  • @FaheemMitha: I have not seen a compiler which encodes the return type, but if it were added, the linker could easily report such errors. Borland C++ encodes all kinds of symbol attributes in the name mangling: calling convention (C, standard, pascal), parameter passing, memory model attributes, segment register affinity, and some really obscure aspects. – wallyk Jan 31 '12 at 03:19
  • I see. I wonder why neither gcc nor any other compiler does this, if it is possible. Apparently the standard doesn't say anything about how this name mangling happens. – Faheem Mitha Jan 31 '12 at 03:34

The linker just acts on the names that the compiler says are defined in modules or are referenced (needed) by modules. GCC apparently uses the "Itanium C++ ABI" for mangling function names (starting with GCC 3). For most functions, the return type isn't incorporated into the mangled name, so that's why the linker doesn't take it into account:

Itanium C++ ABI

Function types are composed from their parameter types and possibly the result type. Except at the outer level type of an <encoding>, or in the <encoding> of an otherwise delimited external name in a <template-parameter> or <local-name> function encoding, these types are delimited by an "F..E" pair. For purposes of substitution (see Compression below), delimited and undelimited function types are considered the same.

Whether the mangling of a function type includes the return type depends on the context and the nature of the function. The rules for deciding whether the return type is included are:

  1. Template functions (names or types) have return types encoded, with the exceptions listed below.
  2. Function types not appearing as part of a function name mangling, e.g. parameters, pointer types, etc., have return type encoded, with the exceptions listed below.
  3. Non-template function names do not have return types encoded.

The exceptions mentioned in (1) and (2) above, for which the return type is never included, are

  • Constructors.
  • Destructors.
  • Conversion operator functions, e.g. operator int

In general in C++ the return type of a function isn't considered when the compiler performs name lookup (for example for overload resolution). This might be part of the reason why the return type isn't usually included in the name mangling. I don't know if there's a stronger reason for not incorporating the return type into the mangled name.

Michael Burr
  • Thanks for the explanation. Can you comment on why two conflicting definitions *do* give a linker error? See the edit to my question. – Faheem Mitha Jan 31 '12 at 01:27
  • If I understand the question about two conflicting definitions correctly, it's because you'll have 2 separate objects in the modules being linked that have the same name. The linker doesn't know which one to use. If one (or more) of the conflicting definitions is in a library, the linker might not care about that extra definition, depending on exactly how the linker options and inputs are specified. – Michael Burr Jan 31 '12 at 01:33
  • I see. So, if there is just a conflicting definition + declaration, it does not generate two different objects? – Faheem Mitha Jan 31 '12 at 01:36

$ cat foo.cpp

#include <string>
using std::string;

int foo(void)
{
    return 0;
}

$ cat bar.cpp

#include <string>
using std::string;

//int foo(void);
string foo(void);

int main(void)
{
    foo();
    return 0;
}

$ g++ -c -o bar.o bar.cpp
$ g++ -c -o foo.o foo.cpp
$ g++ foo.o bar.o
$ ./a.out 
$ echo $?
0
$ g++ --version
g++ (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1

Could not reproduce.

Andrew Tomazos
  • You mean you can't reproduce the crash? You can reproduce the fact that no error is raised, which is my main point. Crashes may be dependent on local conditions. Try changing `string` to something 'bigger'? – Faheem Mitha Jan 31 '12 at 00:19