7

From C we know what legal variable names are. The general regex for the legal names looks similar to [A-Za-z_][A-Za-z0-9_]*.

Using dlsym we can look up arbitrary strings, and mangled C++ names in some ABIs include characters such as @.

My question is: can arbitrary strings be used? The documentation on dlsym does not seem to mention anything.

Another question that came up appears to imply that it is entirely possible to have arbitrary null-terminated symbols. This prompts me to ask the following question:

Why doesn't g++ emit raw function signatures, with name and parameter list, including namespace and class membership?

Here's what I mean:

namespace test {
class A
{
    int myFunction(const int a);
};
}

namespace test {
int A::myFunction(const int a){return a * 2;}
}

This does not get compiled to

int ::test::A::myFunction(const int a)\0

Instead, on my 64-bit machine, using g++ 4.9.2, it gets compiled to

0000000000000000 T _ZN4test1A10myFunctionEi

This output was produced by nm. The code was compiled using g++ -c test.cpp -o out

Community
Ultimate Hawk
  • What exactly do you mean as "raw function signatures"? Do you expect the symbols to have names like `my_ns::my_class::create(int const *, bool)` ? – lisyarus Aug 10 '15 at 14:45
  • 1
    @lisyarus That's what I understood. The example is pretty clear. – Quentin Aug 10 '15 at 14:46
  • @Quentin yes, I see. Thank you. – lisyarus Aug 10 '15 at 14:47
  • @lisyarus: yes, maybe even containing the name of the parameters, even though these are not important. – Ultimate Hawk Aug 10 '15 at 14:47
  • have you seen what the symbols look like when you overload a function? – Mgetz Aug 10 '15 at 15:03
  • @Mgetz: Yes. Do you think having the full signature as accessed via the language can not avoid that overload resolution? – Ultimate Hawk Aug 10 '15 at 15:10
  • @BourgondAries I think the highest upvoted answer answers that best. – Mgetz Aug 10 '15 at 15:11
  • I think that this question should be asked of the GCC developers. The why of name mangling does not seem to be the question. The why of the implementation can only be answered by the developers. – Serge Ballesta Aug 10 '15 at 15:55
  • Your example is wrong. That is not a valid way to refer to that function, even within C++. – Lightness Races in Orbit Aug 10 '15 at 18:17
  • There's a lot of history behind why names are mangled. Even in C there was some "mangling". One of the biggest reasons was symbol length limitation. Linkers used to only pay attention to a limited number of characters, so you had to generate symbols for overloaded functions with long names that the linker would recognize as distinct. – Rob K Aug 10 '15 at 18:35
  • @RobK: No, you didn't, because it was unspecified if different long names were treated as referring to the same function or object. That is to say. `foobar1` and `foobar2` might name the same function (!) IOW, the linker limitations were promoted to a language feature in C. In C++, this 6 character limit never existed, but I remember VC++ having issues around 256 characters. – MSalters Aug 11 '15 at 07:54
  • And I don't know how you could manage functions and objects declared within anonymous namespaces. – ABu Aug 11 '15 at 10:52

5 Answers

5

I'm sure this decision was made pragmatically, to avoid having to change pre-existing C linkers (it quite possibly even originated with cfront). By emitting symbols that use the same set of characters the C linker is used to, you avoid having to update the linker at all and can use it off the shelf.

Additionally C and C++ are widely portable languages and they wouldn't want to risk breaking a more obscure binary format (perhaps on an embedded system) by including unexpected symbols.

Finally, since you can always demangle (with something like c++filt, for example), it probably didn't seem worth using a full text representation.

P.S. You would absolutely not want to include the parameter name in the function name: People will not be happy if renaming a parameter breaks ABI. It's hard enough to keep ABI compatibility already.

Mark B
  • 1
    There's no such thing as "parameter name" anyway. The caller may have a declaration with different parameter names, and yet it names the same function. – MSalters Aug 11 '15 at 07:56
1
  1. Because of limitations that a linker (including the OS's dynamic linker) imposes on exported names: character set and length. The very phenomenon of mangling arose because of this.
    • Corollary: in environments where these limitations don't exist (various VMs that use their own linkers, e.g. .NET and Java), mangling doesn't exist either.
  2. Each compiler that produces exports that are incompatible with others must use a different scheme, because a linker (static or dynamic) doesn't care about ABIs; all it cares about is identifiers.
ivan_pozdeev
  • 1. I think this is too abstract: what 'limitations'? 2. Produces exports as in 'compiles an object file' or 'mangles the names'? This point does not make sense to me. – Ultimate Hawk Aug 10 '15 at 14:57
  • 1. I added: "character set, length". 2. Each compilation unit has a set of exported symbols & imported symbols. A linker takes these and matches them together. I.e. all the information about the exported entity's ABI must be encoded into the exported name. – ivan_pozdeev Aug 10 '15 at 15:09
1

GCC is compliant with the Itanium C++ ABI. If your question is “Why does the Itanium C++ ABI require names to be mangled that way?” then the answer is probably

  1. because its designers thought this would be a good idea and
  2. shorter symbols make for smaller object files and faster dynamic linking.

For the second point, there is a pretty good explanation in Ulrich Drepper's article How To Write Shared Libraries.

5gon12eder
1

You basically answered your own question:

The general regex for the legal names looks similar to [A-Za-z_][A-Za-z0-9_]*.

From the beginning, C++ used preexisting (C) linker / loader technology. There is nothing "C++" about ld, ld-linux.so, etc.

So linking is limited to what was already legal in C. That does not include colons, parentheses, ampersands, asterisks, and whatever else you would need to encode C++ identifiers in plain text.

DevSolar
  • No I didn't, considering I later refer to another SO question that says any null-terminated string can be used. – Ultimate Hawk Aug 10 '15 at 15:22
  • 1
    @BourgondAries: The answer you linked to talks about what the *ELF specification* has to say about it. I am pretty sure that `ld` still does work on combinations of letters, numbers, and underscores... – DevSolar Aug 10 '15 at 15:30
1

(In this answer I ignore that you made several typos in your example of ::test::A::void myFunction(const int a)).

This format is:

  • not programmer-specific; consider that all these are the same, so why confuse people:
    • int ::test::A::myFunction(const int)
    • int ::test::A::myFunction(int const)
    • int test::A::myFunction(int const)
    • int test :: A :: myFunction (int const)
    • and so on…
  • unambiguous
  • terse; no parameter names or other unnecessary decorations
  • easier to parse (notice that the length of each component is present as a number)

Meanwhile, I see no benefit at all in choosing a human-readable looks-like-C++ format for a C++ ABI. This stuff is supposed to be optimised for machines. Why would you make it less optimal for machines, in order to make it more optimal for humans? And probably failing at the latter whilst doing so.

You say that your compiler does not emit "raw symbols". I posit that it does precisely that.

Lightness Races in Orbit
  • In the question I am trying to define "raw" symbols, hence the quotation marks. I also edited the signature to be more like a definition. Also, I'm not sure about the `easier to parse` part. Whenever we link we need a declaration or go via `ld.so`. The former mangles the name according to the compiler and the latter just gets us the symbol. Could you elaborate on what you mean and perhaps why I'm thinking wrongly? – Ultimate Hawk Aug 10 '15 at 18:39