3

As an experiment, I defined the same class both in a program and in a shared library that the program loads with dlopen, and made sure the program has no entry for the type info object in its dynsym table. I then throw an object of that class from within the shared library and try to catch it in the program using the same class type.

I expected that the implementation on Linux with GCC would not catch the exception, because the type info objects for the class in the program and in the shared library differ, so a match would only be possible if the runtime did a string comparison of the mangled class names.
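The string comparison hypothesized here can be sketched with the standard `std::type_info::name()` accessor. This is only an illustration, not the runtime's actual matching code: `same_type_by_name`, `Shape`, and `Circle` are hypothetical names, and the sketch assumes an implementation such as GCC where `name()` returns the mangled name, which is distinct for distinct types.

```cpp
#include <cassert>
#include <cstring>
#include <typeinfo>

// Hypothetical classes used only for this illustration.
struct Shape  { virtual ~Shape() = default; };
struct Circle : Shape {};

// Compares two type descriptors by the string returned from name()
// rather than by the address of the type_info object.
// Assumption: name() yields a unique (mangled) name per type, as on GCC.
bool same_type_by_name(const std::type_info& a, const std::type_info& b) {
    return std::strcmp(a.name(), b.name()) == 0;
}

// Usage sketch: a Circle viewed through a Shape reference still reports Circle.
void demo_same_type_by_name() {
    Circle c;
    const Shape& s = c;
    assert(same_type_by_name(typeid(s), typeid(Circle)));
    assert(!same_type_by_name(typeid(Shape), typeid(Circle)));
}
```

A comparison of this kind would succeed even when the two `type_info` objects live at different addresses, which is the only way the match could work if the objects are not merged.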

Still, it matches, and I can even do dynamic downcasts to classes defined in the shared library. Can anyone please explain how the implementation works in this case?
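For readers who want to reproduce the setup, the program side of such an experiment might look roughly like the sketch below. The original code is not shown in the question, so everything here is an assumption: the class name `Shape`, the `extern "C"` entry point `throw_shape` exported by the library, the use of `RTLD_LOCAL`, and the `Outcome` enum are all hypothetical.

```cpp
#include <cassert>
#include <dlfcn.h>

// Hypothetical class; the library is assumed to contain its own identical definition.
struct Shape {
    virtual ~Shape() = default;
};

enum class Outcome { LoadFailed, NothingThrown, CaughtAsShape, CaughtAsOther };

// Loads the library, calls an assumed extern "C" entry point `throw_shape`
// that throws a Shape, and reports which handler (if any) matched.
Outcome run_experiment(const char* library_path) {
    assert(library_path != nullptr);

    // RTLD_LOCAL keeps the library's symbols out of the global scope
    // (an assumption about how the experiment was set up).
    void* handle = dlopen(library_path, RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) {
        return Outcome::LoadFailed;
    }

    using ThrowFn = void (*)();
    auto throw_shape = reinterpret_cast<ThrowFn>(dlsym(handle, "throw_shape"));
    if (throw_shape == nullptr) {
        dlclose(handle);
        return Outcome::LoadFailed;
    }

    // The handle is deliberately left open: the thrown object's type
    // information lives inside the loaded library.
    try {
        throw_shape();
    } catch (const Shape&) {
        return Outcome::CaughtAsShape;   // the outcome observed in the question
    } catch (...) {
        return Outcome::CaughtAsOther;   // the outcome the question expected
    }
    return Outcome::NothingThrown;
}
```

On older glibc versions the program has to be linked with `-ldl`. Calling `run_experiment` with a path that cannot be loaded returns `Outcome::LoadFailed`.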

Edit

Based on what the Itanium ABI states, the observed behavior would seem to be nonconforming. What am I missing here?

Therefore, except for direct or indirect pointers to incomplete types, the equality and inequality operators can be written as address comparisons when operating on those type_info objects: two type_info structures describe the same type if and only if they are the same structure (at the same address).

Since the two typeinfo objects had different addresses, they describe different types according to this rule. The cast should therefore have failed and the exception should not have been caught.
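To make the contrast concrete, here is a minimal model of the two possible equality rules. `TypeDescriptor` is a simplified stand-in and not the real `std::type_info` layout; the two objects, the function names, and the name string `"5Shape"` (what GCC produces for a global class called `Shape`) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstring>

// Simplified stand-in for a type descriptor; NOT the real std::type_info layout.
struct TypeDescriptor {
    const char* mangled_name;
};

// Two separate descriptor objects for the same class, as would exist when the
// program and the library each carry their own unmerged copy.
static const TypeDescriptor in_program = {"5Shape"};
static const TypeDescriptor in_library = {"5Shape"};

// Rule quoted from the ABI: same type if and only if same structure (same address).
bool equal_by_address(const TypeDescriptor& a, const TypeDescriptor& b) {
    return &a == &b;
}

// Alternative rule: accept identical addresses, otherwise compare the names.
bool equal_by_name(const TypeDescriptor& a, const TypeDescriptor& b) {
    return &a == &b || std::strcmp(a.mangled_name, b.mangled_name) == 0;
}

// Usage sketch: the unmerged copies differ by address but agree by name.
void demo_descriptor_comparison() {
    assert(!equal_by_address(in_program, in_library));
    assert(equal_by_name(in_program, in_library));
}
```

Under the address rule the two copies are different types, so a catch or cast across the library boundary would fail; under the name rule they match, which is consistent with the behavior observed.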

curiousguy
Johannes Schaub - litb
  • Are the static members inside classes in the dlopen-ed modules merged with the static members inside classes with the same source level name? – curiousguy Jul 11 '18 at 01:58

2 Answers

1

The Itanium ABI explicitly states that type_info objects for classes are to be compared by mangled name, which is indeed the implementation hypothesized in your question.

https://refspecs.linuxbase.org/cxxabi-1.86.html#rtti

I would imagine that the rationale here is to support exactly the behaviour that you have observed.

typeinfo descriptors are also defined to have "vague linkage", which is lovely. However, the definition used requires that they are placed in COMDAT groups. COMDAT groups are required to be deduplicated by the linker, at least as part of static linking. I am currently unable to determine whether they are also required to be deduplicated for dynamic linking, but it seems logical.

So in summary, the answer to your question is, "It's handled because the ABI authors foresaw this situation and ensured it was handled".

Puppy
  • 1
    thanks. hmm, from what I read, it seems to indicate that the address has to be the same (note the iff): "Therefore, except for direct or indirect pointers to incomplete types, the equality and inequality operators can be written as address comparisons when operating on those type_info objects: two type_info structures describe the same type if and only if they are the same structure (at the same address)." – Johannes Schaub - litb Nov 25 '15 at 22:14
  • http://stackoverflow.com/questions/5044993/typeinfo-shared-libraries-and-dlopen-without-rtld-global?rq=1 Seems to disagree with your answer. – Johannes Schaub - litb Nov 26 '15 at 20:44
  • 1
    Without the RTLD_GLOBAL flag, the modules are effectively two distinct programs, and so the restriction does not apply, as it only considers one program. You can see other violations of the specification here, like having two definitions of the same function. – Puppy Nov 28 '15 at 20:22
-1

"defined the same class both in a program and in a shared library that I dlopen from the program": congratulations on violating the One Definition Rule. As you should know, anything can happen in the land of undefined behavior.

SergeyA
  • Why is that an ODR violation? I thought that is why we have header files, to define the class in whatever obj file we need in the program. How else do you want to define an interface? – Johannes Schaub - litb Nov 25 '15 at 19:31
  • You have one header but multiple definitions: one in your application and one in your library. There should be _one_ definition, either in your program or in your library. Not both. One solution would be to place common classes in one shared library and use dynamic linking for them. – Guillaume Racicot Nov 25 '15 at 19:41
  • @gui all member functions were inline. So I don't see a violation here. But since this is shared library madness, and the C++ Standard does not know about shared libraries, I think we could not even say that it would be an ODR violation even if the functions were noninline. Normally linux elf shared objects support multiple definitions being present. – Johannes Schaub - litb Nov 25 '15 at 19:56
  • @ᐅJohannesSchaub-litbᐊ, it is an ODR violation alright. Your class IS defined in two places. And inline or non-inline has no bearing here. Such a violation (as produced by `dlopen`ing something) is a known cause of very subtle and hard-to-trace bugs. Yours truly only recently dealt with a production coredump of a similar nature: a global static object defined in both the application and one of the shared libraries it `dlopen`ed. – SergeyA Nov 25 '15 at 20:08
  • Those objects are different to implementation-defined hidden objects. The Standard does not permit the user to issue typeinfo structure definitions, or specify their linkage; therefore it's up to the implementation to handle this case or not. – Puppy Nov 25 '15 at 22:07
  • The ODR only guarantees that all definitions of an entity are somewhat equivalent. How does it apply to `dlopen`? – curiousguy Jul 11 '18 at 02:35
  • @curiousguy ODR guarantees nothing, it demands. Btw, would you have any idea on the downvote? – SergeyA Jul 11 '18 at 13:56
  • @curiousguy https://en.cppreference.com/w/cpp/language/definition (scroll to One Definition Rule) – SergeyA Jul 12 '18 at 15:00
  • I have searched; there is no mention of `dlopen` in that definition. – curiousguy Jul 28 '18 at 14:07
  • @curiousguy, yes, because dlopen is not part of C++ standard. But it has to follow the same rules as static libraries. – SergeyA Jul 31 '18 at 18:45
  • @SergeyA Source? – curiousguy Jul 31 '18 at 19:18
  • @curiousguy the source was provided. There will be no other single source. You just need to know what dlopen is, to apply the standard wording to its effect. – SergeyA Jul 31 '18 at 19:29
  • @SergeyA So there is no relevant source – curiousguy Jul 31 '18 at 19:33
  • @curiousguy your craving for "source" is weird. If I go there to the cppreference page and add a wording about this being applicable to dlopen calls, would it be source enough for you? At some point, you need to start understanding things, rather than just demanding a single-statement "source". In this case, you are looking at two interconnecting systems - posix for dlopen, C++ for ODR. Neither will talk about the other, so you have to make connection yourself. – SergeyA Jul 31 '18 at 19:56
  • Unix-like systems allow replacing standard functions like `malloc`. Is that ODR approved? – curiousguy Jul 31 '18 at 19:57
  • @curiousguy it is ODR-agnostic. ODR doesn't care about that fact, since you do not end up with more than one definition of `malloc` in the program. – SergeyA Jul 31 '18 at 20:00
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/177131/discussion-between-curiousguy-and-sergeya). – curiousguy Jul 31 '18 at 20:04