
I have a library that is currently dynamically linked against glibc. This library is dynamically loaded into an application that is also dynamically linked against glibc. I have no control over the application, only over the shared object.

However, sometimes loading the library causes the application to get SIGKILLed, because it has pretty strict real-time requirements and rlimits set accordingly. Looking at this with a profiler tells me that most of the time is actually spent in the linker. So essentially, dynamic linking is actually too slow (sometimes). Well, that's not a problem I ever thought I'd have :)

I was hoping to solve this issue by producing a statically linked shared object. However, googling this issue and reading several other SO threads warned me not to try to statically link glibc. But those warnings seem to be about glibc-specific issues.

So my question is, if I were to statically link this shared library against musl and then let a (dynamically linked) glibc application dlopen it, would that be safe? Is there a problem in general with multiple libc's?

henry

2 Answers


> Looking at this with a profiler tells me that most of the time is actually spent in the linker.

Something is very wrong with your profiling methodology.

First, the "linker" does not run when the application runs; only the loader (aka rtld, aka ld-linux) does. I assume you meant the loader, not the linker.

Second, the loader does have some runtime cost at startup, but since every function you call is resolved only once, the proportion of total runtime spent in the loader should quickly approach zero for an application that runs for any appreciable time (longer than about a minute).

> So essentially dynamic linking is actually too slow (sometimes).

You can ask the loader to resolve all dynamic symbols in your shared library at load time by linking with the `-Wl,-z,now` linker flag.
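The runtime counterpart of `-z now` is the `RTLD_NOW` flag to `dlopen`, which resolves all undefined symbols before `dlopen` returns, whereas `RTLD_LAZY` defers function-symbol resolution to the first call. Before choosing either, it helps to measure how long `dlopen` itself actually takes. Below is a minimal sketch, assuming Linux with glibc; the helper name `time_dlopen` is hypothetical. Note that a second `dlopen` of an already-loaded library only bumps a reference count, and the first load also includes disk I/O, so time a fresh process for realistic numbers.

```c
#define _POSIX_C_SOURCE 200809L
#include <dlfcn.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: returns the wall-clock time in nanoseconds spent
 * inside dlopen() for `path` with the given flags (e.g. RTLD_LAZY or
 * RTLD_NOW), or -1 if the library could not be loaded. The handle is
 * closed again before returning. */
long long time_dlopen(const char *path, int flags)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    void *handle = dlopen(path, flags);
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s): %s\n", path, dlerror());
        return -1;
    }
    dlclose(handle);

    /* tv_nsec may go "negative" here; the seconds term compensates. */
    return (long long)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (end.tv_nsec - start.tv_nsec);
}
```

For example, `time_dlopen("./libyours.so", RTLD_LAZY)` versus `time_dlopen("./libyours.so", RTLD_NOW)` (run in separate processes) shows how much of the load time is symbol resolution rather than mapping and I/O.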

> if I were to statically link this shared library against musl and then let a (dynamically linked) glibc application dlopen it, would that be safe?

Not only would this be unsafe, it would most likely not work at all (except for the most trivial shared libraries).

> Is there a problem in general with multiple libc's?

Linking multiple libc's into a single process will cause too many problems to count.

Update:

> resolving all symbols at load time is exactly the opposite of what I want, as the process gets SIGKILLed during loading of the shared object, after that it runs fine.

It sounds from this that you are using dlopen while the process is already executing time-critical real-time tasks.

That is not a wise thing to do: `dlopen` (among other things) calls `malloc`, reads data from disk, performs `mmap` calls, etc. All of these require locks and can wait for an arbitrarily long time.

The usual solution is for the application to perform initialization (which loading your library would be part of) before entering the time-critical loop.
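That split between an initialization phase and a time-critical phase can be sketched as follows. This is a minimal illustration from the application's side (which, in this case, only the application developers could adopt), assuming POSIX `dlopen`/`dlsym`; the `struct plugin`, `plugin_init` and `plugin_run` names are hypothetical, and `libm`'s `cos` merely stands in for a real plugin entry point.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical plugin entry point signature. */
typedef double (*unary_fn)(double);

/* Everything the time-critical loop needs, resolved once up front. */
struct plugin {
    void *handle;
    unary_fn fn;
};

/* Initialization phase: do the expensive, lock-taking work (dlopen,
 * relocation, dlsym) before any real-time deadlines apply. RTLD_NOW
 * asks the loader to resolve all undefined symbols before dlopen()
 * returns. Returns 0 on success, -1 on failure. */
int plugin_init(struct plugin *p, const char *path, const char *symbol)
{
    p->handle = dlopen(path, RTLD_NOW);
    if (p->handle == NULL)
        return -1;

    p->fn = (unary_fn)dlsym(p->handle, symbol);
    if (p->fn == NULL) {
        dlclose(p->handle);
        p->handle = NULL;
        return -1;
    }
    return 0;
}

/* Time-critical phase: only calls through the pre-resolved pointer;
 * no dlopen() or dlsym() happens inside the loop. */
double plugin_run(const struct plugin *p, const double *in, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += p->fn(in[i]);
    return acc;
}
```

The design choice is simply ordering: all loader activity is moved out of the loop, so any delay it causes happens before the deadlines start.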

Since you are not in control of the application, the only thing you can do is tell the application developers that their current requirements (if these are in fact their requirements) cannot be satisfied: they must provide some way to perform initialization before entering the time-critical section, or they will always risk a SIGKILL. Making your library load faster will only make that SIGKILL occur less often; it will not eliminate it completely.

Update 2:

> Yes, I'm aware that the best I can do is lower the frequency and not "solve" the problem, only try to mitigate it.

You should look into `prelink`. It can dramatically lower the time required to perform relocations. There is no guarantee that the prelink address chosen for your library will be available at load time, so you may still get SIGKILLed sometimes, but this could be an effective mitigation.

Employed Russian
  • Yep, I meant the loader, sorry. Maybe I'm misunderstanding something, but resolving all symbols at load time is exactly the opposite of what I want, as the process gets SIGKILLed during loading of the shared object; after that it runs fine. The problem is that `dlopen()` sometimes takes too long, so the host process can no longer keep up with its real-time requirements, and the kernel ends up `SIGKILL`ing it. I do not see how that would help me get that time down. I may, however, look into `RTLD_LAZY`, which I've now found because of this. – henry Jul 14 '20 at 06:19
  • Thanks, yes, I'm aware that the best I can do is lower the frequency and not "solve" the problem, only try to mitigate it, which is what I'm trying to do here. – henry Jul 14 '20 at 16:33
  • @henry Update 2. – Employed Russian Jul 14 '20 at 18:21
  • "Linking multiple libc's into a single process will cause too many problems to count." -- @EmployedRussian this may be possible using an export map to effectively keep the two libc libraries independent. This is something you can do to manage multiple versions of libstdc++ (see [here](https://stackoverflow.com/q/47841812/4447365)). I would be curious if the same approach could be applied to musl's libc. – Ryan Burn Dec 18 '20 at 02:26

It is theoretically possible to do something like that, but you will have to write a new version of the musl startup code that copes with the fact that the thread pointer and TCB have already been set up by glibc, and run that code from an ELF constructor in the shared object. Some musl functionality will be unavailable due to TCB layout differences.
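The "run that code from an ELF constructor" part can be sketched with the GCC/Clang `constructor` attribute, which places a function in the object's initializer list so it runs automatically at load time. This sketch shows only the hook, not the musl start-up code itself: `private_runtime_init` and `library_runtime_ready` are hypothetical placeholders, and a real version would have to adopt the thread pointer and TCB that glibc has already set up, as described above.

```c
#include <stdbool.h>

/* Hypothetical flag recording that the library-private runtime was set up. */
static bool private_runtime_ready = false;

/* Hypothetical placeholder for the custom start-up code. A real version
 * would need to cope with the thread pointer and TCB that glibc already
 * installed; this one only records that it ran. */
static void private_runtime_init(void)
{
    private_runtime_ready = true;
}

/* GCC/Clang extension: registers this function as an ELF constructor.
 * In a shared object, the loader runs it during dlopen(), before
 * dlopen() returns; in an executable, it runs before main(). */
__attribute__((constructor))
static void library_ctor(void)
{
    private_runtime_init();
}

/* Exported query so callers can check that the constructor ran. */
bool library_runtime_ready(void)
{
    return private_runtime_ready;
}
```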

I don't think it is likely that this will solve your actual problem. Even if the problem is time-related, this hack may well make things worse, because it increases the number of run-time relocations needed.

Florian Weimer