
I want to intercept all file system access that occurs inside of dlopen(). At first, it would seem like LD_PRELOAD or -Wl,-wrap, would be viable solutions, but I have had trouble making them work due to some technical reasons:

  • ld.so has already mapped its own symbols by the time LD_PRELOAD is processed. It's not critical for me to intercept the initial loading, but the _dl_* worker functions are resolved at this time, so future calls go through them. I think LD_PRELOAD is too late.

  • Somehow malloc circumvents the issue above because the malloc() inside of ld.so does not have a functional free(), it just calls memset().

  • The file system worker functions, e.g. __libc_read(), contained in ld.so are static so I can't intercept them with -Wl,-wrap,__libc_read.

This might all mean that I need to build my own ld.so directly from source instead of linking it into a wrapper. The challenge there is that both libc and rtld-libc are built from the same source. I know that the macro IS_IN_rtld is defined when building rtld-libc, but how can I guarantee that there is only one copy of static data structures while still exporting a public interface function? (This is a glibc build system question, but I haven't found documentation of these details.)

Are there any better ways to get inside dlopen()?

Note: I can't use a Linux-specific solution like FUSE because this is for minimal "compute-node" kernels that do not support such things.

Jed
  • This is not an answer to your question, so I'm not posting it as one, but in general you can't do this reliably: it's possible to get at the file system by calling the syscall directly without going through the dynamic library interface. If you don't have absolute control over how the library you're trying to load has been compiled you may be out of luck. Programs like fakeroot which use this technique work fine most of the time and fail horribly in some situations. – David Given Oct 08 '11 at 21:00
  • That said, you *can* make this work by running your dynamic library code in its own process and using `ptrace` to intercept the system calls themselves. I've done this with great success and it avoids all the shared library nonsense completely. But it does require you to completely redesign your logic to have a master process which does the ptrace stuff and a slave process which does the dynamic library stuff. – David Given Oct 08 '11 at 21:02
  • Well, I need `dlopen`/`dlsym` to function properly, but to access the filesystem differently. In particular, in HPC environments such as Blue Gene, all operations involving a kernel file descriptor are shipped from the compute nodes to the IO nodes. This causes a serious contention issue at high node concurrency. For example, loading a Python application that references a number of compiled shared libraries takes about 4 hours on 65k cores. Needless to say, people are not thrilled about burning a quarter million core hours to load their program. – Jed Oct 08 '11 at 21:57
  • To fix this, I implemented the IO interface (`open`, `read`, `mmap`, etc) using MPI collectives. This is fine for loading Python bytecode, but shared libraries have to go through `dlopen` and I'm having trouble getting my implementation called inside of `dlopen`. – Jed Oct 08 '11 at 21:59
  • I suspect you're going to have to write your own dlopen() implementation. Which is a horror. (We did this at the place I work for my day job.) I'd be inclined to try the ptrace trick; it's not much code and it will allow you to run the stock version of the code, including stock dlopen(), but your monitor server watches the process and overrides the file system calls to do its own thing. It does make system calls slower, though, but if you're CPU-bound that might not be a problem. See http://quequero.org/Intercepting_with_ptrace%28%29. – David Given Oct 08 '11 at 23:37
  • Hmm, thanks. Unfortunately, `ptrace` is not available on Blue Gene, so it's not an option. I will go back to building my own ld.so with `dlopen` wired up to call my implementation. – Jed Oct 08 '11 at 23:59

1 Answer


> it would seem like LD_PRELOAD or -Wl,-wrap, would be viable solutions

The --wrap solution could not possibly be viable: it works only at (static) link time, and your ld.so and libc.so.6 and libdl.so.2 have all already been linked, so now it is too late to use --wrap.

LD_PRELOAD could have worked, except that ld.so considers the fact that dlopen() calls open() an internal implementation detail. As such, it calls the internal __open function directly, bypassing the PLT and, with it, your ability to interpose open.
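
For reference, here is a minimal sketch of an LD_PRELOAD-style interposer for open(). It only sees calls that go through the PLT, so with a stock ld.so it would not see the opens made inside dlopen(). The counter and the intercepted_open_calls() helper are hypothetical names added for illustration; the rest uses only dlsym(RTLD_NEXT, ...) from glibc.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <sys/types.h>

/* Hypothetical bookkeeping: how many open() calls reached this interposer. */
static int open_calls;

int intercepted_open_calls(void)
{
    return open_calls;
}

/* Replacement for open(). When preloaded, PLT calls to open() land here. */
int open(const char *path, int flags, ...)
{
    static int (*real_open)(const char *, int, ...);
    mode_t mode = 0;
    int needs_mode = (flags & O_CREAT) != 0;

#ifdef O_TMPFILE
    if ((flags & O_TMPFILE) == O_TMPFILE)
        needs_mode = 1;
#endif

    /* The third argument is only present when a file may be created. */
    if (needs_mode) {
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);
        va_end(ap);
    }

    /* Look up the next definition of open() in search order (normally libc). */
    if (!real_open)
        real_open = (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");
    if (!real_open)
        return -1;

    open_calls++;
    /* A real interposer would substitute its own I/O path here. */
    return real_open(path, flags, mode);
}
```

Built as a shared object and listed in LD_PRELOAD, this would catch open() calls from the application and from libraries that call it through the PLT. Older glibc versions may need -ldl at link time for dlsym.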

> Somehow malloc circumvents the issue

That's because libc supports users who implement their own malloc (e.g. for debugging purposes). So the call to e.g. calloc from dlopen does go through PLT, and is interposable via LD_PRELOAD.

> This might all mean that I need to build my own ld.so directly from source instead of linking it into a wrapper.

What will the rebuilt ld.so do? I think you want it to call __libc_open (in libc.so.6), but that can't possibly work for an obvious reason: it is ld.so that opens libc.so.6 in the first place (at process startup).

You could rebuild ld.so with the call to __open replaced with a call to open. That will cause ld.so to go through the PLT, and expose the call to LD_PRELOAD interposition.

If you go that route, I suggest that you don't overwrite the system ld.so with your new copy (the chance of making a mistake and rendering the system unbootable is just too great). Instead, install it to e.g. /usr/local/my-ld.so, and then link your binaries with -Wl,--dynamic-linker=/usr/local/my-ld.so.
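
To confirm that a binary really picked up the alternate loader, one option is to read its PT_INTERP program header at run time. The following is a sketch under the assumption of a dynamically linked ELF executable on glibc; current_interpreter() and find_interp() are hypothetical helper names, while dl_iterate_phdr() is the real glibc interface.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>

/* Callback for dl_iterate_phdr(). The first object visited is the main
   program, so we inspect it and then stop the iteration. */
static int find_interp(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        if (info->dlpi_phdr[i].p_type == PT_INTERP) {
            /* The segment holds the NUL-terminated interpreter path. */
            *(const char **)data =
                (const char *)(info->dlpi_addr + info->dlpi_phdr[i].p_vaddr);
            break;
        }
    }
    return 1; /* nonzero: do not visit the remaining objects */
}

/* Returns the interpreter path recorded in the executable, or NULL if the
   executable has no PT_INTERP segment (e.g. it is statically linked). */
const char *current_interpreter(void)
{
    const char *interp = NULL;
    dl_iterate_phdr(find_interp, &interp);
    return interp;
}
```

A binary linked with -Wl,--dynamic-linker=/usr/local/my-ld.so should report that path here rather than the system default.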

Another alternative: runtime patching. This is a bit of a hack, but you can (once you gain control in main) simply scan the .text of ld.so, and look for CALL __open instructions. If ld.so is not stripped, then you can find both the internal __open, and the functions you want to patch (e.g. open_verify in dl-load.c). Once you find the interesting CALL, mprotect the page that contains it to be writable, and patch in the address of your own interposer (which can in turn call __libc_open if it needs to), then mprotect it back. Any future dlopen() will now go through your interposer.
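
A rough sketch of the patching step, assuming x86-64, where a direct CALL is the opcode 0xE8 followed by a 32-bit displacement relative to the next instruction. The function names are hypothetical, and locating ld.so's .text and the address of the internal __open (e.g. from its symbol table) is left out. Scanning raw bytes for 0xE8 can in principle match the middle of another instruction, so a careful version would restrict the scan to the functions of interest. The interposer must also lie within rel32 range (about ±2 GiB) of the call site.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Rewrite each `CALL rel32` in [code, code + len) that targets old_target so
   that it targets new_target. The memory must already be writable.
   Returns the number of call sites rewritten. */
size_t retarget_calls(uint8_t *code, size_t len,
                      uintptr_t old_target, uintptr_t new_target)
{
    size_t patched = 0;
    for (size_t i = 0; i + 5 <= len; i++) {
        if (code[i] != 0xE8)
            continue;
        int32_t rel;
        memcpy(&rel, code + i + 1, sizeof rel);
        uintptr_t next = (uintptr_t)(code + i + 5);
        if (next + (intptr_t)rel != old_target)
            continue;
        intptr_t new_rel = (intptr_t)(new_target - next);
        if (new_rel < INT32_MIN || new_rel > INT32_MAX)
            continue; /* out of range; would need a trampoline instead */
        rel = (int32_t)new_rel;
        memcpy(code + i + 1, &rel, sizeof rel);
        patched++;
        i += 4; /* skip the displacement bytes just written */
    }
    return patched;
}

/* Make the pages covering [text, text + len) writable, retarget the calls,
   then restore read+execute. Returns the patch count, or -1 on failure.
   Hardened kernels may refuse the writable+executable mapping. */
int patch_text(void *text, size_t len, const void *old_fn, const void *new_fn)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)text & ~(page - 1);
    size_t span = ((uintptr_t)text + len) - start;

    if (mprotect((void *)start, span, PROT_READ | PROT_WRITE | PROT_EXEC) != 0)
        return -1;
    size_t n = retarget_calls(text, len, (uintptr_t)old_fn, (uintptr_t)new_fn);
    if (mprotect((void *)start, span, PROT_READ | PROT_EXEC) != 0)
        return -1;
    return (int)n;
}
```

This should only be done while no other thread can be executing inside ld.so, since the code is briefly in an inconsistent state.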

Employed Russian
  • The first idea is useful, but switching to `PLT` calls in `dlopen()` resulted in segfaults, so we're going to look into the second option... – Aron Ahmadia Oct 18 '11 at 12:58