I have a really odd situation with dynamic symbol binding on OS X that I'm hoping to get some clues on how to resolve.
I have an application, written in C, which uses dlopen() to dynamically load modules at runtime. Some of these modules export global symbols, which may be used by other modules loaded later.
We have one module (which I'll call weird_module.so) which exports global symbols, one of which is weird_module_function. If weird_module.so gets linked with a particular library (which I'll call libsomething.dylib), then weird_module_function can't be bound to. But if I remove the -lsomething when linking weird_module.so, then I can bind to weird_module_function.
What could possibly be going on with libsomething.dylib that would cause weird_module.so to not export symbols? Are there things I can do to debug how symbols get exported (similar to how I can use DYLD_PRINT_BINDINGS to debug how they get bound)?
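One way to narrow this down from inside the process is a small dlsym() probe that asks whether a symbol is reachable through a given module's handle. The following is only a sketch: the helper name symbol_is_visible is made up for illustration, and it says nothing about why a symbol is hidden, only whether the dynamic loader can find it. Note that dlsym() on OS X takes the C-level name, without the leading underscore that nm prints.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if `symbol` can be resolved through the
 * handle obtained for `path`, 0 otherwise. Passing path == NULL queries
 * the main program's handle instead of a specific module.
 */
int symbol_is_visible(const char *path, const char *symbol)
{
    assert(symbol != NULL);

    void *handle = dlopen(path, RTLD_LAZY | RTLD_GLOBAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 0;
    }

    dlerror(); /* clear any stale error before calling dlsym() */
    void *addr = dlsym(handle, symbol);
    const char *err = dlerror();
    int found = (err == NULL && addr != NULL);

    fprintf(stderr, "%s: %s\n", symbol, found ? "visible" : "not visible");

    dlclose(handle);
    return found;
}
```

Calling symbol_is_visible("weird_module.so", "weird_module_function") right after the module is loaded, once for each link variant, would show whether the difference is already present at the dlsym() level or only appears when another module binds lazily.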
$ LDFLAGS="-bundle -mmacosx-version-min=10.6 -Xlinker -undefined -Xlinker dynamic_lookup /usr/lib/bundle1.o"
$ gcc -o weird_module.so ${LDFLAGS} weird_module.o -lsomething
$ nm weird_module.so | grep '_weird_module_function$'
00000000000026d0 T _weird_module_function
$ gcc -o other_module.so ${LDFLAGS} other_module.o -lsomething
$ nm other_module.so | grep '_weird_module_function$'
U _weird_module_function
$ run-app
Loading weird_module.so
Loading other_module.so
dyld: lazy symbol binding failed: Symbol not found: _weird_module_function
Referenced from: other_module.so
Expected in: flat namespace
dyld: Symbol not found: _weird_module_function
Referenced from: other_module.so
Expected in: flat namespace
# Now relink without -lsomething
$ gcc -o weird_module.so ${LDFLAGS} weird_module.o
$ nm weird_module.so | grep '_weird_module_function$'
00000000000026d0 T _weird_module_function
$ run-app
Loading weird_module.so
Loading other_module.so
# No error!
Edit:
I tried putting together a minimal app to duplicate the problem, and in the course of doing so at least figured out one thing we were doing wrong. There are two other facts relevant to duplicating the issue.
First, run-app preloads the module with RTLD_LAZY | RTLD_LOCAL to inspect its metadata. The module is then dlclose()ed and reopened with either RTLD_LAZY | RTLD_GLOBAL or RTLD_NOW | RTLD_LOCAL, depending on the metadata. (For both modules in question, it reopens with RTLD_LAZY | RTLD_GLOBAL.)
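For anyone trying to reproduce this, the load sequence described above looks roughly like the sketch below. The metadata symbol name module_wants_global and its int type are assumptions made up for illustration; the real metadata format of run-app is not shown here. The RTLD_NOLOAD check is an extra diagnostic, not part of the original loader: it only reports whether the image is still resident after dlclose(), which is permitted because dlclose() is not required to unload anything.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical metadata symbol: a nonzero int means "reopen as global". */
#define METADATA_SYMBOL "module_wants_global"

/* Sketch of the two-pass load: probe privately, close, then reopen. */
void *load_module(const char *path)
{
    assert(path != NULL);

    /* Pass 1: open privately, only to read the metadata. */
    void *probe = dlopen(path, RTLD_LAZY | RTLD_LOCAL);
    if (probe == NULL) {
        fprintf(stderr, "probe open failed: %s\n", dlerror());
        return NULL;
    }
    const int *flag = (const int *)dlsym(probe, METADATA_SYMBOL);
    int wants_global = (flag != NULL && *flag != 0);
    dlclose(probe);

#ifdef RTLD_NOLOAD
    /* Diagnostic only: is the image still loaded after dlclose()? */
    void *resident = dlopen(path, RTLD_LAZY | RTLD_NOLOAD);
    if (resident != NULL) {
        fprintf(stderr, "note: %s is still resident after dlclose()\n", path);
        dlclose(resident);
    }
#endif

    /* Pass 2: reopen with the mode selected by the metadata. */
    int mode = wants_global ? (RTLD_LAZY | RTLD_GLOBAL)
                            : (RTLD_NOW | RTLD_LOCAL);
    void *handle = dlopen(path, mode);
    if (handle == NULL)
        fprintf(stderr, "reopen failed: %s\n", dlerror());
    return handle;
}
```

If the note is printed for weird_module.so only when it is linked with -lsomething, that would suggest the first RTLD_LOCAL open is still in effect when the module is reopened, which is worth ruling out before blaming the export table.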
Second, there turns out to be a symbol collision between weird_module.so and libsomething.dylib for a const global.
$ nm weird_module.so | grep '_something_global'
00000000000158f0 S _something_global
$ nm libsomething.dylib | grep '_something_global'
0000000000031130 S _something_global
I'm willing to consider that the duplicate symbol would put me in the realm of undefined behavior, so I'm dropping the question.