
As a mental exercise, I'm trying to write a program that links directly against the GPU driver of my MacBook Pro rather than using Apple's Metal framework. Some exploration led me to this file (presumably specific to my particular hardware):

/System/Library/Extensions/AMDRadeonX6000MTLDriver.bundle/Contents/MacOS/AMDRadeonX6000MTLDriver

Running `file` on it confirms this is a Mach-O 64-bit dynamically linked shared library. Running `nm` on it suggests that it contains a superset of AMD's ROCr runtime. One symbol in particular that interests me is this one:

$ nm -gD AMDRadeonX6000MTLDriver | grep "hsa_init"
00000000001cca20 T __ZN3HSA8hsa_initEv
$ nm -gCD AMDRadeonX6000MTLDriver | grep "hsa_init"
00000000001cca20 T HSA::hsa_init()

So I wrote this simple program (rocr_test.cpp):

typedef int hsa_status_t;

namespace HSA {
    hsa_status_t hsa_init();
}

int main() {
    HSA::hsa_init();
    return 0;
}

And compiled it like so:

$ clang++ rocr_test.cpp -c
$ clang++ rocr_test.o /System/Library/Extensions/AMDRadeonX6000MTLDriver.bundle/Contents/MacOS/AMDRadeonX6000MTLDriver
Undefined symbols for architecture x86_64:
  "HSA::hsa_init()", referenced from:
      _main in rocr_main-95c854.o
ld: symbol(s) not found for architecture x86_64
clang-11: error: linker command failed with exit code 1 (use -v to see invocation)

However, `nm` on the object file shows that the linker is looking for a symbol with exactly that name:

$ nm rocr_test.o
                 U __ZN3HSA8hsa_initEv
0000000000000000 T _main

Why am I seeing this linker error, when nm shows that a symbol with this exact name clearly exists in the shared library?

Rahul
jweightman
  • Did you check that the name mangling is the same, so that it refers to the same symbol? Or could it be that it's not part of a namespace but a class member? – Devolus May 01 '21 at 08:19
  • That was a good thought. It looks like the symbol's name is `__ZN3HSA8hsa_initEv` in the shared library, and I got the same symbol name for a function in a namespace as I did for a method in a class. – jweightman May 01 '21 at 18:18
    Please don't add comments to your questions that contain information about the question. Edit the question and add this information there. – harper May 02 '21 at 08:58
  • Your comment describes a function you defined in a class, but the question is about a function defined in a namespace. Be careful. I recommend adding a copy of both mangled names in the question itself. I expect that writing the name out explicitly in the question will help you. – harper May 02 '21 at 09:01
  • Is `rocr_test.o` in Mach-O or ELF format? Does running the linker step with `-v` option provide any hints? – Leon May 05 '21 at 11:50
  • `file rocr_test.o` indicates it is a "Mach-O 64-bit object x86_64," which matches the dylib "Mach-O 64-bit dynamically linked shared library x86_64." – jweightman May 06 '21 at 13:44
  • The verbose link command is this: ` "/Library/Developer/CommandLineTools/usr/bin/ld" -demangle -lto_library /Library/Developer/CommandLineTools/usr/lib/libLTO.dylib -dynamic -arch x86_64 -platform_version macos 11.0.0 11.0 -syslibroot /Library/Developer/CommandLineTools/SDKs/MacOSX11.0.sdk -o a.out -L/usr/local/lib rocr_main.o /System/Library/Extensions/AMDRadeonX6000MTLDriver.bundle/Contents/MacOS/AMDRadeonX6000MTLDriver -lc++ -lSystem /Library/Developer/CommandLineTools/usr/lib/clang/12.0.0/lib/darwin/libclang_rt.osx.a` – jweightman May 06 '21 at 13:44
  • It would be good to know what the double underscore means in `_ _ Z N 3HSA 8hsa_init Ev` – Zsigmond Lőrinczy May 07 '21 at 04:41

2 Answers


Apple's toolchain is a bit different: to link against libraries, it needs to use a ".tbd" (text-based stub) file. This is a text file containing the symbol list, a UUID and the basic details of the Mach-O it describes. You can find plenty of examples of those in the SDK (go to the SDK root and run `find . -type f -name "*.tbd"`). The TBD would look something like:

--- !tapi-tbd-v3
archs:           [ x86_64 ]
uuids:           [ 'x86_64: 8891E6F5-0B7C-3CC7-88C1-9F5303311EC7' ]
platform:        macosx
install-name:    /System/Library/Extensions/AMDRadeonX6000MTLDriver.bundle/Contents/MacOS/AMDRadeonX6000MTLDriver
objc-constraint: none
exports:
  - archs:       [ x86_64 ]
    symbols:     [ __Z34amdMtl_GFX10_GetFallbackFamilyNameP15GFX10_HwInfoRec, __Z35amdMtl_GFX10_GetFallbackProductNameP15GFX10_HwInfoRec, __Z25amdMtl_GFX10_AllocLsHsMgrP15GFX10_MtlDeviceP14AMDPPMemMgrRec, ...

You would have to create a TBD for the bundle (the above was created using `jtool2 --tbd`), point the linker at it (or place it in the SDK directory), and that should, hopefully, work.

Technologeeks
  • Interesting! This seems like it might be the issue, although I'm surprised this would affect "standard" Clang (I have both on my system) and GCC as well. I tried installing jtool2 via Homebrew, but `jtool2 --tbd AMDRadeonX6000MTLDriver` doesn't seem to have any effect. If I first run with `--analyze` then subsequent operations print out 13 errors starting with `Malformed line`. Could you describe in more detail how you produced the TBD file? – jweightman May 07 '21 at 18:27
  • Okay, I was able to produce the TBD file using the `tapi` utility from my Xcode toolchain. Now that I have the TBD file, how do I "direct the compiler to use it"? – jweightman May 07 '21 at 21:35

If hsa_init is not part of a class, you can still call the function by its mangled name. However, this only works if it is a free function. If it is part of a class, you cannot really call it without the class definition, as you don't know what it does to the class members, and you would have to pass the object as the first argument.

#include <iostream>
#include <dlfcn.h>

using namespace std;

typedef int hsa_status_t;
typedef hsa_status_t (*hsa_init_t)();
hsa_init_t hsa_init;

// dlsym() takes the C-level name and adds the Mach-O leading underscore
// itself, so pass "_ZN3HSA8hsa_initEv" rather than the "__ZN3HSA8hsa_initEv"
// spelling that nm prints.
const char *hsa_init_name = "_ZN3HSA8hsa_initEv";
const char *libPath = "/System/Library/Extensions/AMDRadeonX6000MTLDriver.bundle/Contents/MacOS/AMDRadeonX6000MTLDriver";

int main()
{
    void *libraryHandle = dlopen(libPath, RTLD_NOW);
    if (!libraryHandle)
    {
        cout << "Error opening library: " << libPath << " Error: " << dlerror() << endl;
        return 1;
    }
    dlerror(); // clear any existing error

    hsa_init = (hsa_init_t)dlsym(libraryHandle, hsa_init_name);
    if (!hsa_init)
    {
        cout << "Error importing symbol: " << hsa_init_name << " Error: " << dlerror() << endl;
        dlclose(libraryHandle);
        return 1;
    }

    hsa_status_t status = hsa_init(); // 0 is HSA_STATUS_SUCCESS in ROCr
    cout << "hsa_init returned " << status << endl;

    dlclose(libraryHandle);
    return 0;
}
Devolus
  • Hmm... That doesn't seem to do the trick either. What's weird is that if I compile to an object file, I see the **exact same** symbol name in the object file and the dynamic library. I'm reasonably confident in the function's signature, because this matches a function defined in [AMD's open source runtime library](https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/77589727ecc4bc715ff3f6d76dcd59f4c455b8db/src/core/inc/hsa_internal.h#L52). Is it possible Apple has done something to make it impossible/difficult to link against their fork of the ROCr runtime? – jweightman May 04 '21 at 03:05
  • Did you also try adding the `_` in front? Maybe I misremembered, and in this case you have to use the exact symbol. I have used this approach myself, so it should work. – Devolus May 04 '21 at 05:01
  • It seems you were right to remove one of the two leading underscores. Producing an object file before linking shows that your code produces an "undefined" symbol with the same name as the symbol exported by the dynamic library (i.e. with two leading underscores). Thus, I suspect the issue isn't related to name mangling or symbol names. Perhaps there's something else going on? – jweightman May 04 '21 at 07:01
  • It could also be that they are using a different compiler, which produces object files in a slightly different format? Since that file was not meant to be linked directly by user code, this wouldn't cause any problems in the normal case. – Devolus May 04 '21 at 08:17
  • I tested this with gcc and it works here. However, I did not remove the leading `_`, so you might try it with that: `#define hsa_init __ZN3HSA8hsa_initEv` instead. I think I remembered this wrong: when calling it, you need the exact symbol. The leading `_` is only added on the exported symbol, so it should be included and needs to be present when being called. – Devolus May 04 '21 at 08:40
  • I just realized that you are trying to load a shared library and not an object file. I updated my code to reflect that. In that case you don't need to link the file during compilation, because it is loaded at runtime. – Devolus May 04 '21 at 13:52
  • Sorry, I've tried using `dlsym` as well like you suggest, but it also didn't work for me. Neither did using the symbol directly. I tried all of this with my laptop's default toolchain (Apple Clang++), upstream Clang 11.0, and GCC 10.2.0. I really appreciate all the ideas and help, but I think this does not address the fundamental issue, whatever that may be. I guess I'll have to dive deeper into the debugger to understand how this dylib is used and perhaps try something more hacky to circumvent the linker. – jweightman May 07 '21 at 14:15
  • If it is a low-level driver, it might not be a real shared library. You might have to learn how such drivers are built on your system. I don't know if there is a DDK available, but that might be the thing to look for. – Devolus May 07 '21 at 15:15