I am creating a Python package based on this repo. The package has a few C++ files, which are compiled when I build the package through setup.py by running `pip install .`. This generates a _C.cpython-36m-x86_64-linux-gnu.so file in my package installation directory. To import this shared library (.so file), all I have to do is

from . import _C (something like this)

Now the imported _C object points to _C.cpython-36m-x86_64-linux-gnu.so. I don't understand how the _C object gets linked to that specific .so file. Is that information written to any of the metadata files while the package is being built?

Gaurav Srivastava

1 Answer

No. The C++ library is made importable through pybind11. In the documentation (https://pybind11.readthedocs.io/en/stable/basics.html), you will see that to import a C++ library built with the pybind11 API, the correct syntax in your .py file is to import the prefix of the library's file name. Thus, when you write `import _C` in your Python code on a Linux system, the import looks for _C.<whatever>.so and loads the symbols from that file.
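
The file-name matching described above can be inspected from Python itself. The following is a minimal sketch, assuming a standard CPython interpreter: `importlib.machinery.EXTENSION_SUFFIXES` is the standard-library list of suffixes accepted for compiled extension modules, while the helper `candidate_filenames` is a hypothetical name introduced only for illustration and is not part of pybind11 or the repo.

```python
import importlib.machinery

# Suffixes the running interpreter accepts for compiled extension modules.
# On a Linux CPython 3.6 build this list would include something like
# ".cpython-36m-x86_64-linux-gnu.so"; the exact values depend on the platform.
SUFFIXES = importlib.machinery.EXTENSION_SUFFIXES


def candidate_filenames(module_name):
    """Return the file names accepted for an extension module of this name.

    Hypothetical helper: it only joins the module name with each suffix.
    """
    return [module_name + suffix for suffix in SUFFIXES]


if __name__ == "__main__":
    # Shows which files in a package directory would satisfy `import _C`.
    for name in candidate_filenames("_C"):
        print(name)
```

Once the package is installed, printing `_C.__file__` after the import shows which of these candidates was actually loaded.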

All of the C++ files used to build importable Python modules in the repo you refer to ultimately include torch/extension.h (via vision.h: https://github.com/microsoft/scene_graph_benchmark/blob/main/maskrcnn_benchmark/csrc/cuda/vision.h#L3), and if you explore the PyTorch source, extension.h includes python.h, which in turn includes pybind.h (https://pytorch.org/cppdocs/api/program_listing_file_torch_csrc_api_include_torch_python.h.html).

dmedine
  • I looked into the pybind documentation but still could not find how pybind selects the name of the resulting .so file or DLL and binds it to the _C object in Python. Could you add that part to your answer, linking it to the relevant section of the documentation? – Gaurav Srivastava Feb 20 '22 at 04:06
  • Well, I don't know precisely how pybind does it either, but I'm guessing they use regex. If you really need the implementation details, you most likely need to read the source code. – dmedine Feb 20 '22 at 22:34
  • Your answer is almost complete in its current form, but I was looking for a brief mention in pybind's documentation where it says, probably in words, that _C is linked to _C..so. This would give confidence to future readers of your answer and also provide a point of reference in the documentation if anything changes with pybind. I am just wondering whether you read in some official documentation, or elsewhere, that pybind does this linking. – Gaurav Srivastava Feb 22 '22 at 01:27
  • It's all in the first link. If you scroll down, the instruction for compiling the C++ code is: `$ c++ -O3 -Wall -shared -std=c++11 -fPIC $(python3 -m pybind11 --includes) example.cpp -o example$(python3-config --extension-suffix)`. Note all those wildcards in the output file name. Then, in the next paragraph it shows that calling `import example` gets it into your python code. – dmedine Feb 22 '22 at 01:32
  • That clarifies some of my doubts. My next questions are how the torch.utils.cpp_extension.CppExtension function here knows where to place the generated _C.xxx.so file and what to name it. The name given in the extension function is maskrcnn_benchmark._C. But I think that is a more appropriate question for the torch forum. – Gaurav Srivastava Feb 22 '22 at 04:37
  • They use some python/ninja magic to generate a compile script for the C++ code. It follows a standard pattern for doing this that I know very little about. It's all handled in `setup.py`. You can see that the directories pointing to the C++ code are handed to the method that does this (https://github.com/microsoft/scene_graph_benchmark/blob/main/setup.py#L19, and https://github.com/microsoft/scene_graph_benchmark/blob/main/setup.py#L46-L56). Somehow under the hood it builds a file called _C..so and puts it where the python code will find it via pybind. – dmedine Feb 22 '22 at 04:58
  • Maybe read this: https://python.plainenglish.io/building-hybrid-python-c-packages-8985fa1c5b1d – dmedine Feb 22 '22 at 05:03
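
The naming and placement question raised in the comments can be sketched as follows. By the usual setuptools convention, a dotted extension name such as maskrcnn_benchmark._C is turned into a relative path, and the interpreter's extension suffix is appended. This is only an illustration under that assumption: the helper `expected_extension_path` is hypothetical, it mirrors the convention rather than calling torch or setuptools, and the build in the repo may customize the result.

```python
import os
import sysconfig


def expected_extension_path(dotted_name):
    """Map a dotted extension name to the relative path of its compiled file.

    Hypothetical helper mirroring the usual setuptools convention:
    dots become directory separators and EXT_SUFFIX is appended.
    """
    # EXT_SUFFIX is what `python3-config --extension-suffix` prints on Linux.
    suffix = sysconfig.get_config_var("EXT_SUFFIX")
    return os.path.join(*dotted_name.split(".")) + suffix


if __name__ == "__main__":
    # The printed suffix depends on the interpreter and platform.
    print(expected_extension_path("maskrcnn_benchmark._C"))
```

Under this convention the last component of the dotted name (_C) becomes the file-name prefix, and the leading components (maskrcnn_benchmark) become the package directory that receives the file.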