I have somewhat successfully dockerized a software repository (KPConv) that I plan to work with and extend, using the following Dockerfile:
FROM tensorflow/tensorflow:1.12.0-devel-gpu-py3
# Install other required python stuff
RUN apt-get update && apt-get install -y --fix-missing --no-install-recommends \
python3-setuptools python3-pip python3-tk
RUN pip install --upgrade pip
RUN pip3 install numpy scikit-learn psutil matplotlib pyqt5 laspy
# Compile the custom operations and CPP wrappers
# For some reason this must be done within container, cannot access libcuda.so during docker build
# Ref: https://stackoverflow.com/questions/66575232
#COPY . /kpconv
#WORKDIR /kpconv/tf_custom_ops
#RUN sh compile_op.sh
#WORKDIR /kpconv/cpp_wrappers
#RUN sh compile_wrappers.sh
# Set the working directory to kpconv
WORKDIR /kpconv
# Set root user password so we can su/sudo later if need be
RUN echo "root:pass" | chpasswd
# Create a user and group akin to the host within the container
ARG USER_ID
ARG GROUP_ID
RUN addgroup --gid $GROUP_ID user
RUN adduser --disabled-password --gecos '' --uid $USER_ID --gid $GROUP_ID user
USER user
# Build with:
#sudo docker build -t kpconv-test \
# --build-arg USER_ID=$(id -u) \
# --build-arg GROUP_ID=$(id -g) \
# .
At the end of this Dockerfile I followed a post found here, which describes a way to correctly set the permissions of files generated by/within a container so that the host machine/user can access them without having to alter the file permissions.
Also, this software repository makes use of custom TensorFlow operations in C++ (KPConv/tf_custom_ops) along with Python wrappers for custom C++ code (KPConv/cpp_wrappers). The author of KPConv, Thomas Hugues, provides a bash script for each which compiles them into various .so files.
If I COPY the repository into the image during the build process (COPY . /kpconv), start up the container, call both compile scripts, and run the code, then Python correctly loads the C++ wrapper (the generated grid_subsampling.cpython-35m-x86_64-linux-gnu.so) and the software runs as expected/intended.
$ sudo docker run -it \
> -v /<myhostpath>/data_sets:/data \
> -v /<myhostpath>/_output:/output \
> --runtime=nvidia kpconv-test /bin/bash
user@eec8553dcb5d:/kpconv$ cd tf_custom_ops
user@eec8553dcb5d:/kpconv/tf_custom_ops$ sh compile_op.sh
user@eec8553dcb5d:/kpconv/tf_custom_ops$ cd ..
user@eec8553dcb5d:/kpconv$ cd cpp_wrappers/
user@eec8553dcb5d:/kpconv/cpp_wrappers$ sh compile_wrappers.sh
running build_ext
building 'grid_subsampling' extension
<Redacted for brevity>
user@eec8553dcb5d:/kpconv/cpp_wrappers$ cd ..
user@eec8553dcb5d:/kpconv$ python training_ModelNet40.py
Dataset Preparation
*******************
Loading training points
1620.2 MB loaded in 0.6s
Loading test points
411.6 MB loaded in 0.2s
<Redacted for brevity>
This works well and allows me to run the KPConv software.
Also, to note for later, the .so file has the hash
user@eec8553dcb5d:/kpconv/cpp_wrappers/cpp_subsampling$ sha1sum grid_subsampling.cpython-35m-x86_64-linux-gnu.so
a17eef453f6d2370a15bc2a0e6714c978390c5c3 grid_subsampling.cpython-35m-x86_64-linux-gnu.so
It also has the permissions
user@eec8553dcb5d:/kpconv/cpp_wrappers/cpp_subsampling$ ls -al grid_subsampling.cpython-35m-x86_64-linux-gnu.so
-rwxr-xr-x 1 user user 561056 Mar 14 02:16 grid_subsampling.cpython-35m-x86_64-linux-gnu.so
However, this produces an awkward workflow for quickly editing the software for my purposes and running it within the container: every change to the code requires a new build of the image. I would much rather mount the KPConv code from the host into the container at runtime, so that edits are "live" within the container as it is running.
Doing this, using the Dockerfile at the top of the post (no COPY . /kpconv) to build an image, performing the same compilation steps, and running the code
$ sudo docker run -it \
> -v /<myhostpath>/data_sets:/data \
> -v /<myhostpath>/KPConv_Tensorflow:/kpconv \
> -v /<myhostpath>/_output:/output \
> --runtime=nvidia kpconv-test /bin/bash
user@a82e2c1af21a:/kpconv$ cd tf_custom_ops/
user@a82e2c1af21a:/kpconv/tf_custom_ops$ sh compile_op.sh
user@a82e2c1af21a:/kpconv/tf_custom_ops$ cd ..
user@a82e2c1af21a:/kpconv$ cd cpp_wrappers/
user@a82e2c1af21a:/kpconv/cpp_wrappers$ sh compile_wrappers.sh
running build_ext
building 'grid_subsampling' extension
<Redacted for brevity>
user@a82e2c1af21a:/kpconv/cpp_wrappers$ cd ..
user@a82e2c1af21a:/kpconv$ python training_ModelNet40.py
I receive the following Python ImportError:
user@a82e2c1af21a:/kpconv$ python training_ModelNet40.py
Traceback (most recent call last):
File "training_ModelNet40.py", line 36, in <module>
from datasets.ModelNet40 import ModelNet40Dataset
File "/kpconv/datasets/ModelNet40.py", line 40, in <module>
from datasets.common import Dataset
File "/kpconv/datasets/common.py", line 29, in <module>
import cpp_wrappers.cpp_subsampling.grid_subsampling as cpp_subsampling
ImportError: /kpconv/cpp_wrappers/cpp_subsampling/grid_subsampling.cpython-35m-x86_64-linux-gnu.so: failed to map segment from shared object
Why is this Python wrapper for C++ only usable when COPY'ing the code into the Docker image, and not when it is mounted as a volume?
This .so file has the same hash and permissions as in the first situation
user@a82e2c1af21a:/kpconv/cpp_wrappers/cpp_subsampling$ sha1sum grid_subsampling.cpython-35m-x86_64-linux-gnu.so
a17eef453f6d2370a15bc2a0e6714c978390c5c3 grid_subsampling.cpython-35m-x86_64-linux-gnu.so
user@a82e2c1af21a:/kpconv/cpp_wrappers/cpp_subsampling$ ls -al grid_subsampling.cpython-35m-x86_64-linux-gnu.so
-rwxr-xr-x 1 user user 561056 Mar 14 02:19 grid_subsampling.cpython-35m-x86_64-linux-gnu.so
On my host machine the file has the following permissions (it is on the host because /kpconv was mounted as a volume; for some reason the container's clock is also in the future, note the timestamps)
$ ls -al grid_subsampling.cpython-35m-x86_64-linux-gnu.so
-rwxr-xr-x 1 <myusername> <myusername> 561056 Mar 13 21:19 grid_subsampling.cpython-35m-x86_64-linux-gnu.so
After some research on the error message, it looks like every result is specific to a particular situation, though most mention that the error is the result of some sort of permissions issue.
This Unix & Linux Stack Exchange answer, I think, identifies the actual problem. But I am a bit too far from my days of working with C++ as an intern in college to understand how to use it to fix this issue. I think the problem lies in the permissions between the container and the host and between the users on each (that is: root in the container, user (from the Dockerfile) in the container, root on the host, and <myusername> on the host).
I have also attempted to first elevate permissions within the container using the root password created in the Dockerfile, then compile the code and run the software. This results in the same issue. I have also tried compiling the code as user in the container but running the software as root, again with the same issue.
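To isolate the dynamic loader from the rest of the KPConv import chain, a further check I can run (my own minimal diagnostic, not part of the repository; the path is taken from the sessions above) is to dlopen the generated .so directly with ctypes:

```python
# Minimal diagnostic: dlopen the compiled extension directly, bypassing
# KPConv's import chain, to surface the loader's exact complaint.
# The path below is the one from the sessions above; adjust as needed.
import ctypes

so_path = ("/kpconv/cpp_wrappers/cpp_subsampling/"
           "grid_subsampling.cpython-35m-x86_64-linux-gnu.so")

try:
    ctypes.CDLL(so_path)  # same dlopen(3) call CPython makes for extensions
    print("dlopen succeeded")
except OSError as e:
    # "failed to map segment from shared object" should reproduce here too
    print("dlopen failed:", e)
```

If this minimal load fails with the same message, the problem is in dlopen itself and has nothing to do with Python's import machinery.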
Another clue I have found and can provide: there is seemingly something different about the .so when it is compiled "only within" the container (no --volume) versus when it is compiled inside the --volume (which is why I compared the file hashes). So maybe it is not so much permissions, but how the .so is loaded within the container by the kernel, or how its location within the --volume affects that loading process?
EDIT: As for an SSCCE, you should be able to clone the linked repository to your machine and use the same Dockerfile. You do not need to specify the /data or /output volumes or alter the code in any way; it attempts to load the .so before loading the data, so it will simply error and end execution.
If you do not have a GPU or do not want to install nvidia-runtime, you should be able to change the Dockerfile base image to tensorflow/tensorflow:1.12.0-devel-py3 and run the code on CPU.