6

I'm using a python:3.7.4-slim-buster docker image and I can't change it. I'm wondering how to use my nvidia gpus on it.

I usually used a tensorflow/tensorflow:1.14.0-gpu-py3 and with a simple --runtime=nvidia int the docker run command everything worked fine, but now I have this constraint.

I think that no shortcut exists on this type of image so I was following this guide https://towardsdatascience.com/how-to-properly-use-the-gpu-within-a-docker-container-4c699c78c6d1, building the Dockerfile it proposes:

FROM python:3.7.4-slim-buster

RUN apt-get update && apt-get install -y build-essential
RUN apt-get --purge remove -y nvidia*
ADD ./Downloads/nvidia_installers /tmp/nvidia                             > Get the install files you used to install CUDA and the NVIDIA drivers on your host
RUN /tmp/nvidia/NVIDIA-Linux-x86_64-331.62.run -s -N --no-kernel-module   > Install the driver.
RUN rm -rf /tmp/selfgz7                                                   > For some reason the driver installer left temp files when used during a docker build (i dont have any explanation why) and the CUDA installer will fail if there still there so we delete them.
RUN /tmp/nvidia/cuda-linux64-rel-6.0.37-18176142.run -noprompt            > CUDA driver installer.
RUN /tmp/nvidia/cuda-samples-linux-6.0.37-18176142.run -noprompt -cudaprefix=/usr/local/cuda-6.0   > CUDA samples comment if you dont want them.
RUN export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64         > Add CUDA library into your PATH
RUN touch /etc/ld.so.conf.d/cuda.conf                                     > Update the ld.so.conf.d directory
RUN rm -rf /temp/*  > Delete installer files.

But it raises an error:

ADD failed: stat /var/lib/docker/tmp/docker-builder080208872/Downloads/nvidia_installers: no such file or directory

What can I change to easily let the docker image see my gpus?

Paolo Magnani
  • 549
  • 4
  • 14
  • 2
    Base your Dockerfile on the [nvidia-docker image](https://github.com/NVIDIA/nvidia-docker). You need to have the cuda driver installed on the host. – sim Jan 12 '21 at 15:13
  • You can't change `python:3.7.4-slim-buster` because you need this specific python version, right? – anemyte Jan 12 '21 at 15:14
  • @anemyte yes at least I should not use tensorflow/pytorch images. It should be better to mantain this one. Why? Have you an idea? – Paolo Magnani Jan 12 '21 at 17:01
  • 1
    I have a Dockerfile with python3.7 and CUDA, based on Ubuntu. I used it to create a custom Tensorflow image for my needs. If that fit your needs I'll post it as an answer. – anemyte Jan 12 '21 at 17:39
  • @sim thank you too. Can you give me a Dockerfile example keeping fixed my image? – Paolo Magnani Jan 13 '21 at 09:30

1 Answers1

8

TensorFlow image split into several 'partial' Dockerfiles. One of them contains all dependencies TensorFlow needs to operate on GPU. Using it you can easily create a custom image, you only need to change default python to whatever version you need. This seem to me a much easier job than bringing NVIDIA's stuff into Debian image (which AFAIK is not officially supported for CUDA and/or cuDNN).

Here's the Dockerfile:

# TensorFlow image base written by TensorFlow authors.
# Source: https://github.com/tensorflow/tensorflow/blob/v2.3.0/tensorflow/tools/dockerfiles/partials/ubuntu/nvidia.partial.Dockerfile
# -------------------------------------------------------------------------
ARG ARCH=
ARG CUDA=10.1
FROM nvidia/cuda${ARCH:+-$ARCH}:${CUDA}-base-ubuntu${UBUNTU_VERSION} as base
# ARCH and CUDA are specified again because the FROM directive resets ARGs
# (but their default value is retained if set previously)
ARG ARCH
ARG CUDA
ARG CUDNN=7.6.4.38-1
ARG CUDNN_MAJOR_VERSION=7
ARG LIB_DIR_PREFIX=x86_64
ARG LIBNVINFER=6.0.1-1
ARG LIBNVINFER_MAJOR_VERSION=6

# Needed for string substitution
SHELL ["/bin/bash", "-c"]
# Pick up some TF dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        cuda-command-line-tools-${CUDA/./-} \
        # There appears to be a regression in libcublas10=10.2.2.89-1 which
        # prevents cublas from initializing in TF. See
        # https://github.com/tensorflow/tensorflow/issues/9489#issuecomment-562394257
        libcublas10=10.2.1.243-1 \ 
        cuda-nvrtc-${CUDA/./-} \
        cuda-cufft-${CUDA/./-} \
        cuda-curand-${CUDA/./-} \
        cuda-cusolver-${CUDA/./-} \
        cuda-cusparse-${CUDA/./-} \
        curl \
        libcudnn7=${CUDNN}+cuda${CUDA} \
        libfreetype6-dev \
        libhdf5-serial-dev \
        libzmq3-dev \
        pkg-config \
        software-properties-common \
        unzip

# Install TensorRT if not building for PowerPC
RUN [[ "${ARCH}" = "ppc64le" ]] || { apt-get update && \
        apt-get install -y --no-install-recommends libnvinfer${LIBNVINFER_MAJOR_VERSION}=${LIBNVINFER}+cuda${CUDA} \
        libnvinfer-plugin${LIBNVINFER_MAJOR_VERSION}=${LIBNVINFER}+cuda${CUDA} \
        && apt-get clean \
        && rm -rf /var/lib/apt/lists/*; }

# For CUDA profiling, TensorFlow requires CUPTI.
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:$LD_LIBRARY_PATH

# Link the libcuda stub to the location where tensorflow is searching for it and reconfigure
# dynamic linker run-time bindings
RUN ln -s /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/stubs/libcuda.so.1 \
    && echo "/usr/local/cuda/lib64/stubs" > /etc/ld.so.conf.d/z-cuda-stubs.conf \
    && ldconfig
# -------------------------------------------------------------------------
#
# Custom part
FROM base
ARG PYTHON_VERSION=3.7

RUN apt-get update && apt-get install -y --no-install-recommends --no-install-suggests \
          python${PYTHON_VERSION} \
          python3-pip \
          python${PYTHON_VERSION}-dev \
# Change default python
    && cd /usr/bin \
    && ln -sf python${PYTHON_VERSION}         python3 \
    && ln -sf python${PYTHON_VERSION}m        python3m \
    && ln -sf python${PYTHON_VERSION}-config  python3-config \
    && ln -sf python${PYTHON_VERSION}m-config python3m-config \
    && ln -sf python3                         /usr/bin/python \
# Update pip and add common packages
    && python -m pip install --upgrade pip \
    && python -m pip install --upgrade \
        setuptools \
        wheel \
        six \
# Cleanup
    && apt-get clean \
    && rm -rf $HOME/.cache/pip

You can take from here: change python version to one you need (and which is available in Ubuntu repositories), add packages, code, etc.

anemyte
  • 17,618
  • 1
  • 24
  • 45
  • 1
    This really helped with my solution. The only thing I changed is that I copied the official python Dockerfile code for the python part. Call-out that if some other libraries that utilize GPU will require the "devel" version of the TensorFlow Dockerfile (instead of the base version above) to function, such as the Implicit library. – ZaxR May 26 '21 at 17:56