
The Goal:

To debug a Python application in PyCharm, with the interpreter set to a custom docker image that uses Tensorflow and so requires a GPU. The problem is that PyCharm's command-building doesn't offer a way to expose the available GPUs to the container, as far as I can tell.

Terminal - it works:

Enter a container with the following command, specifying which GPUs to make available (--gpus):

docker run -it --rm --gpus=all --entrypoint="/bin/bash" 3b6d609a5189        # image has an entrypoint, so I override it

Inside the container, I can run nvidia-smi to see a GPU is found, and confirm Tensorflow finds it, using:

from tensorflow.python.client import device_lib
device_lib.list_local_devices()
# physical_device_desc: "device: 0, name: Quadro P2000, pci bus id: 0000:01:00.0, compute capability: 6.1"]

If I don't use the --gpus flag, no GPUs are discovered, as expected. Note: with Docker version 19.03 and above, Nvidia runtimes are supported natively, so there is no need for nvidia-docker, and the docker-run argument --runtime=nvidia is also deprecated. Relevant thread.
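The same terminal check can also be scripted. Below is a minimal sketch (assuming the standard nvidia-smi query flags) that returns the names of visible GPUs, or an empty list when the driver tooling is absent; the function name is my own:

```python
import shutil
import subprocess

def gpu_names():
    """List visible GPU names via nvidia-smi, or [] if the tool is absent."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver utilities not on PATH, e.g. container run without --gpus
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return []  # tool present but no device accessible
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]

print(gpu_names())  # [] inside a container started without --gpus
```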

PyCharm - it doesn't work:

Here is the configuration for the run:

configuration

(I realise some of those paths might look incorrect, but that isn't an issue for now)

I set the interpreter to point to the same docker image and run the Python script. I also set a custom LD_LIBRARY_PATH in the run configuration, matching where libcuda.so is located in the docker image (I found it interactively inside a running container), but still no device is found:

error message

The error message shows that the CUDA library could be loaded (i.e. it was found on that LD_LIBRARY_PATH), but the device was still not found. This is why I believe the docker run argument --gpus=all must be set somewhere. I can't find a way to do that in PyCharm.
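The two failure modes (library missing from LD_LIBRARY_PATH vs. library loading but no device exposed) can be told apart by probing the CUDA driver API directly. A hedged sketch, assuming the standard cuInit/cuDeviceGetCount driver calls; the function name is my own:

```python
import ctypes
import ctypes.util

def cuda_device_count():
    """Probe the CUDA driver library.

    Returns (library_loaded, device_count), distinguishing libcuda.so
    being absent from it loading fine while no GPU is exposed.
    """
    name = ctypes.util.find_library("cuda")
    if name is None:
        return False, 0          # libcuda.so not found on the loader path
    try:
        libcuda = ctypes.CDLL(name)
    except OSError:
        return False, 0          # found a name but could not load it
    # CUDA driver API: cuInit(0), then cuDeviceGetCount(&count)
    if libcuda.cuInit(0) != 0:   # non-zero return is a CUDA error code
        return True, 0           # library loads, but no usable device
    count = ctypes.c_int(0)
    libcuda.cuDeviceGetCount(ctypes.byref(count))
    return True, count.value

loaded, n = cuda_device_count()
print(f"libcuda loaded: {loaded}, devices: {n}")
```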

Other things I have tried:

  1. In PyCharm, using a Docker execution template config (instead of a Python template) where it is possible to specify run arguments, so I hoped to pass --gpus=all, but that seems not to be supported by the parser of those options:

parse error

  2. I tried to set the default runtime to nvidia in the docker daemon by including the following config in /etc/docker/daemon.json:
{
    "runtimes": {
        "nvidia": {
            "runtimeArgs": ["gpus=all"]
        }
    }
}

I am not sure of the correct format for this, however. I have tried a few variants of the above, but nothing got the GPUs recognised. The example above could at least be parsed and allow me to restart the docker daemon without errors.

  3. I noticed in the official Tensorflow docker images, they install a package (via apt install) called nvinfer-runtime-trt-repo-ubuntu1804-5.0.2-ga-cuda10.0, which sounds like a great tool, albeit seemingly just for TensorRT. I added it to my Dockerfile as a shot in the dark, but unfortunately it did not fix the issue.

  4. Adding NVIDIA_VISIBLE_DEVICES=all etc. to the environment variables of the PyCharm configuration, with no luck.

I am using Python 3.6, PyCharm Professional 2019.3 and Docker 19.03.

n1k31t4

3 Answers


Docker GPU support is now available in PyCharm 2020.2 without setting a global default runtime. Just add --gpus all under the 'Docker container settings' section in the run configuration window.

If the error `no NVIDIA GPU device is present: /dev/nvidia0 does not exist` still occurs, make sure to uncheck `Run with Python Console`, because that option still doesn't work properly.

Michał De
  • Ridiculous how this tiny thing `Run with Python Console` just simply breaks input parameters – Ufos Sep 27 '21 at 19:16

It turns out that attempt 2. in the "Other things I have tried" section of my post was the right direction, and the following allowed PyCharm's remote interpreter (the docker image) to locate the GPU, just as the terminal could.

I added the following into /etc/docker/daemon.json:

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

It is also necessary to restart the docker service after saving the file:

sudo service docker restart

Note: this kills all running docker containers on the system.
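As a sanity check after editing, here is a small sketch (assuming the standard /etc/docker/daemon.json location; the function name is my own) that verifies the file actually declares nvidia as the default runtime before you restart the daemon:

```python
import json
from pathlib import Path

def nvidia_is_default(path="/etc/docker/daemon.json"):
    """Return True if daemon.json names nvidia as the default runtime
    and also registers the nvidia runtime itself."""
    p = Path(path)
    if not p.exists():
        return False
    try:
        cfg = json.loads(p.read_text())
    except json.JSONDecodeError:
        return False  # a malformed file would also stop dockerd from starting
    return (cfg.get("default-runtime") == "nvidia"
            and "nvidia" in cfg.get("runtimes", {}))

print(nvidia_is_default())
```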

n1k31t4

Check out Michał De's answer, it works. However, the interactive console is still broken. With some docker inspect-ing, I figured out that the option `Run with Python Console` overwrites the docker config, ignoring the provided --gpus all. I couldn't stand such a loss in quality of life and forced PyCharm to play nice using docker-compose.

Behold, the WORKAROUND.


1. How to test GPU in Tensorflow

import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))

should return something like

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

2. Make sure you have a simple docker container that works

docker pull tensorflow/tensorflow:latest-gpu-jupyter
docker run --gpus all -it tensorflow/tensorflow:latest-gpu-jupyter python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

the last print should be as described in step 1. Otherwise see nvidia guide or tensorflow guide.


3. Create a compose file and test it

version: '3'
# ^ fixes another pycharm bug
services:
  test:
    image: tensorflow/tensorflow:latest-gpu-jupyter
    # ^ or your own
    command: python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
    # ^ irrelevant, will be overridden by pycharm, but useful for testing
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

Then bring it up with:

docker-compose --file your_compose_file up

Again, you should see the same output as described in step 1. Given that step 2 was successful, this should go without surprises.


4. Set up this compose as an interpreter in pycharm

  • Configuration files: your_compose_file
  • Service: test (it just works, but you can have more fun )



5. Enjoy your interactive console while running a GPU enabled docker.

Ufos