
This problem is new: the setup worked until today, so I suspect an update to the NVIDIA driver or to libtorch. I am using Google Colab for its GPU and want to install a program that needs libtorch. Installation worked fine for the last couple of weeks, but as of today the program can no longer be installed. I have already tried restarting and rebooting several times, and nothing seems to work. I also downloaded the new libtorch build for CUDA 11.3 and updated CUDA so that the runtime runs on CUDA 11.3. When I call

    !nvidia-smi

it prints the usual information. However, after setting the environment variables that are needed to use libtorch, with

    os.environ['LIBTORCH'] = "/content/libtorch" 

and

    os.environ['LD_LIBRARY_PATH'] = "/content/libtorch/lib" 
    !nvidia-smi

suddenly displays "Failed to initialize NVML: Driver/library version mismatch". Since this started happening, I cannot install the program anymore.
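
The assignment above replaces `LD_LIBRARY_PATH` outright, so anything the runtime had stored there before is lost. Whether that is related to the NVML message is not confirmed here. As a minimal sketch, assuming the goal is only to add the libtorch directory while keeping the existing entries, the variable could be prepended to instead (`prepend_env_path` is a hypothetical helper, not part of any library):

```python
import os

def prepend_env_path(var, directory, env=os.environ):
    """Put `directory` first in a path-list variable, keeping existing entries."""
    # Split the current value and drop empty fragments.
    existing = [p for p in env.get(var, "").split(os.pathsep) if p]
    # Avoid listing the same directory twice.
    if directory in existing:
        existing.remove(directory)
    env[var] = os.pathsep.join([directory] + existing)
    return env[var]

# Same directory as in the question, but added in front of the old value.
prepend_env_path("LD_LIBRARY_PATH", "/content/libtorch/lib")
```

Running `!nvidia-smi` before and after this call would show whether overwriting the variable makes a difference.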

These are my steps. I install rustc (the program requires rustup) and add it to the path with

    os.environ['PATH'] += os.pathsep + "path/to/.cargo/bin"
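
To rule out the Rust toolchain itself, a small check can confirm that `rustc` and `cargo` resolve once the directory is on the search path. This is only a sketch: `find_on_path` is a hypothetical helper, and `path/to/.cargo/bin` is the same placeholder as above.

```python
import os
import shutil

def find_on_path(tool, extra_dir=None, env=os.environ):
    """Return the full path of `tool`, searching PATH plus an optional extra directory."""
    search = env.get("PATH", "")
    if extra_dir:
        search = search + os.pathsep + extra_dir if search else extra_dir
    # shutil.which returns None when the tool is not found.
    return shutil.which(tool, path=search)

for tool in ("rustc", "cargo"):
    print(tool, "->", find_on_path(tool, "path/to/.cargo/bin"))
```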

Then I set the libtorch environment variables as shown above and try to install the program with `cargo install`. This used to work fine; now it fails with the following error message:

    error: linking with `cc` failed: exit status: 1

      = note: "cc" "-m64" "-Wl,--eh-frame-hdr" "-Wl,-znoexecstack" "-Wl,--as-needed" "-L" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib" 
    .........................................
      = note: /usr/bin/ld: cannot find -ltorch_cuda
              /usr/bin/ld: cannot find -ltorch_cuda_cu
              /usr/bin/ld: cannot find -ltorch_cuda_cpp
              /usr/bin/ld: cannot find -ltorch_cpu
              /usr/bin/ld: cannot find -ltorch
              /usr/bin/ld: cannot find -lc10
              collect2: error: ld returned 1 exit status
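
For each `-l<name>` option, `ld` looks for `lib<name>.so` or `lib<name>.a` in its search directories. A minimal sketch of a check for the six libraries named in the error, assuming they are expected under `/content/libtorch/lib` (the directory used for `LD_LIBRARY_PATH` above; which `-L` directories the build actually passes is not visible in the truncated output):

```python
from pathlib import Path

# Library names taken from the "cannot find -l..." lines of the linker output.
WANTED = ["torch_cuda", "torch_cuda_cu", "torch_cuda_cpp", "torch_cpu", "torch", "c10"]

def missing_libs(lib_dir, names=WANTED):
    """Return the names for which neither lib<name>.so nor lib<name>.a exists in lib_dir."""
    lib_dir = Path(lib_dir)
    return [
        n for n in names
        if not (lib_dir / f"lib{n}.so").exists()
        and not (lib_dir / f"lib{n}.a").exists()
    ]

print(missing_libs("/content/libtorch/lib"))
```

An empty list would mean the files are present and the linker is simply not searching that directory. A non-empty list would point at the libtorch download itself, for example a build that ships differently named CUDA libraries.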
