
I already have experience with deep learning. I use Python, and CUDA works fine with my GPU when training a model. But now I need to use Java (for an optional school project, I want to create a reinforcement learning AI, and I have to use Java). I'm completely new to Java, so I followed this video, which is based on the quickstart guide on the DL4J website. Downloading the examples works fine (image).

When I run the examples on the CPU (with neural nets), everything works fine. But when I try "MultiGpuLenetMnistExample" (in the "dl4j-cuda-specific-examples" folder), I get the following error. I looked for answers but didn't find what I needed (or maybe I didn't understand the answers). I guess the problem comes from the ND4J backend or something to do with jcublas, but I don't know what to do about it.

Bear in mind that I'm not yet comfortable with all the subtleties of Java. When I looked for people with the same issue, I didn't understand what they were talking about; I only just discovered the concept of pom.xml files, for example. But I've seen that people answering usually ask for the output of java -version, mvn --version or nvcc --version, so here they are.

plguillou

1 Answer


According to your screenshots, you're trying to run the project with dependencies built for CUDA 10.2, but you have CUDA 10.0 installed. Change your dependency to nd4j-cuda-10.0 instead of nd4j-cuda-10.2.
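
To make the mismatch check concrete, here is a minimal Java sketch (not part of DL4J; the class and method names are hypothetical). It assumes the usual nvcc --version line of the form "Cuda compilation tools, release 10.0, V10.0.130" and that the backend artifact name embeds the CUDA version, as in nd4j-cuda-10.2-platform.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only: compares the CUDA toolkit
// release printed by `nvcc --version` with the version in an artifact name.
public class CudaVersionCheck {
    // Assumed nvcc output shape: "... release 10.0, V10.0.130"
    private static final Pattern NVCC_RELEASE = Pattern.compile("release\\s+(\\d+\\.\\d+)");
    // Assumed artifact naming: "nd4j-cuda-<major>.<minor>[-platform]"
    private static final Pattern ARTIFACT_CUDA = Pattern.compile("cuda-(\\d+\\.\\d+)");

    // Extracts "10.0" from the nvcc output, if a release number is present.
    static Optional<String> toolkitRelease(String nvccOutput) {
        Matcher m = NVCC_RELEASE.matcher(nvccOutput);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Extracts "10.2" from "nd4j-cuda-10.2-platform", if present.
    static Optional<String> artifactCudaVersion(String artifactId) {
        Matcher m = ARTIFACT_CUDA.matcher(artifactId);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // True only when both versions can be read and are identical.
    static boolean matches(String artifactId, String nvccOutput) {
        Optional<String> wanted = artifactCudaVersion(artifactId);
        Optional<String> installed = toolkitRelease(nvccOutput);
        return wanted.isPresent() && wanted.equals(installed);
    }

    public static void main(String[] args) {
        String nvcc = "Cuda compilation tools, release 10.0, V10.0.130";
        System.out.println(matches("nd4j-cuda-10.2-platform", nvcc)); // mismatch
        System.out.println(matches("nd4j-cuda-10.0-platform", nvcc)); // match
    }
}
```

This only compares version strings; it does not inspect what is actually installed on the machine, so treat it as a way to reason about the mismatch rather than a diagnostic tool.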

raver119
  • I changed this: DeepLearning4j CUDA special examples nd4j-cuda-10.2-platform into this: DeepLearning4j CUDA special examples nd4j-cuda-10.0-platform – plguillou Jan 01 '20 at 17:12
  • And in the dependencies (in the pom file) they used CUDA 9.2, 10.0, 10.1 and 10.2, so I removed 9.2, 10.1 and 10.2, but it still doesn't work – plguillou Jan 01 '20 at 17:15
  • That's not how you specify a dependency in a POM file, search "specifying dependencies in Maven" – Trash Can Jan 01 '20 at 17:35
  • It works! It runs and I get the confusion matrix and everything, but I get this message during initialization: "cuDNN not found: use cuDNN for better GPU performance by including the deeplearning4j-cuda module. For more information, please refer to: [link](https://deeplearning4j.org/docs/latest/deeplearning4j-config-cudnn)". I don't understand, because I did what the link says but I still get this "cuDNN not found" message – plguillou Jan 01 '20 at 23:10
  • That's just a hint. If you have cuDNN installed, you can add the "deeplearning4j-cuda-10.x" dependency to improve the performance of the ops that cuDNN supports. – raver119 Jan 02 '20 at 09:56
  • @raver119 I've seen your name in lots of dl4j projects (especially this "MultiGpuLenetMnistExample"), thanks for all your work ^^! I think I'm pretty close to getting it done; I just get this kind of [error](https://snipboard.io/baLuwS.jpg) message when running the program (the long line, in French, says "The specified procedure could not be found"). – plguillou Jan 02 '20 at 10:47
  • It's also strange, because nvcc --version tells me I have CUDA 10.0 while nvidia-smi tells me I have 10.1. I tried with both dependencies and I get the same error (in the last link) – plguillou Jan 02 '20 at 10:59
  • Either you have multiple CUDA versions installed, or the driver is just reporting the wrong version. Either way, that's purely a version clash, and the real fix is to sort out what you actually have installed – raver119 Jan 03 '20 at 11:12
  • The message you posted above is exactly that: you have multiple versions of CUDA-related components installed, and they are binary incompatible. For example, cuDNN for 10.0 isn't compatible with CUDA 10.2 – raver119 Jan 03 '20 at 11:13