I get the error below when running my training script (train.py). My system configuration is as follows.

OS: CentOS Linux 7
PyTorch 1.1.0
TensorFlow version: 1.2.0
Python version: 3.6.8
CUDA/cuDNN version: 8.0/7.0.5
GPU: Nvidia GPU GeForce GTX 1080

/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [1426,0,0], thread: [96,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
...
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [1426,0,0], thread: [122,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [1426,0,0], thread: [123,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [1426,0,0], thread: [124,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [1426,0,0], thread: [125,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [1426,0,0], thread: [126,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [1426,0,0], thread: [127,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
Traceback (most recent call last):
  File "./train.py", line 115, in <module>
    trainer.train()
  File "../../tasks/semantic/modules/trainer.py", line 239, in train
    show_scans=self.ARCH["train"]["show_scans"])
  File "../../tasks/semantic/modules/trainer.py", line 320, in train_epoch
    output = model(in_vol, proj_mask)
  File "/home/media-server/.pyenv/versions/anaconda3-5.0.0/envs/rangenet++/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "../../tasks/semantic/modules/segmentator.py", line 149, in forward
    y, skips = self.backbone(x)
  File "/home/media-server/.pyenv/versions/anaconda3-5.0.0/envs/rangenet++/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "../..//backbones/darknet.py", line 171, in forward
    x, skips, os = self.run_layer(x, self.conv1, skips, os)
  File "../..//backbones/darknet.py", line 154, in run_layer
    y = layer(x)
  File "/home/media-server/.pyenv/versions/anaconda3-5.0.0/envs/rangenet++/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/media-server/.pyenv/versions/anaconda3-5.0.0/envs/rangenet++/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 338, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

I tried running sudo rm -rf ~/.nv and rebooting, but it didn't help: the same RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR still occurs.
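
One thing worth noting about the log above: the repeated assertion lines come from an indexing kernel (IndexKernel.cu), and CUDA reports such failures asynchronously, so the cuDNN error raised later in the convolution may only be a downstream symptom of an earlier out-of-range index. Below is a minimal sketch of the same bounds check done on the host. The function name and the sample values are hypothetical and do not come from the training code; only the condition itself is taken from the assertion message.

```python
def find_out_of_bounds(indices, size):
    """Return (position, value) pairs that violate -size <= index < size.

    This mirrors the condition in the assertion message:
    index >= -sizes[i] && index < sizes[i]
    """
    return [(pos, idx) for pos, idx in enumerate(indices)
            if not (-size <= idx < size)]


# Hypothetical example: a lookup table with 20 entries and a few index values.
# 25 is too large and -21 is too negative; -1 is valid (it means "last entry").
bad = find_out_of_bounds([0, 3, 19, 25, -1, -21], size=20)
print(bad)  # [(3, 25), (5, -21)]
```

With a PyTorch index tensor, the values could be passed in as something like idx.reshape(-1).tolist(), and size would be the length of the indexed dimension. Running the script with the environment variable CUDA_LAUNCH_BLOCKING=1, or once on the CPU, should also make the traceback point at the indexing operation itself rather than at a later layer.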

talonmies
  • As a workaround, you can try installing Anaconda and running your code there. If you have time, though, you can purge CUDA from your system and reinstall it. – Rishab P Mar 31 '20 at 16:21
