I implemented a combination of an MLP, an RNN, and a CNN. With a batch size of 420 everything seems to work fine (i.e., I don't get any errors). However, as soon as I increase the batch size to 840, I receive the following error:
Traceback (most recent call last):
  File "train_cnn_rnn.py", line 152, in <module>
    loss.backward()
  File "/home/tbaumgae/.local/lib/python3.5/site-packages/torch/autograd/variable.py", line 146, in backward
    self._execution_engine.run_backward((self,), (gradient,), retain_variables)
RuntimeError: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.
The forward pass seems to work fine. I checked whether all the variables are contiguous, and they are. The prediction and target used for the loss calculation are contiguous as well, as is the returned loss. Yet this error occurs when calling backward(). Any ideas why this happens?
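For reference, this is the kind of contiguity check I run before the loss calculation. A transposed or permuted view is the usual way a tensor ends up non-contiguous, and .contiguous() materializes a row-major copy (the tensors below are just placeholders, not my actual model inputs):

```python
import torch

x = torch.randn(2, 3, 4)
y = x.transpose(1, 2)     # a view: same storage, permuted strides

print(x.is_contiguous())  # True
print(y.is_contiguous())  # False: transpose returns a non-contiguous view

y = y.contiguous()        # copies the data into a fresh, row-major tensor
print(y.is_contiguous())  # True
```

All of my inputs, predictions, and targets pass this check, which is what makes the error message confusing.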
CUDA Version 8.0.61
Python 3.5.2
Comment Summary:
- There are 210 images in one sequence; therefore, my batch size increases in steps of 210. Each image has a shape of [3, 250, 250].
- I'm using PyTorch's built-in backward; I haven't implemented any backward method myself.
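Since the error only appears at the larger batch size, one workaround I'm considering is splitting each batch into 210-image chunks and accumulating gradients, so every cuDNN call sees the batch size that already works. This is only a sketch; model, criterion, and optimizer stand in for my actual training objects:

```python
import torch

def train_step(model, criterion, optimizer, images, targets, chunk=210):
    """Accumulate gradients over fixed-size chunks instead of one big batch."""
    optimizer.zero_grad()
    n = images.size(0)
    for start in range(0, n, chunk):
        out = model(images[start:start + chunk].contiguous())
        loss = criterion(out, targets[start:start + chunk])
        # Scale each chunk's loss so the accumulated gradient matches the
        # gradient of the mean loss over the full batch.
        (loss * (out.size(0) / n)).backward()
    optimizer.step()
```

This should be numerically equivalent to the full-batch update for mean-reduced losses, at the cost of more (smaller) kernel launches.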