
I was getting a "CUDA out of memory" error, so I added torch.no_grad() to my code. Does it affect my accuracy?

for iters in range(args.iterations):

    with torch.no_grad():
        encoded, encoder_h_1, encoder_h_2, encoder_h_3 = encoder(
            res, encoder_h_1, encoder_h_2, encoder_h_3)

    with torch.no_grad():
        code = binarizer(encoded)

    with torch.no_grad():
        output, decoder_h_1, decoder_h_2, decoder_h_3, decoder_h_4 = decoder(
            code, decoder_h_1, decoder_h_2, decoder_h_3, decoder_h_4)

    res = res - output.detach()
    codes.append(code.data.cpu().numpy())
    torch.cuda.empty_cache()
    print('Iter: {:02d}; Loss: {:.06f}'.format(iters, res.data.abs().mean()))
Khawar Islam
  • Unless you're doing something very strange you should always be performing inference on your models within a `torch.no_grad()` context. Also, make sure your models are in "eval" mode to ensure dropout is disabled and batch-norm behaves properly (i.e. call `model.eval()` on each of your models before evaluating). – jodag Aug 11 '20 at 07:36

3 Answers


torch.no_grad() just disables the tracking of any calculations required to later calculate a gradient.

It won't have any effect on accuracy in pure inference mode, since gradients are not needed there. Of course you can't use it during training, since the gradients are needed to train and optimize.

In general, when you run inference you always want to set the network to eval mode and disable gradients. This saves run time and memory consumption and won't affect accuracy.

An answer to a similar question, explaining eval() and no_grad(): https://discuss.pytorch.org/t/model-eval-vs-with-torch-no-grad/19615/2

Nopileos

torch.no_grad() basically skips the gradient calculation over the weights, which means you are not changing any weights in the wrapped layers. If you are fine-tuning a pre-trained model, it's fine to use torch.no_grad() on all the layers except the fully connected or classifier layer.

If you are training your network from scratch, this isn't a good thing to do. You should consider reducing the number of layers, or applying torch.no_grad() to only part of the training. An example of this is given below.

for iters in range(args.iterations):

    if iters % 2 == 0:
        # even iterations: freeze the encoder, no gradients are tracked
        with torch.no_grad():
            encoded, encoder_h_1, encoder_h_2, encoder_h_3 = encoder(
                res, encoder_h_1, encoder_h_2, encoder_h_3)
    else:
        # odd iterations: run the encoder normally so it can be updated
        encoded, encoder_h_1, encoder_h_2, encoder_h_3 = encoder(
            res, encoder_h_1, encoder_h_2, encoder_h_3)

This is a short example. It might make your training time a bit longer, but you will be able to train your network without removing layers. The important thing here is that you shouldn't update every layer at each iteration or epoch; some parts of the network should only be updated at a specified frequency. Note: this is an experimental method.

anlgrses
  • I just trained a network and generated three models: Encoder, Binarizer and Decoder. Now I am testing with these three models; it uses an encoder.py file to encode images and a decoder.py file to decode images. If I don't put `with torch.no_grad():` in the loop, it shows "CUDA out of memory". – Khawar Islam Aug 11 '20 at 07:08
  • That's normal. I think your network is too big for your GPU. Also check with `nvidia-smi` whether there are other processes using the GPU and putting your training at risk. – anlgrses Aug 11 '20 at 08:00
  • Training is going fine. It's while testing that I get the "CUDA out of memory" issue. – Khawar Islam Aug 11 '20 at 08:04
  • Then you should definitely use `torch.no_grad()`. It won't affect your results. – anlgrses Aug 11 '20 at 08:09
  • I got a very bad result; now I want to check each line. – Khawar Islam Aug 11 '20 at 11:20

According to the PyTorch docs:

Disabling gradient calculation is useful for inference, when you are sure that you will not call Tensor.backward(). It will reduce memory consumption for computations that would otherwise have requires_grad=True.

So it depends on what you are planning to do. If you wrap your training forward pass in torch.no_grad(), no gradients are computed and the weights never update, so yes, it would affect your accuracy. For pure inference it is safe.

Amritansh