I have a loss function and a list of weight matrices, and I'm trying to compute second derivatives. Here's a code snippet:
loss.backward(retain_graph=True)
# first derivatives of the loss w.r.t. every weight matrix
grad_params_w = torch.autograd.grad(loss, weight_list, create_graph=True)
# second derivatives, entry by entry (a and b index the layers)
for i in range(layers[a]):
    for j in range(layers[a + 1]):
        second_der = torch.autograd.grad(grad_params_w[a][i, j], weight_list[b], create_graph=True)
The above code works (the second derivative is actually computed in a separate function, but I inlined it here for brevity). But I am completely confused about when to use create_graph and when to use retain_graph.
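For context, a self-contained toy version of what I'm doing looks roughly like this (the layer sizes, the tanh forward pass and the dummy loss are just placeholders I made up for this post; layers, weight_list, a and b are the names from my real code):

import torch

layers = [3, 4, 2]                          # layer sizes (placeholder values)
weight_list = [torch.randn(layers[k], layers[k + 1], requires_grad=True)
               for k in range(len(layers) - 1)]

x = torch.randn(5, layers[0])               # dummy input batch
out = x
for W in weight_list:
    out = torch.tanh(out @ W)               # simple forward pass
loss = out.pow(2).sum()                     # dummy scalar loss

a, b = 0, 0                                 # which weight matrices to differentiate w.r.t.
loss.backward(retain_graph=True)
grad_params_w = torch.autograd.grad(loss, weight_list, create_graph=True)
for i in range(layers[a]):
    for j in range(layers[a + 1]):
        second_der = torch.autograd.grad(grad_params_w[a][i, j],
                                         weight_list[b], create_graph=True)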
First: if I don't call loss.backward(retain_graph=True) at all, I get error A:
RuntimeError: element 0 of variables tuple is volatile
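Concretely, that first variant is just the snippet above with the backward call removed:

grad_params_w = torch.autograd.grad(loss, weight_list, create_graph=True)
second_der = torch.autograd.grad(grad_params_w[a][i, j], weight_list[b], create_graph=True)  # error A for me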
If I do call it, but don't pass any graph argument to the first derivative, I get error B:
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
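That is, this combination gives me error B:

loss.backward(retain_graph=True)
grad_params_w = torch.autograd.grad(loss, weight_list)  # no create_graph / retain_graph here
second_der = torch.autograd.grad(grad_params_w[a][i, j], weight_list[b], create_graph=True)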
If I instead pass retain_graph=True to the first derivative, I get error A for the second derivative (i.e. in the for loops), no matter whether I put create_graph there or not.
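In code, that third variant is:

loss.backward(retain_graph=True)
grad_params_w = torch.autograd.grad(loss, weight_list, retain_graph=True)  # retain_graph instead of create_graph
second_der = torch.autograd.grad(grad_params_w[a][i, j], weight_list[b])   # error A here, with or without create_graph=True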
Hence, only the snippet at the top works, but it feels weird that I need both loss.backward and all the create_graph arguments. Could somebody clarify this for me? Thanks a lot in advance!!