In Theano, it was very easy to get the gradient of some variable w.r.t. a given loss:
loss = f(x, w)
dl_dw = tt.grad(loss, wrt=w)
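For concreteness, a self-contained version of that pattern might look like this (the dot-product loss is just a stand-in for whatever f(x, w) really computes):

import theano
import theano.tensor as tt

x = tt.vector('x')
w = tt.vector('w')
loss = (x * w).sum()          # stand-in for f(x, w)
dl_dw = tt.grad(loss, wrt=w)  # symbolic gradient of loss w.r.t. w only
grad_fn = theano.function([x, w], dl_dw)
print(grad_fn([1.0, 2.0], [3.0, 4.0]))  # -> [1. 2.]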
I get that PyTorch follows a different paradigm, where you'd do something like:
loss = f(x, w)
loss.backward()
dl_dw = w.grad
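Again for concreteness, here is roughly what I mean (assuming a recent PyTorch where requires_grad is set directly on tensors; the dot-product loss is again just a stand-in for f):

import torch

x = torch.tensor([1.0, 2.0])
w = torch.tensor([3.0, 4.0], requires_grad=True)
loss = (x * w).sum()   # stand-in for f(x, w)
loss.backward()        # backpropagates through the whole graph
print(w.grad)          # tensor([1., 2.])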
The thing is, I might not want to do a full backward propagation through the graph - just along the path needed to get to w.
I know you can create Variables with requires_grad=False if you don't want to backpropagate through them. But then you have to decide that at variable-creation time, and the requires_grad=False property is attached to the variable rather than to the call that computes the gradient, which seems odd.
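To illustrate what I mean, in a sketch like the one below (hypothetical tensors, not my real model) the choice is baked into each tensor at creation, so only the parameters created with requires_grad=True ever get a .grad populated:

import torch

x  = torch.tensor([1.0, 2.0])
w1 = torch.tensor([3.0, 4.0], requires_grad=True)
w2 = torch.tensor([5.0, 6.0])   # requires_grad defaults to False
loss = (x * w1).sum() + (x * w2).sum()
loss.backward()
print(w1.grad)   # tensor([1., 2.])
print(w2.grad)   # None - decided when w2 was created, not when backward() was called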
My question is: is there some way to backpropagate on demand (i.e. only backpropagate along the path needed to compute dl_dw, as you would in Theano)?