
I am trying to use the Adam optimizer to obtain certain values outside of a neural network. My technique wasn't working, so I created a simple example to see if it works:

import numpy as np
import torch

a = np.array([[0.0,1.0,2.0,3.0,4.0], [0.0,1.0,2.0,3.0,4.0]])
b = np.array([[0.1,0.2,0.0,0.0,0.0], [0.0,0.5,0.0,0.0,0.0]])
a = torch.from_numpy(a)
b = torch.from_numpy(b)
a.requires_grad = True
b.requires_grad = True
optimizer = torch.optim.Adam(
        [b],
        lr=0.01,
        weight_decay=0.001
    )

iterations = 200
for i in range(iterations):
    loss = torch.sqrt(((a.detach() - b.detach()) ** 2).sum(1)).mean()
    loss.requires_grad = True 
    
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if i % 10 == 0:
        print(b)
        print("loss:", loss)

My intuition was that b should get as close to a as possible to reduce the loss. But I see no change in any of the values of b, and the loss stays exactly the same. What am I missing here? Thanks.

UlucSahin

1 Answer


You are detaching b, meaning the gradient won't flow all the way to b when backpropagating, i.e. b won't change! Additionally, you don't need to set requires_grad = True on the loss; this is done automatically, since one of the operands already has the requires_grad flag on. Keep the graph intact by only detaching a:

loss = torch.sqrt(((a.detach() - b) ** 2).sum(1)).mean()
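
For completeness, here is a minimal sketch of the full corrected loop (same values, learning rate, and weight decay as in the question); with the graph kept intact through b, the loss should now decrease and b should drift toward a:

import numpy as np
import torch

a = torch.from_numpy(np.array([[0.0, 1.0, 2.0, 3.0, 4.0],
                               [0.0, 1.0, 2.0, 3.0, 4.0]]))
b = torch.from_numpy(np.array([[0.1, 0.2, 0.0, 0.0, 0.0],
                               [0.0, 0.5, 0.0, 0.0, 0.0]]))
b.requires_grad = True  # only b is optimized; a is the fixed target

optimizer = torch.optim.Adam([b], lr=0.01, weight_decay=0.001)

for i in range(200):
    loss = torch.sqrt(((a - b) ** 2).sum(1)).mean()  # graph flows through b

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if i % 10 == 0:
        print("loss:", loss.item())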
Ivan
  • This solved the problem, thanks. I believe I got confused after not having requires_grad on loss initially, then set requires_grad on every parameter to True, which created this issue. – UlucSahin Jul 28 '21 at 14:30
  • In this minimal example, you may consider `a` as your ground truth, as such it doesn't need to compute a gradient. So you only need `b.requires_grad = True`. Note that the flag only takes effect for operations performed after it is set, so `loss.requires_grad = True` actually has no effect, since you computed its value on the line before (see the short sketch below). – Ivan Jul 28 '21 at 14:36
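
To illustrate that last point, a tiny standalone sketch (hypothetical values, not taken from the question): setting requires_grad on loss after it has been computed from detached tensors just turns loss into its own leaf, so backward() never reaches b:

import torch

b = torch.zeros(3, requires_grad=True)

# The graph is cut here: the result of the detached op does not track b.
loss = (b.detach() ** 2).sum()
loss.requires_grad = True  # only affects later operations, not how loss was built

loss.backward()
print(b.grad)  # None -- an optimizer step would leave b untouched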