
I'm learning AdaDelta in "Dive Into Deep Learning". I noticed that the author uses `s[:]` and `delta[:]` in the function `adadelta`.

At first, I thought using plain `s` and `delta` would be fine, so I changed the left-hand side of the #1 and #2 lines (`s[:]` -> `s`, `delta[:]` -> `delta`), but the result is very different (I have tried many times with a fixed `manual_seed`; the curve using `s` and `delta` is much more jagged). I really want to know the difference between them.

The shape of `s` is the same whether I assign with `s[:]` or `s`.

I also noticed that if I replace `s[:]` and `delta[:]` with `s.data` and `delta.data` (changing the code where needed), it behaves the same as `s[:]` and `delta[:]`.
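To illustrate what I mean (my own minimal sketch, not code from the book): assigning to `.data` swaps the payload of the *same* tensor object, so any container holding that object sees the new values, much like slice assignment does.

```python
import torch

# Hypothetical example: `states` stands in for the optimizer state list.
states = [torch.zeros(2)]
s = states[0]

# Assigning to .data replaces the underlying values of the same
# Python object, so states[0] is updated too.
s.data = torch.ones(2)

print(s is states[0])  # True -- still the same tensor object
print(states[0])       # tensor([1., 1.])
```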

I sincerely hope you can help me if you see this problem and understand it. Thanks!

import torch

def adadelta(params, states, hyperparams):
    rho, eps = hyperparams['rho'], 1e-5
    for p, (s, delta) in zip(params, states):
        s[:] = rho * s + (1 - rho) * (p.grad.data ** 2)  # 1
        g = p.grad.data * torch.sqrt((delta + eps) / (s + eps))
        p.data -= g
        delta[:] = rho * delta + (1 - rho) * g * g  # 2
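Here is a stripped-down sketch of the difference I'm asking about (my own illustration, with a toy update in place of the real AdaDelta one): `s[:] = ...` writes into the tensor stored in `states`, while `s = ...` only rebinds the local name, so the stored state is never updated between calls.

```python
import torch

def step_inplace(states):
    for s in states:
        s[:] = s + 1.0  # writes into the tensor that `states` holds

def step_rebind(states):
    for s in states:
        s = s + 1.0     # creates a new tensor; `states` is untouched

states = [torch.zeros(2)]

step_inplace(states)
print(states[0])  # tensor([1., 1.]) -- the state persisted

step_rebind(states)
print(states[0])  # still tensor([1., 1.]) -- the update was lost
```

If this is right, then with plain assignment the accumulated averages `s` and `delta` stay at their initial values across iterations, which would explain the different curve.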

[plot: the result of using `s[:]` and `delta[:]`]

[plot: the result of using `s` and `delta`]

XUHAO77
  • I find the answer in [what-is-the-difference-between-tensor-and-tensor-in-pytorch](https://stackoverflow.com/questions/61103275/what-is-the-difference-between-tensor-and-tensor-in-pytorch) – XUHAO77 Jul 23 '22 at 14:41

0 Answers