I'm learning AdaDelta in "Dive Into Deep Learning". I noticed that the author uses s[:] and delta[:] in the function adadelta (code below).
At first I thought plain s and delta would work just as well, so I changed the left-hand side of lines #1 and #2 (s[:] -> s, delta[:] -> delta), but the results are very different (I have tried this many times with a fixed manual_seed; the loss curve with s and delta is much more erratic). I really want to understand the difference between them.
The shape of s is the same whether the assignment above is done with s[:] or s.
I also noticed that if I replace s[:] and delta[:] with s.data and delta.data (and adjust the rest of the code accordingly), it behaves the same as s[:] and delta[:].
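Here is a quick check I ran on my own (not from the book) to see what each form of assignment does to the tensor stored in the outer states list; this is just my understanding of the difference:

import torch

states = [torch.zeros(3)]

# s[:] = ... writes the new values into the existing tensor,
# so the tensor held inside `states` changes as well.
s = states[0]
s[:] = torch.ones(3)
print(states[0])   # tensor([1., 1., 1.])

# s = ... only rebinds the local name `s` to a brand-new tensor;
# the tensor held inside `states` is left untouched.
s = states[0]
s = s + torch.ones(3)
print(states[0])   # still tensor([1., 1., 1.])

# s.data = ... swaps the storage behind the same tensor object,
# so `states` again sees the new values, just like s[:] = ...
s = states[0]
s.data = torch.full((3,), 5.0)
print(states[0])   # tensor([5., 5., 5.])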
I would really appreciate it if someone who understands this could help. Thanks!
import torch

def adadelta(params, states, hyperparams):
    rho, eps = hyperparams['rho'], 1e-5
    for p, (s, delta) in zip(params, states):
        # Leaky average of the squared gradients, written in place
        s[:] = rho * s + (1 - rho) * (p.grad.data ** 2)  # 1
        # Rescaled gradient using the running average of squared updates
        g = p.grad.data * torch.sqrt((delta + eps) / (s + eps))
        p.data -= g
        # Leaky average of the squared updates, written in place
        delta[:] = rho * delta + (1 - rho) * g * g  # 2
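For reference, here is a tiny run I put together (my own test harness, not the book's training loop; it assumes the adadelta function above is already defined). With the original s[:] assignment the stored state keeps being updated in place across calls; if line #1 is changed to s = ..., the tensor inside states stays at zero forever, which I suspect is why my loss curve looks so different:

import torch

# One small parameter with a hand-set gradient, and one (s, delta) state pair.
p = torch.tensor([1.0, 2.0], requires_grad=True)
p.grad = torch.tensor([0.1, 0.2])
params = [p]
states = [(torch.zeros(2), torch.zeros(2))]
hyperparams = {'rho': 0.9}

adadelta(params, states, hyperparams)
print(states[0][0])   # non-zero: line #1 updated the stored state in place

adadelta(params, states, hyperparams)
print(states[0][0])   # changes again: the same state tensor is reused across calls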