
In sklearn's t-SNE implementation, the gradient update is performed as follows (the _gradient_descent function in _t_sne.py on sklearn's GitHub):

    # error: current KL divergence; grad: its gradient w.r.t. the embedding p
    error, grad = objective(p, *args, **kwargs)
    grad_norm = linalg.norm(grad)

    # adaptive gains, one per parameter: if the previous update and the
    # current gradient have opposite signs, the parameter is still moving
    # in a consistent direction, so grow its gain additively; if the signs
    # agree, the direction just flipped, so shrink the gain multiplicatively
    inc = update * grad < 0.0
    dec = np.invert(inc)
    gains[inc] += 0.2
    gains[dec] *= 0.8
    np.clip(gains, min_gain, np.inf, out=gains)
    grad *= gains
    # standard momentum step, using the gain-scaled gradient
    update = momentum * update - learning_rate * grad
    p += update

What is unclear to me is where the += 0.2 and *= 0.8 come from. I couldn't find anything about them in the original t-SNE paper, and I can't reconcile the updates in the sklearn implementation with the update formula in the paper:

$$\mathcal{Y}^{(t)} = \mathcal{Y}^{(t-1)} + \eta \frac{\partial C}{\partial \mathcal{Y}} + \alpha(t)\left(\mathcal{Y}^{(t-1)} - \mathcal{Y}^{(t-2)}\right)$$
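For concreteness, here is how I read one iteration under each scheme side by side (a minimal sketch, not sklearn's actual code; the toy gradient and the constants are made up, and the paper writes its gradient term with a plus sign under its own sign convention):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 6
    grad = rng.normal(size=n)      # stand-in for dC/dY at the current step
    update = rng.normal(size=n)    # previous step, i.e. Y(t-1) - Y(t-2)
    gains = np.ones(n)
    learning_rate, momentum, min_gain = 200.0, 0.5, 0.01

    # Paper: Y(t) = Y(t-1) + eta * dC/dY + alpha(t) * (Y(t-1) - Y(t-2)),
    # i.e. one global learning rate eta shared by every parameter.
    paper_step = momentum * update - learning_rate * grad

    # sklearn: the same momentum step, but each parameter's effective
    # learning rate is learning_rate * gains[i], adapted every iteration.
    inc = update * grad < 0.0
    gains[inc] += 0.2
    gains[~inc] *= 0.8
    np.clip(gains, min_gain, np.inf, out=gains)
    sklearn_step = momentum * update - learning_rate * (gains * grad)

    # With all gains equal to 1, the two steps coincide.
    print(paper_step, sklearn_step)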

Does anybody know the logic behind the implementation or how I can reconcile the two?

Thanks in advance.

futuref
  • The original t-SNE paper mentions the origin of this update rule: it is basically a way to dynamically increase (+= 0.2) or decrease (*= 0.8) the per-parameter learning rate. It is simple and works well in most cases. – James LI May 19 '22 at 22:54
  • Hi James, thanks for your answer! Could you point me to where in the paper this is mentioned? They mention some optimisation tricks in section 3.4 and reference Jacobs 1988 - is that it? – futuref May 21 '22 at 07:21
  • Yes, that's it. The adaptive learning rate basically makes the choice of the initial learning rate largely irrelevant; see the sketch after this thread. – James LI May 22 '22 at 01:54
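To see the Jacobs-style adaptation James LI describes in action, here is a toy sketch on a simple quadratic (the objective, constants, and step count are made up for illustration and are not part of sklearn):

    import numpy as np

    # Minimize f(x) = 0.5 * x^2 coordinate-wise; the gradient is simply x.
    x = np.array([10.0, -10.0])
    update = np.zeros_like(x)
    gains = np.ones_like(x)
    learning_rate, momentum, min_gain = 0.1, 0.5, 0.01

    for _ in range(100):
        grad = x                          # gradient of 0.5 * x^2
        inc = update * grad < 0.0         # still moving in the same direction
        gains[inc] += 0.2                 # additive increase of effective lr
        gains[~inc] *= 0.8                # multiplicative decrease on sign flip
        np.clip(gains, min_gain, np.inf, out=gains)
        update = momentum * update - learning_rate * gains * grad
        x += update

    print(x)  # close to 0, with the gains compensating for the initial learning_rate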

0 Answers