
I have a few doubts about normalization and gradient descent that I couldn't figure out:

  • Should I normalize the parameters as well as the samples?

  • If I normalize the parameters before running gradient descent, should I denormalize the resulting parameters afterwards?
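To make the two questions concrete, here is a minimal sketch (with made-up data, for a plain linear-regression model) of the usual practice: standardize the samples, run gradient descent on the standardized features, and then either keep normalizing new inputs or fold the scaling back into the learned parameters:

```python
import numpy as np

# Hypothetical data: features X and noiseless linear targets y
rng = np.random.default_rng(0)
X = rng.normal(10.0, 3.0, size=(100, 2))
y = X @ np.array([2.0, -1.0]) + 5.0

# Standardize the samples (features), not the parameters
mu, sigma = X.mean(axis=0), X.std(axis=0)
X_norm = (X - mu) / sigma

# Plain gradient descent on the standardized features
w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    err = X_norm @ w + b - y
    w -= lr * X_norm.T @ err / len(y)
    b -= lr * err.mean()

# To use the model on raw inputs, either standardize new inputs
# with the same (mu, sigma), or equivalently fold the scaling back
# into the parameters ("denormalizing" them):
w_raw = w / sigma
b_raw = b - (w * mu / sigma).sum()
```

Here `w_raw` and `b_raw` recover the coefficients of the original, unscaled problem, so predictions on raw inputs are `X @ w_raw + b_raw`.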

Thank you in advance.

Juan Ortega

1 Answer


The parameters that you want to train in your model are usually initialized before running gradient descent.

If you are using a framework like PyTorch or TensorFlow, there will be a module called something like "init" that has methods to initialize the parameters. The parameters can safely be drawn from a normal distribution, but many other distributions can be used.
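For example, in PyTorch the `torch.nn.init` module provides such methods (the layer sizes below are arbitrary):

```python
import torch
import torch.nn as nn

# A small layer; nn.Linear already initializes its weights by
# default, but torch.nn.init lets you re-initialize them explicitly.
layer = nn.Linear(4, 3)

# Draw the weights from a normal distribution (one common choice;
# Xavier/Kaiming schemes are also provided by the init module)
nn.init.normal_(layer.weight, mean=0.0, std=0.01)
nn.init.zeros_(layer.bias)
```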

The output of the model will usually not correspond to "real" quantities (unless you are doing what is called "regression"). Often you will want to output something like the probability of belonging to some class (say: dog, cat, or lion). In that case the output elements should be values between 0 and 1 that sum to 1. This is often achieved with a so-called softmax layer.
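The softmax step can be sketched in a few lines (the raw scores below are made up):

```python
import numpy as np

# Hypothetical raw model scores (logits) for three classes: dog, cat, lion
logits = np.array([2.0, 1.0, 0.1])

# Softmax squashes them into probabilities in (0, 1) summing to 1.
# Subtracting the max first is a standard numerical-stability trick.
exp = np.exp(logits - logits.max())
probs = exp / exp.sum()
```

The largest logit gets the largest probability, so the predicted class is `probs.argmax()`.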

ThoG