
To implement L2 regularization for logistic regression, we add the L2 norm of the weights to the base loss: J(w) = CrossEntropy(y, ŷ) + (λ / 2m) · ‖w‖²
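
For concreteness, here is a minimal NumPy sketch of that regularized loss (the function and variable names are mine, just for illustration):

```python
import numpy as np

def l2_regularized_logistic_loss(w, X, y, lam):
    """Binary cross-entropy loss plus the L2 penalty (lam / 2m) * ||w||^2."""
    m = X.shape[0]
    y_hat = 1.0 / (1.0 + np.exp(-X @ w))            # sigmoid predictions
    cross_entropy = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    l2_penalty = (lam / (2 * m)) * np.sum(w ** 2)   # the added regularization term
    return cross_entropy + l2_penalty
```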

With multilayer neural networks we do the same, but additionally, during backward propagation we increase each weight's gradient by a term proportional to the weight itself: dW = dW_loss + (λ / m) · W
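
Again only a sketch with illustrative names: the extra term added to each weight's gradient during backprop is the derivative of that L2 penalty, i.e. (λ / m) times the weight itself:

```python
import numpy as np

def regularized_weight_gradient(dW_data, W, lam, m):
    """Gradient of the regularized loss w.r.t. one layer's weight matrix W.

    dW_data is the gradient of the unregularized (data) loss from backprop;
    differentiating the penalty (lam / 2m) * ||W||^2 adds the "weight decay"
    term (lam / m) * W.
    """
    return dW_data + (lam / m) * W
```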

The question is: why don't we just do the same for neural networks, i.e. only add the penalty term to the loss, as we do for logistic regression?

I can guess that it is connected with the fact that NNs have multiple layers, but I do not understand how and why this works.

I’m voting to close this question because it is not about programming as defined in the [help] but about ML theory. – desertnaut May 30 '21 at 15:25

1 Answer


As far as I know, the basic approach is to add a penalty to the empirical risk minimization problem, so maybe the other penalty comes from some other theoretical result that I don't know. If you want to take a look at the theoretical aspects of ML, I strongly recommend this book: https://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/understanding-machine-learning-theory-algorithms.pdf
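
Schematically (my notation, not taken from the book), the penalized ERM objective is:

```latex
\min_{w}\; \frac{1}{m}\sum_{i=1}^{m} \ell\bigl(h_w(x_i),\, y_i\bigr) \;+\; \lambda\, R(w),
\qquad \text{e.g. } R(w) = \tfrac{1}{2}\lVert w \rVert_2^2
```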