I have a bunch of questions about how regularization and biases work in Caffe.
First, do biases exist in the network by default, or do I need to ask Caffe to add them?
Second, when Caffe computes the loss value, it does not include the regularization term, is that right? I mean that the reported loss contains only the loss function value. As I understand it, regularization is only taken into account in the gradient calculation. Is that correct?
Third, when Caffe computes the gradient, does the regularization also cover the bias values, or does it only cover the weights of the network?
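To make the second and third questions concrete, here is a minimal Python sketch of the behavior I am asking about. It is not Caffe code: the one-unit linear model, the function names, and the `decay_bias` flag are all hypothetical, made up only to illustrate my current understanding, which may be wrong.

```python
# Hypothetical illustration only; none of these names come from Caffe.

def data_loss_and_grads(w, b, x, y):
    """Squared-error loss for a one-unit linear model y_hat = w * x + b."""
    err = w * x + b - y
    loss = 0.5 * err * err      # loss function value only, no regularization term
    return loss, err * x, err   # loss, dL/dw, dL/db


def regularized_grads(w, b, grad_w, grad_b, weight_decay, decay_bias):
    """Add the gradient of an L2 penalty (weight_decay * param) to the data gradients."""
    grad_w = grad_w + weight_decay * w
    if decay_bias:              # third question: does Caffe do this for biases too?
        grad_b = grad_b + weight_decay * b
    return grad_w, grad_b


w, b, x, y = 2.0, 1.0, 3.0, 5.0
loss, gw, gb = data_loss_and_grads(w, b, x, y)
gw_reg, gb_reg = regularized_grads(w, b, gw, gb, weight_decay=0.5, decay_bias=False)
print(loss, gw_reg, gb_reg)
```

In this sketch the printed loss is unaffected by `weight_decay`, while the gradients are. What I would like to know is whether Caffe behaves like this, and which value of the hypothetical `decay_bias` flag matches its default behavior.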
Thanks in advance,
Afshin