I have just started studying neural networks and I managed to figure out how to derive the equations necessary for backpropagation. I've spent nearly 3 days asking all of my professors and googling everything I can find. My math skills are admittedly poor, but I really want to understand how this particular formula makes sense mathematically. The formula is used to update a weight after the gradient has already been found.

W1 = W0 - L * (dC/dw)

Where:

W1 = new weight

W0 = old weight

L = learning rate

dC/dw = the partial derivative of the cost (error) function with respect to the weight; it is one component of the gradient vector of the cost function

What I know so far:

  1. The gradient is the vector of the function's partial derivatives, and it points in the direction of the maximum rate of increase. Each partial derivative gives the rate of change along the direction of the variable that the derivative is taken with respect to.
  2. dC/dW is one of these partial derivatives.
  3. dC/dW evaluates to a rate of change. Its sign tells us the direction of change. Its value is the ratio between the change in cost and the change in weight at a particular weight.
  4. Somehow, multiplying dC/dW by the learning rate takes only a small portion of this rate as the change in weight.
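
Points 3 and 4 can be written out as a first-order approximation. This is only a sketch: it assumes C is differentiable in W, that the step is small, and that the other weights are held fixed. The symbols ΔW and ΔC are introduced here for the change in weight and the change in cost; they are not part of the original formula.

```latex
\Delta C \approx \frac{dC}{dW}\,\Delta W,
\qquad
\Delta W = W_1 - W_0 = -L\,\frac{dC}{dW}
\quad\Longrightarrow\quad
\Delta C \approx -L\left(\frac{dC}{dW}\right)^{2} \le 0
\quad \text{for } L > 0 .
```

Read this way, the update picks a change in weight whose sign is opposite to the slope, so to first order the cost cannot go up, and L scales how large that change is.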

What I can't reconcile:

  1. The learning rate is just a scalar without units. How is it possible to just multiply a scalar by a rate and end up with a measurable change in weight? What am I failing to understand here?
halfer
  • Please read [Under what circumstances may I add “urgent” or other similar phrases to my question, in order to obtain faster answers?](//meta.stackoverflow.com/q/326569) - the summary is that this is not an ideal way to address volunteers, and is probably counterproductive to obtaining answers. Please refrain from adding this to your questions. – halfer Feb 28 '20 at 19:57

1 Answer

Artificial neural networks (ANNs) are based on a concept taken from the human nervous system. The basic unit of the human nervous system is the neuron. Neurons are present throughout the body to sense stimuli, and each neuron is connected to other neurons in order to transmit a message from that part of the body to the brain. Signal transmission is controlled by the concentration of certain chemicals in the neuron. Normally the concentration of these chemicals stays balanced and is not disturbed until a stimulus is sensed, so a neuron does not transmit a signal to another neuron unless there is a stimulus. When a stimulus is sensed (e.g. a person cuts a fingertip, and the neurons there sense the stimulus), the concentration of chemicals on the surface of the neuron increases and a signal is transmitted to the next neuron. The nature of the signal and the message encoded in it depend on the change in chemical concentration.

In an ANN, a neuron is a mathematical function, and the weights of a neuron play a role similar to the chemical concentration in a human neuron. The weights should be adjusted so that a fixed formula can encode all the information needed to make the desired predictions, just as information is encoded in a human neuron through chemical concentration. To find the correct weights, the ANN is trained on a large amount of data for the problem it is meant to solve.

The learning rate is a positive scalar, usually chosen between 0 and 1. Put simply, the learning rate sets the pace at which the weight is updated. A derivative is the rate of change of one quantity with respect to another. Here, dC/dw is the rate of change of the cost with respect to the weight, where the cost measures the difference between the predicted values and the real values. dC/dw can also be read as the responsibility of that particular weight for the error of the whole network. The formula may vary from layer to layer and from text to text. Here is a link that explains the feed-forward neural network structure in detail. I hope it helps you understand. If you are still confused, feel free to ask further.
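
To see how the learning rate sets the pace, here is a minimal sketch of the update rule W1 = W0 - L * (dC/dw) applied repeatedly. The toy cost (w - 3)^2, the starting weight 0.0, and the function names `cost`, `d_cost_dw`, `update` and `train` are all assumptions made for illustration; a real network has many weights and a cost computed from data.

```python
def cost(w):
    # Hypothetical toy cost with its minimum at w = 3 (an assumption for illustration).
    return (w - 3.0) ** 2


def d_cost_dw(w):
    # Analytic derivative dC/dw of the toy cost above.
    return 2.0 * (w - 3.0)


def update(w0, learning_rate):
    # One step of the rule from the question: W1 = W0 - L * (dC/dw).
    return w0 - learning_rate * d_cost_dw(w0)


def train(w0, learning_rate, steps):
    # Apply the update rule `steps` times and return the final weight.
    w = w0
    for _ in range(steps):
        w = update(w, learning_rate)
    return w


if __name__ == "__main__":
    # Compare how far the weight moves in 20 steps for a few learning rates.
    for lr in (0.01, 0.1, 0.5):
        w = train(0.0, lr, 20)
        print(f"L={lr}: w={w:.4f}, cost={cost(w):.6f}")
```

For this toy cost, a smaller L moves the weight toward the minimum in smaller steps, while a larger L takes bigger steps. On other cost functions a learning rate that is too large can overshoot the minimum, which is one reason small values are usually chosen.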

Sana