What is the use of thresholds in the backpropagation algorithm? I wrote Java code for class-label identification and used some random thresholds (0-1) for the neurons. I trained the system and tested it on some data, and it worked pretty well. But what difference does the algorithm make with or without thresholds?
-
You mean "bias". There is always a threshold for the activation. Using bias units, you make the threshold(s) trainable. – runDOSrun Jul 05 '15 at 10:52
-
Is there any method or formula (depending on other parameters) for how we generate random weights, or do we just use the random() function? – mRbOneS Jul 06 '15 at 14:30
-
Small random weights close to 0 are most common. – runDOSrun Jul 06 '15 at 15:24
-
OK, I generated them using a seed: [rand.nextDouble()*100/100] – mRbOneS Jul 06 '15 at 19:00
1 Answer
What you call "thresholds" are actually biases in the affine transformations computed by neurons:
f(w,x,b) = g(<w,x>+b)
Biases should not be fixed as constants, as you suggest, but trained just like any other parameter in the network. Usually one simply adds a hypothetical "bias neuron" whose output is always 1, so the bias becomes just another weight:
f(w,x,b) = g(<[w b], [x 1]>)
Why is it important to have biases? In general, having no bias means that the "filters" (feature detectors) trained in your neurons have to pass through the origin. You can think of each neuron as a hyperplane in your input space around which you "fold" the space so that your data becomes more separable. If you have no biases, all these hyperplanes pass through the origin. If you fix them as constants, you fix their distances from the origin. Finally, if you train them, the algorithm can freely place them anywhere in the input space (the desired behaviour).
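To make this concrete, here is a minimal Java sketch (not the asker's original code; all names are illustrative) of a single sigmoid neuron whose bias is treated as an extra weight attached to a constant input of 1, and updated by gradient descent just like the other weights:

```java
// Minimal sketch: one sigmoid neuron with a trainable bias.
// The bias b is updated exactly like a weight whose input is always 1.
public class NeuronWithBias {
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Trains the neuron on a single example and returns its final output.
    static double train() {
        double[] w = {0.1, -0.2};   // input weights
        double b = 0.05;            // bias ("threshold"), trained, not fixed
        double[] x = {0.7, 0.3};    // one toy input
        double target = 1.0;
        double lr = 0.5;            // learning rate

        for (int epoch = 0; epoch < 1000; epoch++) {
            // forward pass: y = g(<w,x> + b)
            double z = b;
            for (int i = 0; i < w.length; i++) z += w[i] * x[i];
            double y = sigmoid(z);

            // squared-error gradient: dE/dz = (y - target) * g'(z)
            double delta = (y - target) * y * (1 - y);
            for (int i = 0; i < w.length; i++) w[i] -= lr * delta * x[i];
            b -= lr * delta * 1.0;  // the bias "neuron" input is the constant 1
        }

        double z = b;
        for (int i = 0; i < w.length; i++) z += w[i] * x[i];
        return sigmoid(z);
    }

    public static void main(String[] args) {
        System.out.println(train());
    }
}
```

After training, the output approaches the target; the key point is that `b` receives its own gradient update rather than staying at a fixed random value.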

-
I didn't understand the terminology you were using. Can you please make it simpler? I am a newbie and wrote this code just based on the algorithm, applying thresholds. What is a bias here? And what are the functions 'f' and 'g' you used? If you have a WhatsApp or Facebook account, please give me the details; I have some doubts, as I am doing this for a journal publication. Thank you :) – mRbOneS Jul 05 '15 at 12:41
-
To be honest, this is the simplest terminology. You have a node whose output is described as g(<w,x>+b), where w are the input weights, x is the input, b is the "bias" (what you call a threshold), and g is probably some sigmoid (or tanh, ReLU, etc.) activation function. These biases are also model parameters, so you should train them, not fix them; your training algorithm is probably based on gradient descent, so you can compute derivatives over b too. – lejlot Jul 05 '15 at 19:57
-
Now I get it. And is it correct if we generate weights and thresholds randomly between -1 and +1? Is there any method or formula (depending on other parameters) for how we generate random weights, or do we just use the random() function? – mRbOneS Jul 06 '15 at 03:34
-
There are many formulas, one of which you can find in Haykin's "Neural Networks and Learning Machines", and others in modern papers on deep learning. In general, the initialization of weights seems crucial for neural network convergence. – lejlot Jul 07 '15 at 04:04
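One common formula of the kind the commenter alludes to is Glorot/Xavier uniform initialization, which scales the range of the random weights by the layer's fan-in and fan-out instead of using a fixed interval like [-1, 1]. A hedged Java sketch (names are illustrative, and this is one scheme among several):

```java
import java.util.Random;

// Sketch of Glorot/Xavier uniform initialization: weights drawn
// uniformly from [-limit, limit], limit = sqrt(6 / (fanIn + fanOut)).
public class WeightInit {
    static double[][] glorotUniform(int fanIn, int fanOut, Random rng) {
        double limit = Math.sqrt(6.0 / (fanIn + fanOut));
        double[][] w = new double[fanOut][fanIn];
        for (int i = 0; i < fanOut; i++) {
            for (int j = 0; j < fanIn; j++) {
                // rng.nextDouble() is in [0,1); map it to [-limit, limit)
                w[i][j] = (rng.nextDouble() * 2.0 - 1.0) * limit;
            }
        }
        return w;
    }

    public static void main(String[] args) {
        // A layer with 4 inputs and 3 outputs, seeded for reproducibility.
        double[][] w = glorotUniform(4, 3, new Random(42));
        for (double[] row : w) {
            for (double v : row) System.out.printf("%.4f ", v);
            System.out.println();
        }
    }
}
```

The idea is to keep the variance of activations roughly constant across layers; small weights near 0 (as suggested in the comments above) are the same intuition with a principled scale.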