In other words, what is the main reason for switching the bias from a separate b_j term to an additional w_ij * x_i (with a constant input x_i = 1) in the neuron summation formula before the sigmoid? Performance? Which method is best, and why?

Note: j is a neuron of the current layer and i a neuron of the layer below.

– Guillaume Chevalier

1 Answer

Note: it makes little sense to ask for the best method here. Those are two different mathematical notations for exactly the same thing.

However, fitting the bias as just another weight allows you to rewrite the sum as a scalar product of an observed feature vector x_d with the weight vector w.
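
For instance, here is a minimal NumPy sketch (all names are illustrative) showing that the two notations produce identical pre-activations:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)        # inputs x_i from the lower layer
W = rng.normal(size=(2, 3))   # weights w_ij for two neurons j
b = rng.normal(size=2)        # separate bias b_j per neuron

# Notation 1: explicit bias term, z_j = sum_i w_ij * x_i + b_j
z_explicit = W @ x + b

# Notation 2: fold the bias into the weights by prepending a
# constant input of 1 whose weight for neuron j is b_j.
x_aug = np.concatenate(([1.0], x))
W_aug = np.hstack((b[:, None], W))
z_folded = W_aug @ x_aug

print(np.allclose(z_explicit, z_folded))  # True
```

Each neuron j still has its own bias; it is simply stored as that neuron's weight on the shared constant input.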

Have you tried to calculate the derivative w.r.t. w in order to get the optimal w according to least squares? You will notice that this calculation becomes much cleaner in vectorized notation.
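
Concretely, with the bias absorbed into w and the observations stacked as rows of a design matrix X whose first column is all ones, the standard derivation collapses to one line of matrix calculus:

$$
\nabla_w \, \lVert Xw - y \rVert^2 = 2 X^\top (Xw - y) = 0
\quad\Longrightarrow\quad
w^* = (X^\top X)^{-1} X^\top y
$$

With a separate b_j you would instead have to track an extra partial derivative for the bias and combine the two results by hand.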

Apart from that: in many high-level programming languages, vectorized calculations are significantly more efficient than their non-vectorized equivalents. So performance is also a point, at least in some languages.
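
As a rough illustration in Python (absolute timings depend on the machine; the relative gap is what matters):

```python
import timeit
import numpy as np

n = 100_000
w = np.random.rand(n)
x = np.random.rand(n)

# Non-vectorized: an interpreted Python loop over the sum
loop_t = timeit.timeit(lambda: sum(w[i] * x[i] for i in range(n)), number=10)
# Vectorized: one call into NumPy's compiled dot product
vec_t = timeit.timeit(lambda: w @ x, number=10)

print(f"loop: {loop_t:.3f}s, vectorized: {vec_t:.3f}s")
```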

– cel
  • When changing the bias into an additional neuron (for performance, to do a scalar product), do we lose the individual neurons' bias information by sharing the same bias across the layer of neurons? – Guillaume Chevalier May 03 '15 at 20:20
  • Oh, I see: the external bias neuron does not learn a value; its weight is adjusted for every neuron of the layer, so no information is lost, thanks. Maybe you want to edit your answer to add that! – Guillaume Chevalier May 03 '15 at 20:21
  • You don't change the bias into a neuron, you make the bias part of the weights. Basically you add to each neuron an additional constant input of 1 which is associated with an additional weight. If you do the calculations you will see that this is equivalent to having a bias term for each neuron. – cel May 03 '15 at 20:24
  • Yes, that's it (by "changing the bias into an additional neuron", I meant externalizing it into a weighted neuron with a static value of 1). So each weight to this neuron is the "bias" of its linked neuron. Thanks – Guillaume Chevalier May 03 '15 at 20:31