
I am currently teaching myself the concept of neural networks, and I am working through the very good book at http://neuralnetworksanddeeplearning.com/chap1.html

I have done a few of the exercises, but there is one exercise where I really don't understand one step.

Task: There is a way of determining the bitwise representation of a digit by adding an extra layer to the three-layer network above. The extra layer converts the output from the previous layer into a binary representation, as illustrated in the figure below. Find a set of weights and biases for the new output layer. Assume that the first 3 layers of neurons are such that the correct output in the third layer (i.e., the old output layer) has activation at least 0.99, and incorrect outputs have activation less than 0.01. [figure showing the extra binary-output layer]

I also found a solution, shown in the second image. [image of the proposed weight matrix]

I understand why the matrix has to have this shape, but I really struggle to understand the step where the author calculates

0.99 + 3*0.01
4*0.01

I really don't understand these two steps, and I would be very happy if someone could help me understand this calculation.

Thank you very much for your help.

SMS

1 Answer


The output of the previous layer is 10x1 (call it x). The weight matrix W is 4x10, so the new output layer is 4x1. Start from two assumptions:

  • First, suppose x is 1 at exactly one row, e.g. xT = [1 0 0 0 0 0 0 0 0 0]. If you multiply this vector by the matrix W, the output is yT = [0 0 0 0]: the single 1 in x picks out the 0th column of W, which is all zeroes.

  • Second, suppose x is no longer exactly 1; instead it can be xT = [0.99 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01]. If you multiply this x with the first row of W, the result is 0.05 (I believe there is a typo in the solution here). When xT = [0.01 0.99 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01], multiplication with the first row of W gives 1.03, because:

0.01*0 + 0.99*1 + 0.01*0 + 0.01*1 + 0.01*0 + 0.01*1 + 0.01*0 + 0.01*1 + 0.01*0 + 0.01*1 = 1.03

So I believe there is a typo: the author probably assumed 4 ones in the first row of W, which is not true, because there are 5 ones. If there really were 4 ones in the first row, then the results would indeed be 0.04 for the 0.99 in the first row of x, and 1.02 (= 0.99 + 3*0.01) for the 0.99 in the second row of x.
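A minimal NumPy sketch of this check. It assumes the usual convention for such a solution, where column j of W holds the 4-bit binary code of digit j (row i = bit i), which gives exactly the five 1s in the first row described above (digits 1, 3, 5, 7, 9):

```python
import numpy as np

# Hypothetical W: column j is the 4-bit binary code of digit j,
# so row 0 (the least significant bit) has five 1s, at columns 1,3,5,7,9.
W = np.array([[(j >> i) & 1 for j in range(10)] for i in range(4)], dtype=float)

def old_layer_output(digit):
    # Idealized third-layer activations: 0.99 for the correct digit, 0.01 otherwise.
    x = np.full(10, 0.01)
    x[digit] = 0.99
    return x

z0 = W @ old_layer_output(0)  # first row: 5 * 0.01 = 0.05, not 4 * 0.01
z1 = W @ old_layer_output(1)  # first row: 0.99 + 4 * 0.01 = 1.03
print(np.round(z0, 2))  # [0.05 0.04 0.04 0.02]
print(np.round(z1, 2))  # [1.03 0.04 0.04 0.02]
```

Either way, the gap between the "bit should be 1" case (≥ 0.99 + something small) and the "bit should be 0" case (a few hundredths) is what lets a suitable bias and the sigmoid separate the two.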

viceriel
  • Wow, thank you very very much :) This is really great. – SMS Jul 01 '19 at 08:02
  • I have an additional question, just to be sure. I know the equation for the perceptron is w*x + b <= 0 --> 0, or w*x + b > 0 --> 1. What I learned is that b, the bias, is a scalar, W is the weight matrix, and x is the input, so it is a classical matrix-vector multiplication. However, if I multiply W and x I get a vector, and then I cannot add the bias since it is a scalar. Do I understand the concept in a wrong way, or is this not a matrix-vector multiplication? – SMS Jul 01 '19 at 08:29
  • Not sure if we use the same terminology. The original perceptron was a single neuron. That perceptron has a weight vector (no matrix, because there is only one unit) and a scalar bias. But if you use layers with more than one unit and you use a bias, then the bias is a vector of shape number_of_units x 1, because every neuron has its own bias. number_of_units x 1 is also the shape of the product Wx, so you can add the bias vector to the Wx result. – viceriel Jul 01 '19 at 15:07
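The shapes in that last comment can be sketched in a few lines of NumPy; the numbers here are arbitrary, chosen only to show that a 4-unit layer with 10 inputs needs a length-4 bias vector, one entry per neuron:

```python
import numpy as np

# Hypothetical 4-unit layer with 10 inputs: W is 4x10, and b has one
# bias per neuron, so its shape (4,) matches the shape of W @ x.
W = np.ones((4, 10)) * 0.5
b = np.array([-2.0, -1.0, 0.0, 1.0])
x = np.ones(10)

z = W @ x + b  # (4,10) @ (10,) -> (4,), then elementwise addition with (4,)
print(z)  # [3. 4. 5. 6.]
```

For a single-neuron perceptron, W collapses to one row and b back to a scalar, which recovers the textbook w*x + b form.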