I have 1 input layer, 2 hidden layers and 1 output layer, and for a single training example x with output y I have computed the following:
x = [1;0;1];
y = [1;1;1];
theta1 =
4.7300 3.2800 1.4600
0 0 0
4.7300 3.2800 1.4600
theta2 =
8.8920 8.8920 8.8920
6.1670 6.1670 6.1670
2.7450 2.7450 2.7450
theta3 =
9.4460 6.5500 2.9160
9.3510 6.4850 2.8860
8.8360 6.1270 2.7270
theta1 controls the mapping between the input layer and layer 1, theta2 controls the mapping between layer 1 and layer 2, and theta3 controls the mapping between layer 2 and the output layer.
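For concreteness, a minimal sketch of that forward pass (assuming plain linear activations, since no activation function appears above):
a1 = theta1 * x;   % input layer -> layer 1 (3x1)
a2 = theta2 * a1;  % layer 1 -> layer 2 (3x1)
h  = theta3 * a2;  % layer 2 -> output layer (3x1)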
But when I compute gradient descent using:
theta(i) = theta(i) - (alpha/m .* (x .* theta(i)-y)' * x)'
where i = 1, 2, or 3, the dimensions of x and y are incorrect. (By correct I mean the theta update can execute without an error.) The dimensions are correct if x and y are 9x1 instead of 3x1. Do I need to change the architecture of my neural network, or can I just set x and y to
x = [1;0;1;0;0;0;0;0;0];
y = [1;1;1;0;0;0;0;0;0];
so that the matrix multiplication works out?
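For reference, the dimension clash with the 3-element vectors comes from the elementwise product in the update above:
size(x)       % 3x1
size(theta1)  % 3x3
% x .* theta(i) mixes a 3x1 vector with a 3x3 matrix, which raises
% "Matrix dimensions must agree" in MATLAB releases without implicit expansion.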
Update:
alpha = learning rate (.00001)
m = number of training examples (1)
theta(i) refers to theta1, theta2, theta3
I'm using vectorised gradient descent as described in Vectorization of a gradient descent code.
Update 2:
This MATLAB code appears to work:
m = 1;            % number of training examples
alpha = .0000001; % learning rate
x = [1; 0; 1];
y = [1; 1; 1];
theta1 = [4.7300 3.2800 1.4600; 0 0 0; 4.7300 3.2800 1.4600];
theta1 = theta1 - (alpha/m) * (x' * (theta1 * x - y));
Is
theta1 = theta1 - (alpha/m) * (x' * (theta1 * x - y));
a correct implementation of vectorised gradient descent?
I understand this will cause issues when unrolling the theta matrices into theta vectors, since the dimensions will not be the same, but when working with theta matrices instead of theta vectors, is this correct?
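For what it's worth, here is the shape check behind "appears to work" (a sketch using the variables above):
size(theta1 * x - y)         % 3x1: prediction minus target
size(x' * (theta1 * x - y))  % 1x1: 1x3 times 3x1 collapses to a scalar
% MATLAB then subtracts that one scalar (times alpha/m) from every
% entry of the 3x3 theta1, which is why the line runs without error.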
Update 3:
The formula is modified from Vectorization of a gradient descent code, where gradient descent is given as:
theta = theta - (alpha/m) * (X' * (X*theta-y));
I changed it to:
theta = theta - (alpha/m) * (x' * (theta * x - y));
so (X*theta-y) changed to (theta * x - y).
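For context, a sketch of the shapes in each case (the 5-example linear-regression setup below is hypothetical, just for illustration):
% Referenced question: X is m x n with one training example per row,
% theta is n x 1, so X' * (X*theta - y) is n x 1 and matches theta.
m = 5; alpha = .0000001;          % hypothetical sizes
X = rand(m, 3); theta = zeros(3, 1); y = rand(m, 1);
theta = theta - (alpha/m) * (X' * (X*theta - y));
% My version: theta1 is a 3x3 layer matrix and x is one 3x1 example,
% so x' * (theta1*x - y) collapses to 1x1 rather than matching theta1.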