This is my first attempt at implementing a multilayer neural network in Python (the code is attached below). I'm having trouble with the gradient descent partial derivatives: the weights do not seem to be updated properly. When I try to predict the output for a new sample, I always get the wrong answer. There should be two output values, each a class probability: for example, if a new sample belongs to class 1, its probability (prob_class1) should be greater than 0.5, and class 2 then has probability 1 - prob_class1. Instead, the code yields [1,1] or [-1,-1] for any sample. I've double-checked every line, and I'm almost sure the problem lies in how I apply gradient descent. Could anyone help me, please? Thank you in advance.
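To make the expected output concrete, here is a minimal sketch contrasting a `tanh` output layer (as in the posted code) with a normalized two-class probability output. The `softmax` helper and the example scores `z` are assumptions introduced for illustration; they are not part of the posted code.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max for numerical stability; the result is unchanged.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical raw scores for three samples and two classes.
z = np.array([[2.0, -1.0],
              [0.0, 0.0],
              [-3.0, 4.0]])

probs = softmax(z)     # each row is [prob_class1, prob_class2] and sums to 1
squashed = np.tanh(z)  # values lie in (-1, 1); rows need not sum to 1
```

With `tanh` as the last activation, the two outputs are squashed independently, so saturated values near [1,1] or [-1,-1] are possible, while a softmax row always satisfies prob_class2 = 1 - prob_class1.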
import numpy as np
from sklearn.datasets import make_moons
import matplotlib.pyplot as plt
np.random.seed(0)
x, y = make_moons(200, noise=0.20)
plt.scatter(x[:,0], x[:,1], s=40, c=y, cmap=plt.cm.Spectral)
y = y.reshape(-1,1)
N = x.shape[0]
n_input = x.shape[1]  # number of features per sample
n_output = 2
n_hidden = max(n_input,n_output) + 20 # 20 is arbitrary
n_it = 10000
alpha = 0.01
def predict(model, xn):
    W1, b1, W2, b2, W3, b3 = (model['W1'], model['b1'], model['W2'],
                              model['b2'], model['W3'], model['b3'])
    # Same orientation as the training pass: samples on the left, weights on the right
    z1 = xn.dot(W1) + b1
    a1 = np.tanh(z1)
    z2 = a1.dot(W2) + b2
    a2 = np.tanh(z2)
    z3 = a2.dot(W3) + b3
    a3 = np.tanh(z3)
    return a3
model = {}
W1 = np.random.randn(n_input,n_input)
b1 = np.random.randn(1,n_input)
W2 = np.random.randn(n_input,n_hidden)
b2 = np.random.randn(1,n_hidden)
W3 = np.random.randn(n_hidden,n_output)
b3 = np.random.randn(1,n_output)
for i in range(n_it):
    # Feedforward:
    z1 = x.dot(W1) + b1
    a1 = np.tanh(z1)
    z2 = a1.dot(W2) + b2
    a2 = np.tanh(z2)
    z3 = a2.dot(W3) + b3
    a3 = np.tanh(z3)
    # Loss function (mean squared error):
    # f(w,b) = (1/N)*sum((y - (w*x + b))^2)
    # df/dw = -2*(1/N)*x*(y - (w*x + b))
    # df/db = -2*(1/N)*(y - (w*x + b))
    # Backpropagation:
    dW3 = -2*(1/N)*(a2.T).dot(y-a3)
    db3 = -2*(1/N)*sum(y-a3)
    db3 = db3.reshape(-1,1)
    db3 = db3.T
    dW2 = -2*(1/N)*a1.T.dot(a2)
    db2 = -2*(1/N)*sum(a2)
    db2 = db2.reshape(-1,1)
    db2 = db2.T
    dW1 = -2*(1/N)*(x.T).dot(a1)
    db1 = -2*(1/N)*sum(dW1)
    db1 = db1.reshape(-1,1)
    db1 = db1.T
    # Updating weights
    W3 += alpha*dW3
    b3 += alpha*db3
    W2 += alpha*dW2
    b2 += alpha*db2
    W1 += alpha*dW1
    b1 += alpha*db1
model = { 'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2, 'W3':W3, 'b3':b3}
test = np.array([2,0])
prediction = predict(model,test)
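For comparison, here is a minimal sketch of how the gradients could be chained backwards through the same three layers. It rests on several assumptions that are not in the code above: a softmax output with cross-entropy loss instead of `tanh` with squared error, one-hot targets, zero-initialized biases, and a toy two-blob dataset standing in for `make_moons` so that the block only needs NumPy. The names `softmax`, `forward`, `train`, `x_toy`, and `y_toy` are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

def forward(model, x):
    a1 = np.tanh(x.dot(model['W1']) + model['b1'])
    a2 = np.tanh(a1.dot(model['W2']) + model['b2'])
    a3 = softmax(a2.dot(model['W3']) + model['b3'])  # rows sum to 1
    return a1, a2, a3

def train(x, y, n_hidden=22, n_it=2000, alpha=0.1, seed=0):
    # y holds integer class labels (0 or 1) with shape (N,)
    rng = np.random.RandomState(seed)
    N, n_input = x.shape
    n_output = 2
    t = np.eye(n_output)[y]  # one-hot targets, shape (N, 2)
    model = {
        'W1': rng.randn(n_input, n_input), 'b1': np.zeros((1, n_input)),
        'W2': rng.randn(n_input, n_hidden), 'b2': np.zeros((1, n_hidden)),
        'W3': rng.randn(n_hidden, n_output), 'b3': np.zeros((1, n_output)),
    }
    for _ in range(n_it):
        a1, a2, a3 = forward(model, x)
        # Softmax + mean cross-entropy: the gradient w.r.t. z3 is (a3 - t) / N
        d3 = (a3 - t) / N
        # Chain rule through each tanh layer: tanh'(z) = 1 - tanh(z)^2
        d2 = d3.dot(model['W3'].T) * (1 - a2 ** 2)
        d1 = d2.dot(model['W2'].T) * (1 - a1 ** 2)
        grads = {
            'W3': a2.T.dot(d3), 'b3': d3.sum(axis=0, keepdims=True),
            'W2': a1.T.dot(d2), 'b2': d2.sum(axis=0, keepdims=True),
            'W1': x.T.dot(d1), 'b1': d1.sum(axis=0, keepdims=True),
        }
        for k in model:
            model[k] -= alpha * grads[k]  # descent: step against the gradient
    return model

# Toy stand-in data: two Gaussian blobs centred at (-2, -2) and (2, 2)
data_rng = np.random.RandomState(1)
x_toy = np.vstack([data_rng.randn(50, 2) - 2.0, data_rng.randn(50, 2) + 2.0])
y_toy = np.array([0] * 50 + [1] * 50)

model_toy = train(x_toy, y_toy)
probs = forward(model_toy, np.array([[2.0, 2.0]]))[-1]  # [[prob_class1, prob_class2]]
```

Two points differ from the loop above: each layer's gradient is built from the error signal of the layer after it, multiplied by the local `tanh` derivative, and the update subtracts `alpha` times the gradient. With the variables from the original code, the call would presumably be `train(x, y.ravel())`, since `y` was reshaped to a column there.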