
Hi, I'm trying to train my own neural network design on the MNIST handwritten digits dataset. Every time I run this code the accuracy starts to increase and then decreases, and I get an overflow warning. Can someone explain whether my code is just poor and messy, or whether I have missed something small? Thanks in advance.

import numpy as np
import pandas as pd
df = pd.read_csv('../input/digit-recognizer/train.csv')
data = np.array(df.values)
data = data.T               # transpose so each column is one example
data
Y = data[0,:]               # first row: labels
X = data[1:,:]              # remaining 784 rows: pixel values
Y_train = Y[:41000]
X_train = X[:,:41000]
X_train = X_train/255       # scale pixels to [0, 1]
Y_val = Y[41000:]
X_val = X[:,41000:]
X_val = X_val/255
print(np.max(X_train))
class NeuralNetwork:
    def __init__(self, n_in, n_out):
        self.w1, self.b1 = self.Generate_Weights_Biases(10,784)
        self.w2, self.b2 = self.Generate_Weights_Biases(10,10)
    def Generate_Weights_Biases(self, n_in, n_out):
        weights = 0.01*np.random.randn(n_in, n_out)
        biases = np.zeros((n_in,1))
        return weights, biases
    def forward(self, X):
        self.Z1 = self.w1.dot(X) + self.b1
        self.a1 = self.ReLu(self.Z1)
        self.z2 = self.w2.dot(self.a1) + self.b1
        y_pred = self.Softmax(self.z2)
        return y_pred
    def ReLu(self, Z):
        return np.maximum(Z,0)
    def Softmax(self, Z):
        #exponentials = np.exp(Z)
        #sumexp = np.sum(np.exp(Z), axis=0) 
        #print(Z)
        return np.exp(Z)/np.sum(np.exp(Z))
        
    def ReLu_Derv(self, x):
        return np.greaterthan(x, 0).astype(int)
    def One_hot_encoding(self, Y):
        one_hot = np.zeros((Y.size, 10))
        rows = np.arange(Y.size)
        one_hot[rows, Y] = 1
        one_hot = one_hot.T
        return one_hot
    def Get_predictions(self, y_pred):
        return np.argmax(y_pred, 0)
    def accuracy(self, pred, Y):
        return np.sum(pred == Y)/Y.size
    def BackPropagation(self, X, Y, y_pred, lr=0.01):
        m = Y.size
        one_hot_y = self.One_hot_encoding(Y)
        e2 = y_pred - one_hot_y
        derW2 = (1/m)* e2.dot(self.a1.T)
        derB2 =(1/m) * np.sum(e2,axis=1)
        derB2 = derB2.reshape(10,1)
        e1 = self.w2.T.dot(e2) * self.ReLu(self.a1)
        derW1 = (1/m) * e1.dot(X.T)
        derB1 = (1/m) * np.sum(e1, axis=1)
        derB1 = derB1.reshape(10,1)
        self.w1 = self.w1 - lr*derW1
        self.b1 = self.b1 - lr*derB1
        self.w2 = self.w2 - lr*derW2
        self.b2 = self.b2 - lr*derB2
    def train(self, X, Y, epochs = 1000):
        for i in range(epochs):
            y_pred = self.forward(X)
            predict = self.Get_predictions(y_pred)
            accuracy = self.accuracy(predict, Y)
            print(accuracy)
            self.BackPropagation(X, Y, y_pred)
        return self.w1, self.b1, self.w2, self.b2
    
NN = NeuralNetwork(X_train, Y_train)
w1,b1,w2,b2 = NN.train(X_train,Y_train)

1 Answer


You should use a different bias for the second layer:

self.z2 = self.w2.dot(self.a1) + self.b1 # wrong: this reuses the first layer's bias
self.z2 = self.w2.dot(self.a1) + self.b2 # right: the second layer should use b2
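In context, the corrected forward pass would look roughly like this (a sketch that only swaps b1 for b2 in the second layer and leaves the rest of the question's code unchanged):

def forward(self, X):
    self.Z1 = self.w1.dot(X) + self.b1        # first layer: w1 and b1
    self.a1 = self.ReLu(self.Z1)
    self.z2 = self.w2.dot(self.a1) + self.b2  # second layer: w2 and its own bias b2
    y_pred = self.Softmax(self.z2)
    return y_pred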

When doing something like this

derB2 = (1/m) * np.sum(e2, axis=1)

you should pass keepdims=True so that derB2.shape comes out as (something, 1) rather than (something,). It makes your code more rigorous and removes the need for the explicit reshape afterwards.
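For example, a minimal illustration of the shape difference (10 x 41000 matches the shape of e2 in the question):

import numpy as np

e2 = np.random.randn(10, 41000)                 # same shape as the output-layer error
derB2 = np.sum(e2, axis=1)                      # shape (10,)   -> needs the manual reshape
derB2_keep = np.sum(e2, axis=1, keepdims=True)  # shape (10, 1) -> already matches b2
print(derB2.shape, derB2_keep.shape)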

Minghao
  • Thanks for the comment, but aside from that, do you know why the network won't converge and instead results in an overflow in the Softmax? – that annoying kid Jul 11 '22 at 19:11
  • Did you still have such problems after fixing the issue? Your implementation is wrong; it's not a NN. – Minghao Jul 12 '22 at 03:21
  • Yes, I tried to modify a binary classification network which used minibatch gradient descent and worked, but my multiclass network encounters a rapid increase in the Z values of the output neurons. I wonder whether I might be using the wrong columns in my Softmax calculations, as I'm passing 42000 examples into the network at the same time. Also, what do you mean by it is not a NN? – that annoying kid Jul 12 '22 at 12:07
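For reference, regarding the overflow mentioned in the comments: with one example per column (as in the question), the softmax should normalize each column separately, and subtracting the per-column maximum before exponentiating keeps np.exp from overflowing without changing the result. A minimal sketch (the softmax_columns name is just illustrative):

def softmax_columns(Z):
    # stand-alone version of the Softmax method, shown for illustration
    # softmax is invariant to shifting each column by a constant, so subtract the column max
    Z_shifted = Z - np.max(Z, axis=0, keepdims=True)
    exp = np.exp(Z_shifted)
    return exp / np.sum(exp, axis=0, keepdims=True)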