
I am coding a classifier neural network from scratch. It is not really learning, and I believe there is an exploding/vanishing-gradient issue somewhere; it could also be something else that I haven't thought of yet.

I have built my own 2000-sample dataset with two features, x1 and x2, and a label column that is either 0 or 1.

I have tested the same architecture with a neural network built in the Keras framework, and it yielded 85% accuracy on the same dataset with the same number of epochs. The 0.85 accuracy itself is fine; the point is that it worked.
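
(For reference, the Keras model had the same architecture as the NumPy code below — 2 inputs, a hidden layer of 3 ReLU units, and a 2-unit sigmoid output. A rough sketch is shown here; the optimizer and loss are just one plausible choice, not necessarily what was used:

import tensorflow as tf

# 2 inputs -> 3 ReLU units -> 2 sigmoid outputs, mirroring W1 (3,2) and W2 (2,3) below
model = tf.keras.Sequential([
    tf.keras.layers.Dense(3, activation='relu', input_shape=(2,)),
    tf.keras.layers.Dense(2, activation='sigmoid'),
])
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(X_train.T, one_hot_y_train.T, epochs=100)  # Keras expects (samples, features)
)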

Please help me figure out what I am doing wrong in my code below. Thank you!

My code:


import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from my_first_nnfs_dataset import data_df
    



df = data_df

df = df.reset_index(drop = True)

len_of_training_data = 1900

max_df = df.max()['data_y']

X_train = np.array(df[:len_of_training_data][['data_x','data_y']]/max_df).T * 10
y_train = np.array(df[:len_of_training_data][['label']]).T

X_test = np.array(df[len_of_training_data:][['data_x','data_y']]/max_df).T  * 10
y_test = np.array(df[len_of_training_data:][['label']]).T



def initialize_parameters():
    
    W1 = np.random.rand(3,2)
    b1 = np.random.rand(3,1)
    
    W2 = np.random.rand(2,3)
    b2 = np.random.rand(2,1)
    
    return W1, b1, W2, b2

def relu(X):
    return np.maximum(0, X)

def relu_prime(X):
    return X > 0

def sigmoid(X):
    return 1/(1 + np.exp(-X))

def forward_propagation(W1, b1, W2, b2, X):
    
    Z1 = W1.dot(X) + b1
    A1 = relu(Z1)
    Z2 = W2.dot(A1) + b2
    A2 = sigmoid(Z2)
    
    return Z1, A1, Z2, A2


def backward_propagation(W1, b1, W2, b2, Z1, A1, Z2, A2, X, Y):
    
    a = A2 - Y
    b = a.dot(A1.T)
    dW2 = b
    
    c = W2.T.dot(a)
    d = np.multiply(c, relu_prime(Z1))
    e = d.dot(X.T)
    dW1 = e
    
    db2 = np.sum(a)
    db1 = np.sum(d)
    
    
    return dW1, dW2, db1, db2

def update_parameters(W1, b1, W2, b2, dW1, dW2, db1, db2, alpha):
    
    
    W2 = W2 - alpha * dW2
    W1 = W1 - alpha * dW1
    b2 = b2 - alpha * db2
    b1 = b1 - alpha * db1
    
    return W1, b1, W2, b2


def one_hot_y(Y):
    one_hot_y = np.zeros((2, len_of_training_data))
    for i in range(0, y_train.size):
        
        if y_train[0,i] == 0:
            one_hot_y[0,i] = 1
            
        elif y_train[0,i] == 1:
            one_hot_y[1,i] = 1
    return one_hot_y

one_hot_y_train = one_hot_y(y_train)
abcdef = one_hot_y_train[:,2].reshape(2,1)

a2_predictions = []


def accuracy(a2_predictions):
    a2_p = a2_predictions[-len_of_training_data:]
    latest_epoch = a2_p[-1]
    
    a = 0
    
    for i in range(y_train.size):
        if np.argmax(latest_epoch[:,i], axis = 0) == np.argmax(one_hot_y_train[:,i], axis = 0):
            a += 1
    return a/y_train.size
    
    
    
    
        
def train(X_train, one_hot_y_train, alpha, epoch):
    
    W1, b1, W2, b2 = initialize_parameters()
    for epoch in range(epochs):
        
        for column in range(y_train.size):
            
            each_example = X_train[:,column].reshape(2,1)
            each_one_hot_y = one_hot_y_train[:,column].reshape(2,1)
            
            
            
            Z1, A1, Z2, A2 = forward_propagation(W1, b1, W2, b2, X_train)
            
            dW1, dW2, db1, db2 = backward_propagation(W1, b1, W2, b2, Z1, A1, Z2, A2, X_train, each_one_hot_y)
            
            W1, b1, W2, b2 = update_parameters(W1, b1, W2, b2, dW1, dW2, db1, db2, alpha)
            
            a2_predictions.append(A2)
            
           
            
        if epoch % 10 == 0:
            
            print(f'Epoch: {epoch}')
            print(f'Accuracy:{accuracy(a2_predictions)}\n')
            
    return W1, b1, W2, b2

epochs = 100
alpha = 0.1

W1, b1, W2, b2 = train(X_train, one_hot_y_train, alpha = alpha, epoch = epochs)

Z1, A1, Z2, A2 = forward_propagation(W1, b1, W2, b2, X_test)

test = np.zeros((1, y_test.size))

for i in range(y_test.size):
    if A2[0,i] > A2[1,i]:
        test[0,i] = 0
    else:
        test[0,i] = 1
acc = 0

for i in range(len(test)):
    if test[0][i] == y_test[i][0]:
        acc += 1

print(f'accuracy: {acc/y_test.size}')

Output:

/Users/apple/Desktop/my_first_nnfs.py:44: RuntimeWarning: overflow encountered in exp
  return 1/(1 + np.exp(-X))
Epoch: 0
Accuracy:0.5189473684210526

Epoch: 10
Accuracy:0.5189473684210526

Epoch: 20
Accuracy:0.5189473684210526

Epoch: 30
Accuracy:0.5189473684210526

Epoch: 40
Accuracy:0.5189473684210526

Epoch: 50
Accuracy:0.5189473684210526

Epoch: 60
Accuracy:0.5189473684210526

Epoch: 70
Accuracy:0.5189473684210526

Epoch: 80
Accuracy:0.5189473684210526

Epoch: 90
Accuracy:0.5189473684210526

accuracy: 0.009900990099009901

Relevant variable values after training:

W1 = 0.914082   4.92167
     5.70267e+09    -1.40049e+10
    -0.986493   -8.28296


W2 = -61.9766   1.2412e+12  -85.8557
     8.91069    -1.2412e+12 16.2499


# A1 is an all-zeros array of shape (3,101)
# A2 is an all-ones array of shape (2,101)

Edit: I printed the backprop gradients, and over many iterations dW1 stays at [[0,0],[0,0],[0,0]] (a (3,2) array), dW2 stays at [[0,0,0],[0,0,0]] (a (2,3) array), db1 is 0, and db2 alternates between -1900 and 1900.

1 Answer


From the author's (now removed) comment on the question: the forward and backward passes were being fed the entire training set X_train rather than the single example each_example that the inner loop extracts. This was the major issue. Having corrected that, the network's accuracy now varies (increasing over time) and yields an overall accuracy of 0.6-0.75.
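
A minimal sketch of the corrected inner loop (everything else in the question's code left as-is); the only substantive change is that forward_propagation and backward_propagation receive the single example and its one-hot label instead of the full X_train:

for epoch in range(epochs):
    for column in range(y_train.size):
        # pick out one (2,1) training example and its (2,1) one-hot label
        each_example = X_train[:, column].reshape(2, 1)
        each_one_hot_y = one_hot_y_train[:, column].reshape(2, 1)

        # forward/backward on the single example, not on the whole X_train
        Z1, A1, Z2, A2 = forward_propagation(W1, b1, W2, b2, each_example)
        dW1, dW2, db1, db2 = backward_propagation(W1, b1, W2, b2, Z1, A1, Z2, A2,
                                                  each_example, each_one_hot_y)
        W1, b1, W2, b2 = update_parameters(W1, b1, W2, b2, dW1, dW2, db1, db2, alpha)

        a2_predictions.append(A2)  # A2 is now a (2,1) prediction for this one example

Note that with per-example predictions the accuracy helper also needs a small adjustment (e.g. taking np.argmax over each stored (2,1) prediction rather than indexing a single (2,1900) array).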
