
I followed an article here: TowardsDataScience.

I wrote out the math equations for the network, and everything made sense.

However, after writing the code, the results are pretty strange: it always predicts the same class...

I spent a lot of time on it, changed many things, but I still cannot understand what I did wrong.

Here is the code:

# coding: utf-8

from mnist import MNIST
import numpy as np
import math
import os
import pdb


DATASETS_PREFIX    = '../Datasets/MNIST'
mndata             = MNIST(DATASETS_PREFIX)
TRAINING_IMAGES, TRAINING_LABELS  = mndata.load_training()
TESTING_IMAGES , TESTING_LABELS   = mndata.load_testing()

### UTILS

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def d_sigmoid(x):
    return x.T * (1 - x)
    #return np.dot(x.T, 1.0 - x)

def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

def d_softmax(x):
    #This function has not yet been tested.
    return x.T * (1 - x)

def tanh(x):
    return np.tanh(x)

def d_tanh(x):
    return 1 - x.T * x

def normalize(image):
    return image / (255.0 * 0.99 + 0.01)

### !UTILS

class NeuralNetwork(object):
    """
    This is a 3-layer neural network (1 hidden layer).
    @_input   : input layer
    @_weights1: weights between input layer and hidden layer  (matrix shape (xshape, self._neurones_nb))
    @_weights2: weights between hidden layer and output layer (matrix shape (self._neurones_nb, yshape))
    @_y       : output
    @_output  : computed output
    @_alpha   : learning rate
    """
    def __init__(self, xshape, yshape):
        self._neurones_nb = 20
        self._input       = None
        self._weights1    = np.random.randn(xshape, self._neurones_nb)
        self._weights2    = np.random.randn(self._neurones_nb, yshape)
        self._y           = np.mat(np.zeros(yshape))
        self._output      = np.mat(np.zeros(yshape))
        self._alpha1      = 0.1
        self._alpha2      = 0.1
        self._function    = sigmoid
        self._derivative  = d_sigmoid
        self._epoch       = 1

    def Train(self, xs, ys):
        for j in range(self._epoch):
            for i in range(len(xs)):
                self._input = normalize(np.mat(xs[i]))
                self._y[0, ys[i]] = 1
                self.feedforward()
                self.backpropagation()
                self._y[0, ys[i]] = 0

    def Predict(self, image):
        self._input = normalize(image)
        out = self.feedforward()
        return out

    def feedforward(self):
        self._layer1 = self._function(np.dot(self._input, self._weights1))
        self._output = self._function(np.dot(self._layer1, self._weights2))
        return self._output

    def backpropagation(self):
        d_weights2 = np.dot(
            self._layer1.T,
            2 * (self._y - self._output) * self._derivative(self._output)
        )
        d_weights1 = np.dot(
            self._input.T,
            np.dot(
                2 * (self._y - self._output) * self._derivative(self._output),
                self._weights2.T
            ) * self._derivative(self._layer1)
        )
        self._weights1 += self._alpha1 * d_weights1
        self._weights2 += self._alpha2 * d_weights2

if __name__ == '__main__':
    neural_network = NeuralNetwork(len(TRAINING_IMAGES[0]), 10)
    print('* training neural network')
    neural_network.Train(TRAINING_IMAGES, TRAINING_LABELS)
    print('* testing neural network')
    count = 0
    for i in range(len(TESTING_IMAGES)):
        image       = np.mat(TESTING_IMAGES[i])
        expected    = TESTING_LABELS[i]
        prediction  = neural_network.Predict(image)
        if i % 100 == 0: print(expected, prediction)
    #print(f'* results: {count} / {len(TESTING_IMAGES)}')

Thank you for your help, really appreciated.

Julien

1 Answer


Well, I don't see any error in the implementation, so considering your network, it could be improved by doing two things:

  • One epoch is not enough. Not at all! You need to pass over your data multiple times (an absolute minimum is around 10 epochs, a typical number might be around 100, and it can go up to 5000 or more).

  • Your network is a shallow network, i.e. a really simple one. To detect difficult things (like images), you could implement a CNN (Convolutional Neural Network), or first try deepening your network and making it more complex.

=> Try adding layers (3, 4, 5, etc.) and then adding neurons to each layer (50, 60, ...) depending on the size of your input. You can go up to 800, 900 or more.
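
For example, with your existing class, the epoch and width changes alone could look roughly like this (a sketch only: the 200 neurons and 50 epochs are illustrative values, not tuned ones, and adding extra hidden layers would also require changes to feedforward/backpropagation that are not shown here):

    def __init__(self, xshape, yshape):
        self._neurones_nb = 200   # wider hidden layer instead of 20 (illustrative value)
        self._input       = None
        self._weights1    = np.random.randn(xshape, self._neurones_nb)
        self._weights2    = np.random.randn(self._neurones_nb, yshape)
        self._y           = np.mat(np.zeros(yshape))
        self._output      = np.mat(np.zeros(yshape))
        self._alpha1      = 0.1
        self._alpha2      = 0.1
        self._function    = sigmoid
        self._derivative  = d_sigmoid
        self._epoch       = 50    # many passes over the training set instead of 1 (illustrative value)

Train() can stay exactly as it is: its outer loop already runs self._epoch times.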

LaSul
  • Thank you for your answer. I am implementing more layers as you advised, and I will see how it goes. But I have the feeling that even a simple perceptron should give something interesting, maybe not amazing performance, but still something. I mean, even a Bayesian classifier can classify MNIST correctly; I should not have to resort to deep learning techniques to do it, should I? You see what I mean? – Julien Séveno-Piltant Dec 14 '18 at 10:30
  • I see your point. How do you monitor your results? That is, what are your metrics: accuracy score, confusion matrix, etc.? – LaSul Dec 14 '18 at 12:35
  • Accuracy score, yes. I also display the score for each class. The strange thing is that it gives approximately the same score for every image. – Julien Séveno-Piltant Dec 14 '18 at 12:37
  • You mean like 70% for class 0, 71% for class 1, etc.? I don't see anything strange there. What is your overall accuracy? – LaSul Dec 14 '18 at 12:39
  • Another strange thing I just noticed is that you should use softmax (for multi-class) instead of sigmoid (which is for two classes only); a sketch of what that change could look like is included after these comments. – LaSul Dec 14 '18 at 12:56
  • Yes, you are right... I will switch functions and try again, and I will keep you updated. – Julien Séveno-Piltant Dec 15 '18 at 00:10
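
Following up on the softmax suggestion above, here is a minimal sketch of what that change could look like, assuming a softmax output layer trained with a cross-entropy loss (in which case the output-layer error term simplifies to output minus target, so no d_softmax is needed) and keeping sigmoid on the hidden layer:

    def feedforward(self):
        # Hidden layer keeps sigmoid; output layer uses softmax over the 10 classes.
        self._layer1 = sigmoid(np.dot(self._input, self._weights1))
        self._output = softmax(np.dot(self._layer1, self._weights2))
        return self._output

    def backpropagation(self):
        # With softmax + cross-entropy, the output-layer error is simply (output - target).
        output_error = self._output - self._y                    # shape (1, 10)
        d_weights2   = np.dot(self._layer1.T, output_error)      # shape (n_hidden, 10)
        # Hidden-layer error: backpropagate through weights2, then multiply
        # element-wise by the sigmoid derivative layer1 * (1 - layer1).
        hidden_error = np.multiply(np.dot(output_error, self._weights2.T),
                                   np.multiply(self._layer1, 1 - self._layer1))
        d_weights1   = np.dot(self._input.T, hidden_error)       # shape (n_input, n_hidden)
        # Gradient descent: subtract here, because the error terms above are
        # written as (output - target) rather than (target - output).
        self._weights1 -= self._alpha1 * d_weights1
        self._weights2 -= self._alpha2 * d_weights2

The rest of the class (Train, Predict, and the sigmoid/softmax helpers) could stay unchanged under this sketch.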