
So I am hitting a wall with my C# machine learning project. I am attempting to train an algorithm to recognize numbers. Since this is only an exercise, I have an image set of 200 numbers (20 each for 0 to 9). Obviously, if I wanted a properly trained algorithm I would use a more robust training set, but this is just an exercise to see if I can get it working in the first place. I can get it up to 60% accuracy, but not past that. I have been doing some research into activation functions and, from what I understand, LeakyReLU is the function I should be using. However, if I use the LeakyReLU function across the board, it doesn't learn anything, and I'm not sure how to use LeakyReLU as an output activation function. Using sigmoid or tanh as the output activation function makes more sense to me. Here is the block of code that creates the array that feeds backpropagation:

public static float ACTIVE_VALUE = 1;
public static float INACTIVE_VALUE = -1;

// This is specifically designed for an algorithm that will detect a number between 0 - 9
public static float[] valueToArray(int value)
{

    switch (value)
    {
        case 0:
            return new float[] { ACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE,
                                INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE };
        case 1:
            return new float[] { INACTIVE_VALUE, ACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE,
                                INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE };
        case 2:
            return new float[] { INACTIVE_VALUE, INACTIVE_VALUE, ACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE,
                                INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE };
        case 3:
            return new float[] { INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, ACTIVE_VALUE, INACTIVE_VALUE,
                                INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE };
        case 4:
            return new float[] { INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, ACTIVE_VALUE,
                                INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE };
        case 5:
            return new float[] { INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, 
                                ACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE };
        case 6:
            return new float[] { INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE,
                                INACTIVE_VALUE, ACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE };
        case 7:
            return new float[] { INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE,
                                INACTIVE_VALUE, INACTIVE_VALUE, ACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE };
        case 8:
            return new float[] { INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE,
                                INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, ACTIVE_VALUE, INACTIVE_VALUE };
        case 9:
            return new float[] { INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE,
                                INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, ACTIVE_VALUE };
        default:
            return new float[] { INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE,
                                INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE, INACTIVE_VALUE };
    }
}
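As an aside, the ten-way switch above can be collapsed into a small loop that builds the same one-hot-style array; a minimal equivalent sketch (the method name is mine, not from the project):

```csharp
// Equivalent to the switch above: fill all ten slots with INACTIVE_VALUE,
// then mark the target digit (if it is in range) with ACTIVE_VALUE.
// Out-of-range values fall through to the all-inactive array, matching
// the original default case.
public static float[] valueToArrayCompact(int value)
{
    float[] result = new float[10];
    for (int i = 0; i < result.Length; i++)
        result[i] = INACTIVE_VALUE;

    if (value >= 0 && value <= 9)
        result[value] = ACTIVE_VALUE;

    return result;
}
```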

I don't know how to use something like this to read a LeakyReLU output. So I figured the best option would be to use LeakyReLU for the input and hidden layers and then use tanh or sigmoid for the output layer. However, that creates an issue, because sigmoid just returns NaN (due to a rounding error, from what I understand) and tanh returns -1 or 1 but nothing in between. If I use tanh across the board it works and it learns, but it only reaches an accuracy of 60% and stops improving there. I assume this is due to the "vanishing gradient" issue. However, if I use LeakyReLU for the input and hidden layers and then tanh for the output, it stays at 12-14% (which is just as good as randomly guessing a number).
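For what it's worth, the saturation behaviour is easy to reproduce with plain floats; a small illustrative sketch (not from the project) of a standard sigmoid fed unscaled, pixel-scale values versus small ones:

```csharp
using System;

// With pre-activation sums on the scale of raw pixel values (0-255),
// sigmoid is pinned at 0 or 1 and its gradient is effectively zero,
// which is consistent with the saturation symptoms described above.
static float Sigmoid(float x)
{
    return 1.0f / (1.0f + (float)Math.Exp(-x));
}

float saturated = Sigmoid(255.0f);  // effectively 1.0f, gradient ~0
float healthy   = Sigmoid(1.0f);    // ~0.731f, still has a usable gradient
```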

I am using a neural network that I got from a GitHub user here: https://github.com/kipgparker/BackPropNetwork

He posted a research paper about neural networks online, and it was one of the top hits on Google; that's how I found it in the first place. I posted my full project as a zip on GitHub here: https://github.com/JoshuaC0352/Machine-Learning

I am not opposed to using a library I can get from NuGet, like SiaNet (https://scisharp.github.io/SiaNet/api/SiaNet.Layers.AvgPooling1D.html). However, I have gotten so familiar with the one I am currently working with that I am somewhat reluctant to switch over; I'd feel like I was almost starting from scratch, because I would have to learn how to interface with a whole new library.

EDIT: additional code. This is the while loop that reads the images and trains the algorithm:

public static void singleThread()
{

    int batchSize = 10000;
    int rangeLow = 0;
    int rangeHi = 9;

    int hits = 0;


    while (true)
    {

        // alternates between training and testing
        //Console.WriteLine("Training...  ");



        for (int i = 0; i < batchSize; i++)
        {

            // Give a training progress report every 100 iterations; limiting
            // console writes keeps the output from slowing down the loop
            if (i % 100 == 0)
            {
                Console.SetCursorPosition(0, Console.CursorTop);
                Console.Write("Training: ");
                Console.Write("(" + (((float)i / (float)batchSize) * 100) + "%)");
                Console.Write("                    ");
            }


            // randomly select an image from the list
            // NOTE: Random.Next's upper bound is exclusive, so rangeHi + 1 and 21
            // are needed here; otherwise digit 9 and image 20 are never selected
            int number = rng.Next(rangeLow, rangeHi + 1);
            int index = rng.Next(1, 21);

            Bitmap loadedImage = (Bitmap)Image.FromFile("Train/" + number + "/" +
                                 index + ".png", true);


            int indexLocation = 0;
            // Convert the image into a grayScale value
            for (int x = 0; x < loadedImage.Width; x++)
            {
                for (int y = 0; y < loadedImage.Height; y++)
                {
                    Color pixel = loadedImage.GetPixel(x, y);
                    int grayValue = (int)((pixel.R * 0.3) + (pixel.G * 0.59) + (pixel.B * 0.11));
                    //Console.WriteLine(grayValue);
                    networkInputs[indexLocation] = grayValue;
                    indexLocation++;
                }
            }

            // The network will guess what the image is, and return the guess as a float array

            float[] guess = currentNetwork.BackPropagate(networkInputs, Interface.valueToArray(number));

            // This if statement checks if the guess was correct
            if (Interface.guessToValue(guess) == number)
            {
                hits++;
            }

        }

        currentNetwork.Performance = ((float) hits / (float) batchSize);
        hits = 0;

        Console.WriteLine("Score: " + (currentNetwork.Performance * 100) + "%");
    }
}
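As the comments below suggest, the raw grayscale values (0-255) fed into `networkInputs` in the loop above are a likely culprit; a sketch of the same conversion scaled into 0-1 (`NormalizedGray` is a hypothetical helper, not part of the project):

```csharp
using System.Drawing;

// Same luminance formula as the pixel loop above, but scaled from
// [0, 255] into [0, 1] so sigmoid/tanh are not driven into saturation
// by large pre-activation sums.
static float NormalizedGray(Color pixel)
{
    float gray = (float)((pixel.R * 0.3) + (pixel.G * 0.59) + (pixel.B * 0.11));
    return gray / 255.0f;
}

// Usage inside the pixel loop:
//   networkInputs[indexLocation] = NormalizedGray(loadedImage.GetPixel(x, y));
```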
JMC0352
  • how did you preprocess the images? I mean, how did you feed them to the network? Tell us more about your data. – moe asal Jun 01 '20 at 19:20
  • OK, I'll add that block of code in an edit. – JMC0352 Jun 01 '20 at 19:24
  • I added the code. I basically read in the whole image and convert it to a array of greyscale values. Then feed that array in as inputs. – JMC0352 Jun 01 '20 at 19:28
  • the grayscale values are between 0-255, right? – moe asal Jun 01 '20 at 19:29
  • try converting the grayscale values from the 0-255 interval to the 0-1 interval: just divide each pixel by 255. And look carefully at how the neural network weights are initialised if you intend to use tanh or sigmoid. Since this is a classification problem, I recommend you use a softmax activation function in your output layer. – moe asal Jun 01 '20 at 19:34
  • I have heard about softmax before, but it wasn't part of this library. Is softmax a different activation function similar to sigmoid, with just a slightly different formula? – JMC0352 Jun 01 '20 at 19:38
  • the fact that LeakyReLU performed better than sigmoid or tanh is because the values are too large: large in the sense that they get mistreated by tanh and sigmoid and get rounded by the computer to integers. – moe asal Jun 01 '20 at 19:39
  • the softmax activation function will give a percentage at each node (0-1), in which all percentages sum up to 1. It will make more sense than sigmoid. – moe asal Jun 01 '20 at 19:41
  • I'll look into softmax and see if I can find a formula online. If so, I can add it to this library. – JMC0352 Jun 01 '20 at 19:41
  • tell me about the results once you preprocess the images :) – moe asal Jun 01 '20 at 19:43
  • Will do. I'm working on that right now. I'll also look into adding softmax into the library. – JMC0352 Jun 01 '20 at 19:47
  • OK, so simply by changing the inputs to values between 0 and 1 instead of 0 and 255, I have already seen major improvements. Using tanh across the board got me to 88%, which is much better. However it leveled off there. So I reduced the learning rate, and ran it again. It's learning much slower now, but I'll let you know what the results are. I researched softmax, and figured out how to code it, but from what I can tell it simply tells you which output is most probable from an array of outputs. However, it doesn't seem like something that is used for back propagation. – JMC0352 Jun 01 '20 at 21:32
  • the reason why you're getting only 88% is that a neural network (alone) is not well suited to image recognition; convolutional neural networks are used for that. To understand the problem intuitively, you can picture raw neural networks as making sense of all the pixels together, whereas conv nets make sense of relatively close pixels. – moe asal Jun 02 '20 at 08:57

1 Answer


Added as an answer for future visitors:

  • Try converting the grayscale values from the 0-255 interval to the 0-1 interval: just divide each pixel by 255. The fact that LeakyReLU performed better than sigmoid or tanh is because the values are too large, large in the sense that they get mistreated by tanh and sigmoid and get rounded by the computer to integers.

  • Look carefully at how the neural network weights are initialised if you intend to use tanh or sigmoid.

  • Since this is a classification problem, I recommend you use a softmax activation function in your output layer.
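A minimal softmax sketch in C# (illustrative only; this is not part of the linked BackPropNetwork library):

```csharp
using System;
using System.Linq;

// Numerically stable softmax: subtracting the max before exponentiating
// keeps Math.Exp from overflowing; the outputs are positive and sum to 1,
// so they can be read as class probabilities for the ten digits.
static float[] Softmax(float[] logits)
{
    float max = logits.Max();
    float sum = 0f;
    float[] result = new float[logits.Length];
    for (int i = 0; i < logits.Length; i++)
    {
        result[i] = (float)Math.Exp(logits[i] - max);
        sum += result[i];
    }
    for (int i = 0; i < result.Length; i++)
        result[i] /= sum;
    return result;
}
```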

After preprocessing the data, @JMC0352 got 88% accuracy.

The reason why you're getting only 88% is that a neural network (alone) is not well suited to image recognition; convolutional neural networks are used for that. To understand the problem intuitively, you can picture raw neural networks as making sense of all the pixels together, whereas conv nets make sense of relatively close pixels.

moe asal