I have written a small project in C# that creates and trains neural networks. For more details, see my previous question here: https://scicomp.stackexchange.com/questions/19481.
The neural networks perform well after enough training, but I realise that my self-written hill climbing algorithm may not be perfect, and I'm looking for suggestions for improvement. In particular: can I reach the local optimum with fewer calls to the fitness evaluation function?
There don't seem to be many examples around the web of simple hill climbing algorithms in C#. There is the .NET Math Library, but I would prefer not to pay for something.
To train the network, the hill climbing algorithm runs on every weight and every bias in the network, and I run multiple passes. I have looked into backpropagation, but that seems to be applied to a single training example at a time. I have ~7000 examples in my training data, and the fitness function evaluates the average performance of the network on all of them and returns a continuous (double) score.
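For context, the outer training loop is essentially the following (a simplified sketch: the flat Weights/Biases arrays on Defs.Network are illustrative, not my exact definitions):

    //Simplified sketch of the outer training loop.
    //In my project the weights and biases live in layer structures,
    //but the idea is the same: climb one parameter at a time.
    var r = new Random();
    var fitness = fitnessFunction(network);
    for (int pass = 0; pass < passes; pass++)
    {
        for (int i = 0; i < network.Weights.Length; i++)
        {
            fitness = ImproveProperty(ref network.Weights[i], fitness,
                maxIters, r, ref network, fitnessFunction);
        }
        for (int i = 0; i < network.Biases.Length; i++)
        {
            fitness = ImproveProperty(ref network.Biases[i], fitness,
                maxIters, r, ref network, fitnessFunction);
        }
    }

So the cost of one pass is roughly (number of parameters) × (fitness calls made by ImproveProperty), which is why the call count matters so much.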
Here is my current code:
public static double ImproveProperty(ref double property, double startingFitness, int maxIters, Random r, ref Defs.Network network, Func<Defs.Network, double> fitnessFunction)
{
    //Record starting values
    var lastFitness = startingFitness;
    var lastValue = property;
    //Randomise magnitude of change to reduce chance
    //of getting stuck in local optima
    var magnitude = r.NextDouble();
    var positive = true;
    var iterCount = 0f;
    var magnitudeChange = 5;
    while (iterCount < maxIters)
    {
        iterCount++;
        if (positive)
        {
            //Try adding a positive value to the property
            property += magnitude;
            //Evaluate the fitness
            var fitness = fitnessFunction(network);
            if (fitness == lastFitness)
            {
                //No change in fitness, increase the magnitude and re-try
                magnitude *= magnitudeChange;
                property = lastValue;
            }
            else if (fitness < lastFitness)
            {
                //This change decreased the fitness (bad)
                //Put the property back and try going in the negative direction
                property = lastValue;
                positive = false;
            }
            else
            {
                //This change increased the fitness (good)
                //on the next iteration we will try
                //to apply the same change again
                lastFitness = fitness;
                lastValue = property;
                //don't increase the iteration count as much
                //if a good change was made
                iterCount -= 0.9f;
            }
        }
        else
        {
            //Try adding a negative value to the property
            property -= magnitude;
            var fitness = fitnessFunction(network);
            if (fitness == lastFitness)
            {
                //No change in fitness, increase the magnitude and re-try
                magnitude *= magnitudeChange;
                property = lastValue;
            }
            else if (fitness < lastFitness)
            {
                //This change decreased the fitness (bad)
                //Now we know that going in the positive direction
                //and the negative direction decreases the fitness,
                //so make the magnitude smaller as we are probably close to an optimum
                property = lastValue;
                magnitude /= magnitudeChange;
                positive = true;
            }
            else
            {
                //This change increased the fitness (good)
                //Continue in same direction
                lastFitness = fitness;
                lastValue = property;
                iterCount -= 0.9f;
            }
        }
        //Check bounds to prevent math functions overflowing
        if (property > 100)
        {
            property = 100;
            lastFitness = fitnessFunction(network);
            return lastFitness;
        }
        else if (property < -100)
        {
            property = -100;
            lastFitness = fitnessFunction(network);
            return lastFitness;
        }
    }
    return lastFitness;
}
The fitness function is very expensive, so it should be called as little as possible. I'm looking for any improvements that get to the local optimum with fewer calls to the fitness function. Getting stuck in a local optimum is not too much of a concern: I have graphed the fitness function against the value of different weights and biases in the network, and usually there are only one to three local optima in the graph. If the network remains at the same fitness for a few passes, I could add a parameter to this function to attempt restarting the hill climbing from a random value.
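That restart idea would look roughly like this (a hypothetical sketch, not code I have written: ImproveWithRestarts, maxRestarts, and the [-100, 100] restart range are all made up for illustration):

    //Hypothetical wrapper around ImproveProperty that retries the climb
    //from random starting values and keeps the best result found.
    public static double ImproveWithRestarts(ref double property, double startingFitness, int maxIters, int maxRestarts, Random r, ref Defs.Network network, Func<Defs.Network, double> fitnessFunction)
    {
        //First climb from the current value
        var bestFitness = ImproveProperty(ref property, startingFitness, maxIters, r, ref network, fitnessFunction);
        var bestValue = property;
        for (int restart = 0; restart < maxRestarts; restart++)
        {
            //Jump to a random point within the allowed bounds
            property = r.NextDouble() * 200 - 100;
            //One extra fitness call here to evaluate the new starting point
            var fitness = ImproveProperty(ref property, fitnessFunction(network), maxIters, r, ref network, fitnessFunction);
            if (fitness > bestFitness)
            {
                bestFitness = fitness;
                bestValue = property;
            }
        }
        //Restore the best value found across all starts
        property = bestValue;
        return bestFitness;
    }

Note that each restart costs at least one extra fitness call just to score the new starting point, so with the fitness function being this expensive I would only trigger restarts when progress has genuinely stalled.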