
I'm working on a feed-forward artificial neural network (FFANN) that takes a simple calculation as input and returns the result (acting as a pocket calculator). The outcome won't be exact.
The network's weights are trained using a genetic algorithm.

Currently my program gets stuck at a local maximum at:

  • 5-6% correct answers, with 1% error margin
  • 30% correct answers, with 10% error margin
  • 40% correct answers, with 20% error margin
  • 45% correct answers, with 30% error margin
  • 60% correct answers, with 40% error margin

I currently use two genetic operators:
The first is a basic selection: pick two random individuals from the population, call the one with the better fitness the winner and the other the loser. The loser receives one of the winner's weights.

The second is mutation, where the loser from the selection receives a modification whose size depends on the number of errors it produced (fitness is decided by the counts of correct and incorrect answers). So if the network produces many errors, it receives a large modification, whereas if it has many correct answers, we are close to an acceptable goal and the modification will be smaller.

So to the question: What are ways I can prevent my FFANN from getting stuck at local maxima?
Should I modify my current genetic algorithm to something more advanced with more variables?
Should I create additional mutation or crossover operators?
Or should I maybe try modifying my mutation variables to something bigger/smaller?

This is a big topic, so if I missed any information that could be needed, please leave a comment.

Edit: Tweaking the mutation numbers to a more suitable value has gotten me a better answer rate, but it is still far from acceptable:

  • 10% correct answers, with 1% error margin
  • 33% correct answers, with 10% error margin
  • 43% correct answers, with 20% error margin
  • 65% correct answers, with 30% error margin
  • 73% correct answers, with 40% error margin

The network is currently a very simple three-layer structure with 3 inputs, 2 neurons in the only hidden layer, and a single neuron in the output layer.
The activation function used is tanh, which maps values into the range -1 to 1.
The selection-type crossover is very simple and works as follows:
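For reference, a forward pass through this 3-2-1 tanh network could look something like the sketch below. Biases are omitted because the question does not mention them; the weight layout is an assumption.

```python
import math

def forward(inputs, w_hidden, w_out):
    """Forward pass for a 3-2-1 tanh network (a minimal sketch).
    w_hidden: 2 rows of 3 weights, one row per hidden neuron;
    w_out: 2 weights feeding the single output neuron.
    Biases are omitted, as the question does not mention them."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs)))
              for row in w_hidden]
    return math.tanh(sum(w * h for w, h in zip(w_out, hidden)))
```

Because every layer passes through tanh, the final output is always in (-1, 1), which is why the training data has to be normalized into that range.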

[a1, b1, c1, d1] // Selected as winner due to most correct answers
[a2, b2, c2, d2] // Loser

The loser receives one of the winner's values at the same position (moving the value straight down), since I believe the position in the array of weights matters to how it performs.

The mutation is very simple, adding a very small value (currently somewhere between about 0.01 and 0.001) to a random weight in the loser's array of weights, with a 50/50 chance of being a negative value.
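The selection, crossover, and mutation steps described above could be sketched roughly like this (function and variable names are illustrative, not from the actual program):

```python
import random

def tournament_step(population, fitness, mutation_size=0.01):
    """One step of the scheme described above: pick two random
    individuals, copy a single weight from winner to loser at the
    same index, then nudge one of the loser's weights by
    +/- mutation_size. Modifies the population in place."""
    i, j = random.sample(range(len(population)), 2)
    winner, loser = (i, j) if fitness[i] >= fitness[j] else (j, i)
    # Positional crossover: the copied weight keeps its index
    k = random.randrange(len(population[winner]))
    population[loser][k] = population[winner][k]
    # Mutation: small signed nudge on one random weight
    m = random.randrange(len(population[loser]))
    population[loser][m] += random.choice([-1, 1]) * mutation_size
```

This is exactly the kind of operator pair that tends to collapse population diversity over time, which is relevant to the local-maximum problem asked about below.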

Here are a few examples of training data:

1, 8, -7 // the -7 represents + (1+8)
3, 7, -3 // -3 represents - (3-7)
7, 7, 3  // 3 represents * (7*7)
3, 8, 7  // 7 represents / (3/8)
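Building on the examples above, generating such training pairs and normalizing them into tanh's output range (as discussed in the comments) could be sketched as follows. The operator codes are taken from the question's examples; the scale factor is an assumption.

```python
import random

# Operator codes taken from the question's training examples:
# -7 -> addition, -3 -> subtraction, 3 -> multiplication, 7 -> division
OPS = {-7: lambda a, b: a + b,
       -3: lambda a, b: a - b,
        3: lambda a, b: a * b,
        7: lambda a, b: a / b}

def make_sample(scale=100.0):
    """Generate one (inputs, target) training pair, with both the
    operands and the target divided by `scale` (an assumed value)
    so the target fits tanh's (-1, 1) output range."""
    a, b = random.randint(1, 9), random.randint(1, 9)
    op = random.choice(list(OPS))
    target = OPS[op](a, b)
    return [a / scale, b / scale, op / scale], target / scale
```

With single-digit operands, the largest result is 9*9 = 81, so dividing by 100 keeps every target inside (-1, 1).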
  • A neural network is not a genetic algorithm. Which are you using? – phs Jan 18 '14 at 22:54
  • Oh, I guess I was really unclear on that point. I'm using a neural network, which evolves its weights using a genetic algorithm, starting from random weights every time. I'll update my question. – Pontus Magnusson Jan 18 '14 at 22:56
  • @PontusMagnusson You could add something about the calculations the networks have to solve (example training data) and something about the network topology you use (Activator function, hidden layers, inputs/outputs). Also do you mutate the network structure, too? –  Jan 18 '14 at 23:08
  • I added additional information about the network and its structure. For now I only need to mutate the weights so that they fit the procedures. My concern is that it won't be enough to adapt just the weights for such a broad problem as a pocket calculator, and I expect that I will have to either take a slightly different approach to the problem, or settle for results with a very high accepted error rate. – Pontus Magnusson Jan 18 '14 at 23:32
  • The point of research is to try stuff and see what works. The only person who can answer your questions is you. Your ideas are reasonable. Try them and see if they work. – Eric Lippert Jan 18 '14 at 23:38
  • Well, you are really correct Eric, and that is what I'm doing asynchronously while waiting on answers. But I was hoping maybe someone had some interesting thoughts to share to make the research process a little easier. Do you think I should post my results as an answer or in the actual question as more information about my progress? – Pontus Magnusson Jan 18 '14 at 23:47
  • @PontusMagnusson Is the activation function also applied to the output node? If so how is the output upscaled? Have you made an analytical estimate whether two hidden nodes are sufficient to model stuff like multiplication properly? Because I feel like it is not. (Addition and subtraction shouldn't be that much of a problem though) Also I don't think the operation input can be modeled with such a simple network. Why would you want a neural network for such work anyways? –  Jan 19 '14 at 00:31
  • The activation function is applied to the output node too. The output isn't upscaled but instead compared with the normalized data, so that the final equation would look something like `1+1=2 becomes 0.01 + 0.01 = 0.02`. From my understanding, this has a major impact on the results regarding multiplication and division, which may cause the big error rates in my program. I'm not familiar with analytical estimates. The reason for this neural network is a course I'm taking right now with what seem to be badly designed assignments, so I try to take help from SO to improve further. – Pontus Magnusson Jan 19 '14 at 01:03
  • Do you have any tips on where I can learn to upscale my results and do correct comparisons? Because now when I think about it, it is really significant if the numbers are floating numbers, or if they are real numbers. – Pontus Magnusson Jan 19 '14 at 01:05
  • Just a suggestion, the computer science stack exchange site might be another place to ask for suggestions. – iandotkelly Jan 19 '14 at 01:28
  • Thanks iandotkelly, I'll have that in mind the next time :) So many exchanges, you can't know them all. – Pontus Magnusson Jan 19 '14 at 01:36

3 Answers


Use a niching technique in the GA. The score of every solution (some form of quadratic error, I think) is adjusted to take into account its similarity to the rest of the population. This maintains diversity inside the population and avoids premature convergence and traps in local optima.

Take a look here:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.100.7342


A common problem when using GAs to train ANNs is that the population becomes highly correlated as training progresses: crossover and fitness selection make the individuals genetically similar as a local optimum is approached. You can reintroduce variation by increasing the mutation chance and/or effect as the error improvement slows down.
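Adapting the mutation step to stagnation could be sketched like this; the growth/shrink factors and bounds are illustrative constants, not part of the answer:

```python
def adaptive_mutation_size(prev_error, curr_error, size,
                           grow=1.5, shrink=0.9,
                           floor=0.001, cap=0.5):
    """If the error stops improving (population likely converged on a
    local optimum), grow the mutation step to reinject diversity;
    while the error is still falling, shrink it for fine-tuning.
    All constants are assumed values for illustration."""
    if prev_error - curr_error < 1e-6:   # stagnating
        return min(size * grow, cap)
    return max(size * shrink, floor)
```

Called once per generation, this lets the question's fixed 0.01-0.001 mutation range expand automatically when progress stalls instead of being tuned by hand.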


You can do a simple modification to the selection scheme: the population can be viewed as having a 1-dimensional spatial structure - a circle (consider the first and last locations to be adjacent).

The production of an individual for location i is permitted to involve only parents from i's local neighborhood, where the neighborhood is defined as all individuals within distance R of i. Aside from this restriction no changes are made to the genetic system.

It's only one or a few lines of code and it can help to avoid premature convergence.

Reference: Trivial Geography in Genetic Programming (2005) - Lee Spector, Jon Klein
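The ring-neighborhood restriction described above could be sketched as follows; the tournament-of-two parent choice and the default radius are illustrative:

```python
import random

def ring_parents(pop_size, i, radius):
    """Indices eligible as parents for location i on a ring of
    pop_size individuals: every location within `radius` steps of i,
    wrapping around so the first and last locations are adjacent."""
    return [(i + d) % pop_size for d in range(-radius, radius + 1)]

def pick_parent(population, fitness, i, radius=3):
    """Tournament between two random neighbors of location i
    (a minimal sketch; radius is an assumed value)."""
    a, b = random.sample(ring_parents(len(population), i, radius), 2)
    return a if fitness[a] >= fitness[b] else b
```

Because good genes can only spread `radius` steps per generation, distant parts of the ring stay diverse for longer, which delays premature convergence.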
