Genetic Algorithm / AI; basically, am I on the right track?

Question

I know Python isn't the best idea to be writing any kind of software of this nature. My reasoning is to use this type of algorithm for a Raspberry Pi 3 in it's decision making (still unsure how that will go), and the libraries and APIs that I'll be using (Adafruit motor HATs, Google services, OpenCV, various sensors, etc) all play nicely for importing in Python, not to mention I'm just more comfortable in this environment for the rPi specifically. Already I've cursed it as object oriented such as Java or C++ just makes more sense to me, but Id rather deal with its inefficiencies and focus on the bigger picture of integration for the rPi.

I won't explain the code here, as it's pretty well documented in the comment sections throughout the script. My questions is as stated above; can this be considered basically a genetic algorithm? If not, what must it have to be a basic AI or genetic code? Am I on the right track for this type of problem solving? I know usually there are weighted variables and functions to promote "survival of the fittest", but that can be popped in as needed, I think.

I've read up quite a bit of forums and articles about this topic. I didn't want to copy someone else's code that I barely understand and start using it as a base for a larger project of mine; I want to know exactly how it works so I'm not confused as to why something isn't working out along the way. So, I just tried to comprehend the basic idea of how it works, and write how I interpreted it. Please remember I'd like to stay in Python for this. I know rPi's have multiple environments for C++, Java, etc, but as stated before, most hardware components I'm using have only Python APIs for implementation. if I'm wrong, explain at the algorithmic level, not just with a block of code (again, I really want to understand the process). Also, please don't nitpick code conventions unless it's pertinent to my problem, everyone has a style and this is just a sketch up for now. Here it is, and thanks for reading!

# Created by X3r0, 7/3/2016
# Basic genetic algorithm utilizing a two dimensional array system.
# the 'DNA' is the larger array, and the 'gene' is a smaller array as an element
# of the DNA. There exists no weighted algorithms, or statistical tracking to
# make the program more efficient yet; it is straightforwardly random and solves
# its problem randomly. At this stage, only the base element is iterated over.

# Basic Idea:
# 1) User inputs constraints onto array
# 2) Gene population is created at random given user constraints
# 3) DNA is created with randomized genes ( will never randomize after )
#   a) Target DNA is created with loop control variable as data (basically just for some target structure)
# 4) CheckDNA() starts with base gene from DNA, and will recurse until gene matches the target gene
#   a) Randomly select two genes from DNA
#   b) Create a candidate gene by splicing both parent genes together
#   c) Check candidate gene against the target gene
#   d) If there exists a match in gene elements, a child gene is created and inserted into DNA
#   e) If the child gene in DNA is not equal to target gene, recurse until it is

import random

DNAsize = 32
geneSize = 5
geneDiversity = 9
geneSplit = 4
numRecursions = 0

DNA = []
targetDNA = []

def init():
    global DNAsize, geneSize, geneDiversity, geneSplit, DNA

    print("This is a very basic form of genetic software. Input variable constraints below. "
          "Good starting points are: DNA strand size (array size): 32, gene size (sub array size: 5, gene diversity (randomized 0 - x): 5"
          "gene split (where to split gene array for splicing): 2")

    DNAsize = int(input('Enter DNA strand size: '))
    geneSize = int(input('Enter gene size: '))
    geneDiversity = int(input('Enter gene diversity: '))
    geneSplit = int(input('Enter gene split: '))

    # initializes the gene population, and kicks off
    # checkDNA recursion
    initPop()
    checkDNA(DNA[0])

def initPop():

    # builds an array of smaller arrays
    # given DNAsize
    for x in range(DNAsize):
        buildDNA()
        # builds the goal array with a recurring
        # numerical pattern, in this case just the loop
        # control variable
        buildTargetDNA(x)

def buildDNA():
    newGene = []

    # builds a smaller array (gene) using a given geneSize
    # and randomized with vaules 0 - [given geneDiversity]
    for x in range(geneSize):
        newGene.append(random.randint(0,geneDiversity))
    # append the built array to the larger array
    DNA.append(newGene)

def buildTargetDNA(x):

    # builds the target array, iterating with x as a loop
    # control from the call in init()
    newGene = []
    for y in range(geneSize):
            newGene.append(x)
    targetDNA.append(newGene)

def checkDNA(childGene):

    global numRecursions
    numRecursions = numRecursions+1

    gene = DNA[0]
    targetGene = targetDNA[0]
    parentGeneA = DNA[random.randint(0,DNAsize-1)]          # randomly selects an array (gene) from larger array (DNA)
    parentGeneB = DNA[random.randint(0,DNAsize-1)]
    pos = random.randint(geneSplit-1,geneSplit+1)           # randomly selects a position to split gene for splicing
    candidateGene = parentGeneA[:pos] + parentGeneB[pos:]   # spliced gene given split from parentA and parentB

    print("DNA Splice Position: " + str(pos))
    print("Element A: " + str(parentGeneA))
    print("Element B: " + str(parentGeneB))
    print("Candidate Element: " + str(candidateGene))
    print("Target DNA: " + str(targetDNA))
    print("Old DNA:    " + str(DNA))

    # iterates over the candidate gene and compares each element to the target gene
    # if the candidate gene element hits a target gene element, the resulting child
    # gene is created
    for x in range(geneSize):
        #if candidateGene[x] != targetGene[x]:
            #print("false ")
        if candidateGene[x] == targetGene[x]:
            #print("true ")
            childGene.pop(x)
            childGene.insert(x, candidateGene[x])

    # if the child gene isn't quite equal to the target, and recursion hasn't reached
    # a max (apparently 900), the child gene is inserted into the DNA. Recursion occurs
    # until the child gene equals the target gene, or max recursuion depth is exceeded
    if childGene != targetGene and numRecursions < 900:
        DNA.pop(0)
        DNA.insert(0, childGene)
        print("New DNA:   " + str(DNA))
        print(numRecursions)
        checkDNA(childGene)

init()
print("Final DNA:  " + str(DNA))
print("Number of generations (recursions): " + str(numRecursions))

StackOverflow isn't a good fit for open-ended questions -- see the "practical, answerable" sections to http://stackoverflow.com/help/on-topic. The goal is to build a reusable knowledge base, with questions and answers that are useful to more than just the person who first asked them; that's part of what distinguishes between "Q&A site" (which this is) and "discussion forum" (which it's not). — Charles Duffy, Jul 04 '16 at 19:11
Also, forgot to mention, it only figures out the first element in the primary array (DNA) for now, which is all 0s ( [0, 0, 0, 0, 0]). Later on it will iterate over all DNA elements, but haven't gotten to that yet :) — X3r0, Jul 04 '16 at 19:12

score 5 · Accepted Answer · answered Jul 06 '16 at 21:15

I'm working with evolutionary computation right now so I hope my answer will be helpful for you, personally, I work with java, mostly because is one of my main languages, and for the portability, because I tested in linux, windows and mac. In my case I work with permutation encoding, but if you are still learning how GA works, I strongly recommend binary encoding. This is what I called my InitialPopulation. I try to describe my program's workflow:

1-. Set my main variables

This are PopulationSize, IndividualSize, MutationRate, CrossoverRate. Also you need to create an objective function and decide the crossover method you use. For this example lets say that my PopulationSize is equals to 50, the IndividualSize is 4, MutationRate is 0.04%, CrossoverRate is 90% and the crossover method will be roulette wheel. My objective function only what to check if my Individuals are capable to represent the number 15 in binary, so the best individual must be 1111.

2-. Initialize my Population

For this I create 50 individuals (50 is given by my PopulationSize) with random genes.

3-. Loop starts

For each Individuals in Population you need to:

Evaluate fitness according to the objective function. If an Individual is represented by the next characters: 00100 this mean that his fitness is 1. As you can see this is a simple fitness function. You can create your own while you are learning, like fitness = 1/numberOfOnes. Also you need to assign the sum of all the fitness to a variable called populationFitness, this will be useful in the next step.
Select the best individuals. For this task there's a lot of methods you can use, but we will use the roulette wheel method as we say before. In this method, You assign a value to every individual inside your population. This value is given by the next formula: (fitness/populationFitness) * 100. So, if your population fitness is 10, and a certain individual fitness is 3, this mean that this individual has a 30% chance to be selected to make a crossover with another individual. Also, if another individual have a 4 in his fitness, his value will be 40%.
Apply crossover. Once you have the "best" individuals of your population, you need to create a new population. This new population is formed by others individuals of the previous population. For each individual you create a random number from 0 to 1. If this numbers is in the range of 0.9 (since our crossoverRate = 90%), this individual can reproduce, so you select another individual. Each new individual has this 2 parents who inherit his genes. For example: Lets say that parentA = 1001 and parentB = 0111. We need to create a new individual with this genes. There's a lot of methods to do this, uniform crossover, single point crossover, two point crossover, etc. We will use the single point crossover. In this method we choose a random point between the first gene and the last gene. Then, we create a new individual according to the first genes of parentA and the last genes of parentB. In a visual form:

parentA = 1001 parentB = 0111 crossoverPoint = 2 newIndividual = 1011

As you can see, the new individual share his parents genes.

Once you have a new population with new individuals, you apply the mutation. In this case, for each individual in the new population generate a random number between 0 and 1. If this number is in the range of 0.04 (since our mutationRate = 0.04), you apply the mutation in a random gene. In binary encoding the mutation is just change the 1's for 0's or viceversa. In a visual form:

individual = 1011 randomPoint = 3 mutatedIndividual = 1010

Get the best individual
If this individual has reached the solution stop. Else, repeat the loop
End

As you can see, my english is not very good, but I hope you understand the basic idea of a genetic algorithm. If you are truly interested in learning this, you can check the following links:

http://www.obitko.com/tutorials/genetic-algorithms/ This link explains in a clearer way the basics of a genetic algorithm

http://natureofcode.com/book/chapter-9-the-evolution-of-code/ This book also explain what a GA is, but also provide some code in Processing, basically java. But I think you can understand.

Also I would recommend the following books:

An Introduction to Genetic Algorithms - Melanie Mitchell

Evolutionary algorithms in theory and practice - Thomas Bäck

Introduction to genetic algorithms - S. N. Sivanandam

If you have no money, you can easily find all this books in PDF. Also, you can always search for articles in scholar.google.com Almost all are free to download.

Very helpful breakdown of this algorithm for me :). I'll be pouring through this and fixing any misunderstandings in my own code. I know mine is terribly inefficient as it's all for the most part random, but shouldn't be too hard in refactoring these ideas in. Again, thank you; this is the most comprehensive explanation I've come across on this topic, it helps a lot. — X3r0, Jul 07 '16 at 06:27
I'm glad you find helpful my response. Also try to check the books and links. Especially this: http://natureofcode.com/book/chapter-9-the-evolution-of-code/ It helped me a lot :) — Alberto Alegria, Jul 07 '16 at 17:05

score 1 · Answer 2 · answered Jul 17 '16 at 17:54

Just to add a bit to Alberto's great answer, you need to watch out for two issues as your solution evolves.

The first one is Over-fitting. This basically means that your solution is complex enough to "learn" all samples, but it is not applicable outside the training set. To avoid this, your need to make sure that the "amount" of information in your training set is a lot larger than the amount of information that can fit in your solution.

The second problem is Plateaus. There are cases where you would arrive at certain mediocre solutions that are nonetheless, good enough to "outcompete" any emerging solution, so your progress stalls (one way to see this is, if you see your fitness "stuck" at a certain, less than optimal number). One method for dealing with this is Extinctions: You could track the rate of improvement of your optimal solution, and if the improvement has been 0 for the last N generations, you just Nuke your population. (That is, delete your population and the list of optimal individuals and start over). Randomness will make it so that eventually the Solutions will surpass the Plateau.

Another thing to keep in mind, is that the default Random class is really bad at Randomness. I have had solutions improve dramatically by simply using something like the Mesernne Twister Random generator or a Hardware Entropy Generator.

I hope this helps. Good luck.

Genetic Algorithm / AI; basically, am I on the right track?

2 Answers2

1-. Set my main variables

2-. Initialize my Population

3-. Loop starts