I'm using DEAP's implementation of genetic programming for one of my research projects.

I would like to create a GP that works on pandas DataFrames: each primitive will be a custom function that takes a DataFrame as input and returns a DataFrame as output. Similarly, each terminal can be a DataFrame of all 1s or all 0s.

A simple example of one of the primitives could be (note that this is pseudocode-ish):

def add_5(input_df):
    return input_df + 5

pset.addPrimitive(add_5)

and an example terminal could be:

pset.addTerminal(pd.DataFrame(np.ones(500)))

Is this possible with DEAP? What would the code look like? I keep getting e.g. NoneType errors from the terminals.

Randy Olson

2 Answers

I was able to work this out with help from the DEAP developers. For those who find themselves in my position, below is some working code for a DEAP GP algorithm that optimizes the values in two columns of a DataFrame to 0. The example problem is obviously trivial and useless; it's meant to be a straightforward example of DEAP working on DataFrames.


import numpy as np
import pandas as pd

from deap import algorithms
from deap import base
from deap import creator
from deap import tools
from deap import gp

def add_5(input_df):
    return input_df + 5.

def subtract_5(input_df):
    return input_df - 5.

def multiply_5(input_df):
    return input_df * 5.

def divide_5(input_df):
    return input_df / 5.


pset = gp.PrimitiveSet('MAIN', 1)
pset.addPrimitive(add_5, 1)
pset.addPrimitive(subtract_5, 1)
pset.addPrimitive(multiply_5, 1)
pset.addPrimitive(divide_5, 1)

creator.create('FitnessMin', base.Fitness, weights=(-1.0,))
creator.create('Individual', gp.PrimitiveTree, fitness=creator.FitnessMin)

toolbox = base.Toolbox()
toolbox.register('expr', gp.genHalfAndHalf, pset=pset, min_=1, max_=2)
toolbox.register('individual', tools.initIterate, creator.Individual, toolbox.expr)
toolbox.register('population', tools.initRepeat, list, toolbox.individual)
toolbox.register('compile', gp.compile, pset=pset)

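def evalSymbReg(individual, points):
    # Transform the tree expression into a callable function
    func = toolbox.compile(expr=individual)
    # Apply the evolved program to the input DataFrame
    result = func(points)
    # DEAP expects the fitness as a tuple, hence the trailing comma
    return abs(result.column1.sum() + result.column2.sum()),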

toolbox.register('evaluate', evalSymbReg, points=pd.DataFrame({'column1': [125] * 500, 'column2': [125] * 500}))
toolbox.register('select', tools.selTournament, tournsize=3)
toolbox.register('mate', gp.cxOnePoint)
toolbox.register('expr_mut', gp.genFull, min_=0, max_=2)
toolbox.register('mutate', gp.mutUniform, expr=toolbox.expr_mut, pset=pset)


if __name__ == '__main__':
    pop = toolbox.population(n=100)
    hof = tools.HallOfFame(1)
    stats = tools.Statistics(lambda ind: ind.fitness.values)
    stats.register('avg', np.mean)
    stats.register('min', np.min)
    stats.register('max', np.max)
    pop, log = algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.1, ngen=20, stats=stats, halloffame=hof)
Randy Olson

Are you using strongly typed GP in DEAP, i.e. defining the types of the primitives?

There is a nice code sample here:

http://deap.readthedocs.org/en/latest/tutorials/advanced/gp.html#strongly-typed-gp

  • I had an earlier implementation with strongly typed GP in DEAP, but that didn't seem to help with this situation. – Randy Olson Sep 04 '15 at 22:28
  • Just looking at your primitive example, should it not be `pset.addPrimitive(add_5, 1)`? – user656541 Sep 04 '15 at 22:39
  • Yeah, it's really just pseudocode-ish up there. I suppose I'm looking for a working example with pandas DataFrames -- it doesn't have to follow my example. – Randy Olson Sep 04 '15 at 22:44