0

I am trying to understand how evolution and fitness function work. I implemented my own Knapsack problem, which tries to find 3 most valuable items from a given set. It is my own cost function:

  static int counter=1;
    static double MyCostFunction(boolean[] chromosome, double[] data){
        System.out.println("COST FUNCTION: "+counter++);
        double sum = 0;
        int count = 0;
        for (int i =0; i<chromosome.length; i++){
            if (chromosome[i]){
                sum += data[i];
                count ++;
            }
        }
        return  count<=3 ? sum : 0;
    }

I’m printing out just to see how many times MyCostFunction is performed. Right now I am working with a small set of 8 items. Here it is:

public static void main(String[] args) {
        double[] data = {-15,-7, -1, -1,7,100,200,300};
        Integer[] items = new Integer[data.length];
        int i = 0;
        for (double d : data){
            items[i] = i;
            i++;
        }

        ISeq<Integer> zbiorDlaGA = ISeq.of(items);


        final ISProblem knapsack= new ISProblem(zbiorDlaGA,
                chromosome -> ISProblem.MyCostFunction(  chromosome,  data)
        );

This is the Engine that I’m using:

final Engine<BitGene,Double> engine= Engine.builder(knapsack)
                .executor( Runnable::run)
                .populationSize(5)
                .survivorsSelector(new TournamentSelector<>(3))
                .offspringSelector(new RouletteWheelSelector<>())
                .alterers(
                        new Mutator<>(1.0),
                        new SinglePointCrossover<>(0.0)
                ).build();

And that’s how I’m obtaining statistics:

 final EvolutionStatistics<Double,?> statistics=EvolutionStatistics.ofNumber();

        final  Phenotype<BitGene,Double> best=engine.stream()
               // .limit(bySteadyFitness(15))
                .limit(5)
                .peek(r-> System.out.println("########CURRENT GEN: "
                        +r.generation()+
                        ": "+ r.totalGenerations()+
                        ": "+r.bestPhenotype()+
                        " ALTERED: "+r.alterCount()+
                        " INVALID: "+r.invalidCount()+
                        " GENOTYPE: "+r.genotypes()))
                .peek(statistics)
                .collect(toBestPhenotype());


        
        System.out.println(statistics);
        System.out.println(best);

I've got a few questions concerning some behaviours of your library that I don't really understand: Even when I set mutation probability to 1.0 (it should change every bit, I think so) it gives me this result:

COST FUNCTION: 1
COST FUNCTION: 2
COST FUNCTION: 3
COST FUNCTION: 4
COST FUNCTION: 5
COST FUNCTION: 6
COST FUNCTION: 7
COST FUNCTION: 8
########CURRENT GEN: 1: 1: [01100000] -> 300.0 ALTERED: 24 INVALID: 0 GENOTYPE: [[10110100],[00011011],[01011101],[00111001],[01100000]]
COST FUNCTION: 9
COST FUNCTION: 10
COST FUNCTION: 11
########CURRENT GEN: 2: 2: [00011011] -> 0.0 ALTERED: 24 INVALID: 0 GENOTYPE: [[00011011],[01011101],[10100101],[01111010],[00001011]]
COST FUNCTION: 12
COST FUNCTION: 13
COST FUNCTION: 14
########CURRENT GEN: 3: 3: [10100101] -> 0.0 ALTERED: 24 INVALID: 0 GENOTYPE: [[10100101],[01011101],[01111011],[01010011],[10010101]]
COST FUNCTION: 15
COST FUNCTION: 16
COST FUNCTION: 17
########CURRENT GEN: 4: 4: [10000010] -> 293.0 ALTERED: 24 INVALID: 0 GENOTYPE: [[01011101],[01010011],[01111011],[10000010],[00001001]]
COST FUNCTION: 18
COST FUNCTION: 19
COST FUNCTION: 20
########CURRENT GEN: 5: 5: [10000010] -> 293.0 ALTERED: 24 INVALID: 0 GENOTYPE: [[10000010],[01011101],[01000100],[10111100],[10011001]]
+---------------------------------------------------------------------------+
|  Time statistics                                                          |
+---------------------------------------------------------------------------+
|             Selection: sum=0,003352000000 s; mean=0,000670400000 s        |
|              Altering: sum=0,010647600000 s; mean=0,002129520000 s        |
|   Fitness calculation: sum=0,011403300000 s; mean=0,002280660000 s        |
|     Overall execution: sum=0,033579600000 s; mean=0,006715920000 s        |
+---------------------------------------------------------------------------+
|  Evolution statistics                                                     |
+---------------------------------------------------------------------------+
|           Generations: 5                                                  |
|               Altered: sum=120; mean=24,000000000                         |
|                Killed: sum=0; mean=0,000000000                            |
|              Invalids: sum=0; mean=0,000000000                            |
+---------------------------------------------------------------------------+
|  Population statistics                                                    |
+---------------------------------------------------------------------------+
|                   Age: max=4; mean=0,560000; var=1,090000                 |
|               Fitness:                                                    |
|                      min  = -23,000000000000                              |
|                      max  = 300,000000000000                              |
|                      mean = 41,840000000000                               |
|                      var  = 10763,306666666665                            |
|                      std  = 103,746357365773                              |
+---------------------------------------------------------------------------+
[01100000] -> 300.0

Why does it calculate fitness for three individuals only? Is it somehow caching the values already calculated for the same chromosome?

And why in the first generation it is calculated 8 times?

The number of altered individuals is always 3*8, what implies that only 3 chromosomes were modified during the evolution. Why? I thought it has something to do with TournamentSelector sampleSize, but it doesn’t change number of changed chromosomes.

With the probability of mutation equal to 100% and crossover prob.=100% it should change every bit in chromosome, so there will only be 2 versions of chromosome in each generation, but it doesn’t work like that. Why? Does it randomly select value of bit or sets the opposite?

Does the number of bits set to true have to be a constant one(or approximately constant) in population/generation?

I am using these strange values for crossover and mutation probabilities, because I was wondering earlier why it didn’t give the number of fitness calculations performed equal to populationSize*NumberOfGenerations. So I started to experiment.

1 Answers1

0

The described behavior is as expected :)

The following facts leads to the observed outcome:

  1. During the evolution process, the population is divided into offspring and survivors.
  2. Only the offspring is target for the alterers (mutation operation).
  3. The default offspring fraction is set to 0.6, which makes an offspring count of 3 for your example.
  4. The fitness values of unchanged individuals is cached.

The outcome:

  1. The reason for the first five fitness evaluations is induced by the fresh, initial population of 5.
  2. The next three evaluations are due to the three offspring individuals, which has been mutated with 100% certainty.
  3. The subsequent evaluation blocks of three evaluations have the same reason.

I think this explains the output, doesn't it?

Control the number of initially set bits

If you want to control the number of initial bits of the BitChromosome, you can use the original Codec to create your own variation.

public static <T> InvertibleCodec<ISeq<T>, BitGene>
ofSubSet(final ISeq<? extends T> basicSet, final double onesProbability) {
    final InvertibleCodec<ISeq<T>, BitGene> codec = Codecs.ofSubSet(basicSet);
    return InvertibleCodec.of(
        Genotype.of(BitChromosome.of(basicSet.length(), onesProbability)),
        codec.decoder(),
        codec.encoder()
    );
}
  • I think it does. Thank you very much for your quick response. – Mateusz Gibas May 11 '22 at 21:13
  • I've recently discovered, that when I set the number of the most valuable items to find to 1 the whole algorithm started to get lost. Now I've got 24 items in Knapsack and when I try to find less than 6 (so less than 1/4 of the given set) it can't get below that number of bits set to true in chromosome. Is there a factor that determines how many bits have to be set to true in chromosome. Is it possible to change? – Mateusz Gibas May 13 '22 at 10:38
  • You can set the one's probability of the bit-chromosome. But than you have to create your won codec and can't use the `Codecs.ofSubSet` method. – Franz Wilhelmstötter May 13 '22 at 19:31
  • I've read Codecs implementation and some examples of custom codecs that can be found on Internet. I'm still not exactly sure how to implement my own Codec suitable for this problem. Could you give me some advice on how to implement Codec that will help me to overcome the obstacle that I mentioned in my previous message? – Mateusz Gibas May 17 '22 at 19:49
  • I added an example how to control the initial number of set bits of the bit-chromosome. – Franz Wilhelmstötter May 19 '22 at 18:57