For a project on an HPC cluster I want to use pygad to produce sets of points that can then be passed to jobs on the cluster. I would therefore like all points within a population to be unique: time on the cluster is expensive, so I want to make optimal use of the jobs I submit and avoid duplicates. Also, since the jobs simulate physics, I need to constrain my values, because some values make no physical sense (e.g. a negative number of particles or a negative length).

pygad has two parameters, allow_duplicate_genes and gene_space, that according to my understanding of the docs should do exactly what I want.

However, this code:

import pygad
import numpy
import itertools

def mock_function(solution, sol_idx):
    return -1 * (solution[0] ** 2 + solution[1] ** 2)
range = [.1, 1.]
param_values = numpy.linspace(range[0], range[1], 10).tolist()
initial_population = list(itertools.product(param_values, param_values))

ga = pygad.GA(num_generations=100,
              num_parents_mating=10,
              gene_type=float,
              gene_space={'low': range[0], 'high': range[1]},
              fitness_func=mock_function,
              initial_population=initial_population,
              mutation_probability=0.2,
              allow_duplicate_genes=False,
              )
ga.run()

new_points = ga.population
unique_points = set([tuple(x) for x in new_points])
print(len(initial_population), len(new_points), len(unique_points))
print(f"Best solution is {ga.best_solution()[0]}.")
for point in new_points:
    for x in point:
        if not range[0] <= x <= range[1]:
            print(f"Point {point} is outside gene_space.")

produces many non-unique points and occasionally also points outside the gene_space. What am I missing?

Thanks in advance for the help and let me know if you need any more details.

Edit: In the GitHub repo there is an open issue about this, which claims that the latest release should fix the uniqueness issue. However, in this example it still does not, and the run still produces points outside the gene_space.

dmark04

1 Answer

The code below produces no solutions with duplicate genes. The changes are:

  1. Remove duplicate gene values from the solutions in the passed initial population (the solutions whose two genes are equal).
  2. Set mutation_by_replacement=True.

The code also sets save_solutions=True, so that every solution generated in any generation is stored in ga.solutions and can be checked. The minimum and maximum gene values across all explored solutions are printed to confirm that no gene is outside the range [.1, 1.].

import pygad
import numpy
import itertools

def mock_function(ga_instance, solution, sol_idx):
    return -1 * (solution[0] ** 2 + solution[1] ** 2)
range = [.1, 1.]
param_values = numpy.linspace(range[0], range[1], 10).tolist()
initial_population = list(itertools.product(param_values, param_values))

## Remove duplicates from the initial population.
for idx, sol in enumerate(initial_population):
    sol = list(sol)
    if sol[0] == sol[1]:
        if sol[1] < 0.101:
            sol[1] = sol[1] + 0.001
        else:
            sol[1] = sol[1] - 0.001
    initial_population[idx] = sol.copy()

ga = pygad.GA(num_generations=100,
              num_parents_mating=10,
              gene_type=float,
              gene_space={'low': range[0], 'high': range[1]},
              fitness_func=mock_function,
              initial_population=initial_population,
              mutation_probability=0.2,
              allow_duplicate_genes=False,
              save_solutions=True,
              suppress_warnings=True,
              mutation_by_replacement=True
              )
ga.run()

new_points = ga.population
unique_points = set([tuple(x) for x in new_points])
print(len(initial_population), len(new_points), len(unique_points))
print(f"Best solution is {ga.best_solution()[0]}.")
for point in ga.solutions:
    for x in point:
        if not range[0] <= x <= range[1]:
            print(f"Point {point} is outside gene_space.")

print("Maximum gene value found", numpy.max(ga.solutions))
print("Minimum gene value found", numpy.min(ga.solutions))
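
Note that, as far as the pygad documentation describes it, allow_duplicate_genes=False concerns repeated gene values within a single solution, not repeated solutions within a population. The following is a minimal sketch of how both properties could be checked separately with plain numpy. The helper names and the rounding tolerance are assumptions made for illustration and are not part of pygad; the example list stands in for ga.solutions or ga.population.

```python
import numpy

def solutions_with_duplicate_genes(solutions, decimals=12):
    """Return the indices of solutions in which a gene value repeats."""
    # Rounding is an assumed tolerance so that float noise does not hide repeats.
    rounded = numpy.round(numpy.asarray(solutions, dtype=float), decimals)
    return [i for i, sol in enumerate(rounded)
            if len(numpy.unique(sol)) < len(sol)]

def unique_points(solutions, decimals=12):
    """Return the distinct solutions (rows), dropping repeated points."""
    rounded = numpy.round(numpy.asarray(solutions, dtype=float), decimals)
    return numpy.unique(rounded, axis=0)

# Stand-in data; in practice pass ga.solutions or ga.population instead.
example = [[0.1, 0.2], [0.3, 0.3], [0.1, 0.2], [0.5, 1.0]]
print(solutions_with_duplicate_genes(example))  # solution 1 has two equal genes
print(len(unique_points(example)))              # number of distinct points
```

If the goal is to avoid submitting the same point twice, filtering the population through something like unique_points before creating cluster jobs may be simpler than relying on the GA to keep solutions distinct.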
Ahmed Gad