For a project on an HPC cluster I want to use pygad to produce sets of points that can then be passed to jobs on the cluster. Therefore, I would like all points within a population to be unique (time on the cluster is expensive, so I would like to make optimal use of the jobs I submit and not have duplicates). Also, since the jobs simulate physics, I need to constrain my values, as some values just do not make any sense physically (e.g. a negative number of particles or a negative length).
So far so good. pygad seems to have two parameters, `allow_duplicate_genes` and `gene_space`, that seem to do exactly what I want according to my understanding of the docs.
However, this code:
```python
import itertools

import numpy
import pygad


def mock_function(solution, sol_idx):
    # Toy fitness: larger when the point is closer to the origin.
    return -1 * (solution[0] ** 2 + solution[1] ** 2)


# Allowed interval for every gene.
bounds = [.1, 1.]
param_values = numpy.linspace(bounds[0], bounds[1], 10).tolist()
# 10 x 10 grid of unique starting points inside the allowed interval.
initial_population = list(itertools.product(param_values, param_values))

ga = pygad.GA(num_generations=100,
              num_parents_mating=10,
              gene_type=float,
              gene_space={'low': bounds[0], 'high': bounds[1]},
              fitness_func=mock_function,
              initial_population=initial_population,
              mutation_probability=0.2,
              allow_duplicate_genes=False,
              )
ga.run()

new_points = ga.population
unique_points = set([tuple(x) for x in new_points])
print(len(initial_population), len(new_points), len(unique_points))
print(f"Best solution is {ga.best_solution()[0]}.")

# Report every point that has a gene outside the allowed interval.
for point in new_points:
    for x in point:
        if not bounds[0] <= x <= bounds[1]:
            print(f"Point {point} is outside gene_space.")
```
This produces lots of non-unique points and occasionally also points outside the gene_space. What am I missing?
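To make the two checks easier to reproduce, here is a minimal sketch of the same diagnostics as a standalone helper. The name `summarize_population` and the keys it returns are my own and not part of pygad. It uses only plain Python and assumes the population is an iterable of points, each an iterable of numbers, which is how I read `ga.population`.

```python
def summarize_population(population, low, high):
    """Count duplicate points and points with any value outside [low, high]."""
    # Convert every point to a tuple so it can be hashed and compared.
    points = [tuple(point) for point in population]
    unique_points = set(points)
    # A point is out of bounds if at least one of its values leaves the interval.
    out_of_bounds = [p for p in points if any(not low <= x <= high for x in p)]
    return {
        "total": len(points),
        "unique": len(unique_points),
        "duplicates": len(points) - len(unique_points),
        "out_of_bounds": len(out_of_bounds),
    }


# Hand-made example: one repeated point and one point above the upper bound.
example = [[0.1, 0.2], [0.1, 0.2], [0.5, 1.3], [1.0, 0.1]]
print(summarize_population(example, low=0.1, high=1.0))
# {'total': 4, 'unique': 3, 'duplicates': 1, 'out_of_bounds': 1}
```

After `ga.run()`, calling it on `ga.population` with the same bounds that were passed as `gene_space` should give the counts I am describing.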
Thanks in advance for the help and let me know if you need any more details.
Edit: There is an open issue about this in the GitHub repo, claiming that the latest release should fix the uniqueness problem. However, in this example it still does not. It also still produces points outside of the `gene_space`.