Performance impacts of different ways to perform genetic operations on the genome of a multivariable genetic algorithm

Question

I use genetic algorithms in my research a lot and i ran across an interesting question about how best to perform your genetic operations on a genome. Say you have a function defined by f(x,y) = ax^n + bx^n-1 + ... + cy^m + dy^m-1 ... etc. its just a multivariable function that is somewhat expensive to calculate so you're trying to be as efficient as possible with your genetic operations.

If you're using a binary representation of the genome i have figured out that there are 2 reasonable ways to perform the genetic operations. Lets look at just the crossover stage.

Here is the code for a vectorized tournament selection in Matlab (for context to variable names)

%% Tournament Selection
T = round(rand(2*popSize,S)*(popSize-1)+1);     % Tournaments
[~,idx] = max(F(T),[],2);                       % Index of Winners
W = T(sub2ind(size(T),(1:2*popSize)',idx));     % Winners

So you have 2 different variables that are being optimized and my question is do you want to split up the genetic operations so that you have a crossover applied to each variable individually and then concatenate the arrays back together which would look like this for a 2 point crossover:

%% 2 Point Crossover

Pop2 = Pop(W(1:2:end),:);                   % Set Pop2 = Pop Winners 1
P2A  = Pop(W(2:2:end),:);                   % Assemble Pop2 Winners 2

% Split Pop2 for x and y genomes
xPop2 = Pop2(:,1:genome/2);
yPop2 = Pop2(:,genome/2 + 1:end);

% Split P2A for x and y genomes
xP2A = P2A(:,1:genome/2);
yP2A = P2A(:,genome/2+2:end);

% For x genome
Ref  = ones(popSize,1)*(1:genome/2);                     % Reference Matrix
CP   = sort(round(rand(popSize,2)*(genome/2-1)+1),2);    % Crossover Points
xidx  = CP(:,1)*ones(1,genome/2)<Ref & CP(:,2)*ones(1,genome/2)>Ref;   % Logical Index
xPop2(xidx) = xP2A(xidx);                       % Recombine Winners


% For y genome
Ref  = ones(popSize,1)*(1:genome/2);                     % Reference Matrix
CP   = sort(round(rand(popSize,2)*(genome/2-1)+1),2);    % Crossover Points
yidx  = CP(:,1)*ones(1,genome/2)<Ref & CP(:,2)*ones(1,genome/2)>Ref;   % Logical Index
yPop2(yidx) = yP2A(yidx);                       % Recombine Winners

Pop2 = horzcat(xPop2,yPop2);
P2A = horzcat(xP2A,yP2A);

or do you treat the genome as a single crossover operation and just perform the 2 point crossover as if it were only a single variable genome like this:

Pop2 = Pop(W(1:2:end),:); % New Pop is Winners of old Pop
P2A = Pop(W(2:2:end),:); % Assemble Pop2 Winners 2
Ref = ones(popSize,1)*(1:genome); % Ones Matrix
CP = sort(round(rand(popSize,2)*(genome-1)+1),2); % Crossover Points
idx = CP(:,1)*ones(1,genome)<Ref&CP(:,2)*ones(1,genome)>Ref; % Index
Pop2(idx)=P2A(idx); % Recombine Winners

Does anyone know of any research that has been done that shows the difference in the 2 different ways to represent the genomes? I haven't been able to find anything published about it, but it may just be because i didn't know how to intelligently word my question in google.

Thanks

This branch of study is truly quite interesting, and, of course, there are standard, proven, trivial procedures to tackle, solve and analyze this class of problems. First of all, i would wish to know: 1. What is your concrete question?: a framework for solve or optimize the solution? / a method for selection of the mutation procedure? a method for selection of objective functions? 2. Actually the multinomial objective function is not a complex function to solve -this is C0,..,Cn continuous :), or will you actually require to apply GA onto non C0,... Cn functions, such as discrete procedures? — Brethlosze, Apr 12 '15 at 04:42
3. Are you bunded onto the GA class of methods? Can you extend to others? 4. Just as reference, under which discipline your question work is focused, IT, Mathematics, Engineering? — Brethlosze, Apr 12 '15 at 04:43
So my main question is about how to represent a multivariable function in binary during the crossover stage. I have presented 2 different techniques for crossover the first just naively crosses the entire genome and the second technique first splits the genome into its individual variables and crosses each one separately. Im curious if there is research that shows which one is better? I would guess that with too many variables you wouldnt want to split them up, but for a 2 variable case ive been getting slightly faster convergence — Grant Williams, Apr 13 '15 at 06:25
2. The function i gave is absolutely unrelated to the problem i was just showing how its a multivariable function. The function im actually dealing with is a summation of very computationally integrals. 3. The research i am doing is based around the use of a Genetic Algorithm so i cant use other methods for this particular problem. 4. I am a physics and mathematics student but the project was for a the national weather center (meteorology) and now i do research on GA's themselves so i would say it was more of a general question than one related to any specific discipline! — Grant Williams, Apr 13 '15 at 06:30

score 2 · Accepted Answer · answered Apr 18 '15 at 06:41

My main comment are as follows:

The mutation stage strongly depends on the problem type.
For continuous, non linear, dynamical problems, such as the Weather Meteorological Physical problem you cited, with estimation and prediction processes, still a GA framework could perform very well, if it is stated properly and always with a clear focus on what you are pursuing with him. A discrete mutation strategy -such as the single selective crossover presented- would perform really poor, just like a standard nonguided, random, bruteforce search. As you said, indeed this is the naivest approach. The performance & execution time, will express this fact in a nondoubtly manner.
Having clear the problem is physical -and i am assuming just too much-, and thus, given there is a dynamical system, probably with a certain number of states, finite or not, and given a certain continuity on the variables, and with or without some non-linearity degree, the main GA focus here should involve Bayesian Estimators. Under this context, each winner or individuals -or any other descriptive word you would wish to apply- would have a mutation stage not based on a simple exchange of parameters. This is just the simplest, easier, shinier explanation for GA. The better, faster, efficient way to interpret the mutation stage on this context equals to apply an statistical estimator able to gather the proper gradient from this dynamical nonlinear system, estimated at each stage. Under a Bayesian GA framework, Particle Filters are an interesting alternative for preparing a good mutation stage.
The fact that you are using multiple, nested integrals, enlighten me on thinking your problem is truly dynamical. And under that context, guiding the GA through a more statistical bayesian framework makes me a lot of sense. Remember that Bayesian methods are also robust under certain conditions.
If the problem is plainly dynamical, you should then face the truth, and considering discard GA, and moving to an Non-Linear Optimization techniques. Even, Parameter Estimation, and System Identification methods would even perform better, if the problem can be suitably expressed as a parameter search process instead.
By other way, if your problem is in a no distinguishable way discrete -that is, if you do not have PDE expressions of any kind, or if these PDEs are really trivial to solve, or if there is plainly not any physical process involving states- and your problem is mainly based on expert rules, databases lookups, or processing large data systems and you have discarded -by any reason you find pertinent- to not have a dynamical model from these data- then, and only then, i would try to guide the GA as a brute force method, with random crossover stages between the better solutions. A Discrete Bayesian framework would even be useful here, from the probability estimates for every mutation stage.

If you have more information, i could make this answer evolve....

Best wishes, and luck hypfco

Thank you so much for the thorough answer. The way the problem is set up i actually think a GA or another type of heuristic algorithm (swarm intelligence, simulated annealing, etc.) is the best fit for the problem. I will look into implementing a Bayesian Genetic Algorithm. I haven't heard of them until now, but after an initial glimpse at some papers it seems like a very interesting technique that i would like to try. Do you have any interesting papers that present a well used Bayesian GA? Thank you again for the help! — Grant Williams, Apr 20 '15 at 19:29

Performance impacts of different ways to perform genetic operations on the genome of a multivariable genetic algorithm

1 Answers1