I have a set of 240 features extracted using image processing. The objective is to classify test cases into 7 classes after training. Each class has about 60 observations (that is, about 60 feature vectors per class, each with 240 components).
Many research papers and books use Sequential Forward Search or Sequential Backward Search to select the best features from a feature vector.
The following picture gives a sequential forward search algorithm.
Any such algorithm uses some criterion to discriminate between features. A common choice is the Bhattacharyya distance, a divergence-type measure between distributions. From what I have read, given a matrix M1 for class A that holds all 60 feature vectors of that class (n = 60 rows and m = 240 columns, one column per feature) and a similar matrix M2 for class B, I can compute the Bhattacharyya distance between the two and use it as a measure of how separable the classes are.
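For concreteness, this is roughly how I understand the distance computation, as a minimal Python/NumPy sketch. It assumes each class is approximately Gaussian, which is my own assumption. The function name `bhattacharyya_gaussian` and the small ridge term `reg` are also mine; the ridge is there because with about 60 observations and 240 features the sample covariance matrix is singular.

```python
import numpy as np

def bhattacharyya_gaussian(M1, M2, reg=1e-6):
    """Bhattacharyya distance between two classes, assuming each is Gaussian.

    M1, M2: arrays of shape (n_observations, n_features), one row per vector.
    reg: small ridge added to each covariance so it stays invertible
         (an assumption of this sketch, not part of the textbook formula).
    """
    M1 = np.asarray(M1, dtype=float)
    M2 = np.asarray(M2, dtype=float)
    d = M1.shape[1]

    mu1, mu2 = M1.mean(axis=0), M2.mean(axis=0)
    # atleast_2d handles the single-feature case, where np.cov returns a scalar.
    S1 = np.atleast_2d(np.cov(M1, rowvar=False)) + reg * np.eye(d)
    S2 = np.atleast_2d(np.cov(M2, rowvar=False)) + reg * np.eye(d)
    S = 0.5 * (S1 + S2)

    # Mean term: 1/8 * (mu1 - mu2)^T S^{-1} (mu1 - mu2)
    diff = mu1 - mu2
    mean_term = 0.125 * diff @ np.linalg.solve(S, diff)

    # Covariance term: 1/2 * ln( det S / sqrt(det S1 * det S2) ),
    # computed with log-determinants for numerical stability.
    _, logdet = np.linalg.slogdet(S)
    _, logdet1 = np.linalg.slogdet(S1)
    _, logdet2 = np.linalg.slogdet(S2)
    cov_term = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))

    return float(mean_term + cov_term)
```

Calling `bhattacharyya_gaussian(M1[:, cols], M2[:, cols])` with a list of column indices `cols` would give the distance restricted to a subset of features.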
My question is how to integrate the two: how do I use the Bhattacharyya distance as the criterion for selecting the best features in the algorithm described above?
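To make the question concrete, here is a rough sketch of how I imagine the two fitting together; I am not sure it is the right way. At each step the search tries adding each remaining feature and keeps the one that maximises the criterion. The assumptions are mine: classes are treated as Gaussian, the 7-class criterion is taken as the sum of pairwise Bhattacharyya distances (the minimum over pairs would be another option), and the names `criterion` and `sequential_forward_search` are hypothetical.

```python
import itertools
import numpy as np

def bhattacharyya_gaussian(M1, M2, reg=1e-6):
    """Bhattacharyya distance under a Gaussian assumption (reg is a ridge term)."""
    M1, M2 = np.asarray(M1, dtype=float), np.asarray(M2, dtype=float)
    d = M1.shape[1]
    S1 = np.atleast_2d(np.cov(M1, rowvar=False)) + reg * np.eye(d)
    S2 = np.atleast_2d(np.cov(M2, rowvar=False)) + reg * np.eye(d)
    S = 0.5 * (S1 + S2)
    diff = M1.mean(axis=0) - M2.mean(axis=0)
    mean_term = 0.125 * diff @ np.linalg.solve(S, diff)
    logdets = [np.linalg.slogdet(C)[1] for C in (S, S1, S2)]
    return float(mean_term + 0.5 * (logdets[0] - 0.5 * (logdets[1] + logdets[2])))

def criterion(class_mats, subset):
    """Sum of pairwise distances between all classes, using only `subset` columns."""
    cols = list(subset)
    return sum(
        bhattacharyya_gaussian(A[:, cols], B[:, cols])
        for A, B in itertools.combinations(class_mats, 2)
    )

def sequential_forward_search(class_mats, k):
    """Greedily pick k feature indices that maximise the criterion.

    class_mats: list of arrays, one per class, each (n_observations, n_features).
    """
    remaining = list(range(class_mats[0].shape[1]))
    selected = []
    while len(selected) < k and remaining:
        # Evaluate the criterion for the current subset plus each candidate.
        best = max(remaining, key=lambda f: criterion(class_mats, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

With my data this would be called as `sequential_forward_search([M1, M2, ..., M7], k)`, where each matrix is 60 x 240 and `k` is the number of features to keep.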