Using Bhattacharyya Distance for feature selection

Question

I have a set of 240 features extracted using image processing. The objective is to classify test cases into 7 different classes after training. For each class there are about 60 observations (that is, around 60 feature vectors per class, each with 240 components).

Many research papers and books use Sequential Forward Search (SFS) or Sequential Backward Search to select the best features from a feature vector. The following picture gives a sequential forward search algorithm.

[Image: snapshot of the SFS algorithm]

Any such algorithm uses some criterion to discriminate between features. A common choice is the Bhattacharyya distance, a divergence-type measure between distributions. From my reading, given a matrix M1 for class A holding all 60 feature vectors of that class (n=60 rows and m=240 columns, since there are 240 features in total) and a similar matrix M2 for class B, I can compute the Bhattacharyya distance between them and so measure how much the two classes overlap.
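For reference, here is the closed form usually quoted for this distance. It assumes each class is multivariate Gaussian, which is an assumption and not something established for this data. The symbols \mu_A, \Sigma_A (and \mu_B, \Sigma_B) stand for the sample mean and covariance of the columns of M1 (and M2) being compared:

```latex
D_B(A,B) = \frac{1}{8}\,(\mu_A-\mu_B)^{\top}\,\Sigma^{-1}\,(\mu_A-\mu_B)
         + \frac{1}{2}\,\ln\frac{\det\Sigma}{\sqrt{\det\Sigma_A\,\det\Sigma_B}},
\qquad \Sigma = \frac{\Sigma_A+\Sigma_B}{2}
```

Note that the sample covariance of a 60x240 matrix is singular, so this form can only be evaluated on subsets of fewer than 60 columns.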

My question is how to integrate the two: how do I use the Bhattacharyya distance as the criterion for selecting the best features in the algorithm described above?

Sohaib · Accepted Answer · 2013-11-11T15:48:48.553

With help from Arthur B. I finally understood the concept. Here is my implementation. I actually used the Plus-l Take-away-r algorithm (Sequential Forward Backward Search), but I'll post that as is, since it is basically the same once the backward search is removed. The implementation below is in MATLAB but very simple to understand:

S=zeros(Size,1);  %Initialise the binary feature mask with all zeros, meaning no feature is selected
k=0;              %k is the number of features currently selected
while k<n         %Begin SFS. n is the number of features that need to be selected
    t=k+l;        %l is the number of features to be added in each iteration
    while k<t     %Forward step: add one feature at a time
        R=zeros(Size,1);  %Size is the total number of features
        for i=1:Size
            if S(i)==0    %If the feature has not been selected. S is a binary array which puts a one against each selected feature
                S_copy=S;
                S_copy(i)=1;  %Tentatively add feature i
                R=OperateBhattacharrya(Matrices,S_copy,i,e,R);  %The result of each iteration is stored in R
            end
        end
        k=k+1;         %Increment k
        [~,N]=max(R);  %Take the index of the maximum element in R as the best feature to select
        S(N)=1;        %Mark the selected feature with a 1
    end
    t=k-r;       %r is the number of features to be removed after selecting l features. l>r
    while k>t    %Start Sequential Backward Search
        R=zeros(Size,1);
        for i=1:Size
            if S(i)==1
                S_copy=S;
                S_copy(i)=0;  %Tentatively remove feature i
                R=OperateBhattacharrya(Matrices,S_copy,i,1,R);
            end
        end
        k=k-1;
        [~,N]=max(R);  %Removing feature N leaves the set with the highest criterion value
        S(N)=0;
    end
    fprintf('Iteration :%d--%d\n',k,t);
end
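The helper OperateBhattacharrya is not shown here. Purely as an illustration, a criterion of this kind could look like the sketch below. The names bhattCriterion and bhattGauss, the cell-array layout of Matrices, and the Gaussian assumption are all hypothetical choices rather than the original implementation, and the signature differs from the call used above. It scores a feature mask by the sum of all pairwise Bhattacharyya distances between classes, which is one of the options suggested in the comments on the other answer.

```matlab
function J = bhattCriterion(Matrices, S)
% Hypothetical stand-in for a criterion function (not the original helper).
% Matrices: cell array with one (observations x features) matrix per class (assumed layout).
% S: binary selection mask over the features.
% J: sum of pairwise Gaussian Bhattacharyya distances on the selected columns.
    idx = logical(S);
    C = numel(Matrices);
    J = 0;
    for a = 1:C-1
        for b = a+1:C
            Xa = Matrices{a}(:, idx);   % class a restricted to the selected features
            Xb = Matrices{b}(:, idx);   % class b restricted to the selected features
            J = J + bhattGauss(Xa, Xb);
        end
    end
end

function d = bhattGauss(Xa, Xb)
% Bhattacharyya distance between two classes, assuming each one is Gaussian.
    dm = (mean(Xa, 1) - mean(Xb, 1))';  % difference of class means as a column vector
    Ca = cov(Xa);                       % sample covariance of class a
    Cb = cov(Xb);                       % sample covariance of class b
    Cm = (Ca + Cb) / 2;                 % averaged covariance
    d = (dm' * (Cm \ dm)) / 8 + 0.5 * log(det(Cm) / sqrt(det(Ca) * det(Cb)));
end
```

With about 60 observations per class the covariance estimates become singular once roughly 60 features are selected, and det can overflow or underflow well before that, so for larger subsets a log-determinant computed from a Cholesky factor would probably be safer.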

I hope this helps anyone who has a similar problem.

@Matthieu There would have been. This was a research project close to two years ago. — Sohaib, Apr 19 '15 at 10:26

score 1 · Answer 2 · answered Oct 31 '13 at 15:12

That's the "evaluate the branch" part of the algorithm, except you'll first use this Bhattacharyya distance on one-dimensional vectors, then two-dimensional vectors, and so on.
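To make the growing-dimension idea concrete, here is a minimal sketch of plain SFS with a pluggable criterion. The function name sfsSelect and the critFun interface (a handle that maps a binary feature mask to a score) are assumptions made for illustration, not part of any library:

```matlab
function S = sfsSelect(critFun, Size, n)
% Plain Sequential Forward Search (no backward step), as a sketch.
% critFun: handle mapping a Size x 1 binary mask to a score (hypothetical interface).
% Size: total number of features; n: number of features to select.
    S = zeros(Size, 1);
    for k = 1:n
        R = -inf(Size, 1);        % scores of the candidate branches at this step
        for i = 1:Size
            if S(i) == 0
                S_copy = S;
                S_copy(i) = 1;    % branch: current set plus feature i
                R(i) = critFun(S_copy);
            end
        end
        [~, N] = max(R);          % keep the best branch
        S(N) = 1;
    end
end
```

In the first pass every branch is scored on a single column, in the second pass on two columns, and so on. Here critFun would evaluate the Bhattacharyya-based score of all classes restricted to the columns flagged in the mask.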

Arthur B.


Can you elaborate a little if I explain my whole problem? – Sohaib Oct 31 '13 at 15:13
Feel free to add more details. SFS is a very simple greedy approach to selecting features. – Arthur B. Oct 31 '13 at 15:21
Umm, actually, read the first paragraph of my question; I had added the details there. According to my understanding, the Bhattacharyya distance finds the distance between two classes. For example, the rows of a matrix represent the observations and the columns represent the features, so in my case it's a 60x240 matrix. Am I right? – Sohaib Oct 31 '13 at 15:27
To find the best feature in the set, you'll use a 60x1 matrix. Then to find the second best, a 60x2 matrix... etc. – Arthur B. Oct 31 '13 at 15:35
So I find the distance of class 1 to the other 6 classes and add them up, then do the same for class 2, and so on, finishing with the 7th class against all other classes. Whichever class has the highest distance would be the best? Wouldn't that be too complex? – Sohaib Oct 31 '13 at 15:38
No, you're selecting features not classes! – Arthur B. Oct 31 '13 at 16:35
You are right. What is the meaning of each possible branch then? – Sohaib Oct 31 '13 at 17:28
Each branch represents a feature that you're tentatively adding to a growing set of retained features. – Arthur B. Oct 31 '13 at 18:20
If I don't compare classes how do I say one feature is better than another? Do I find the best set of features for each class individually? If I'm asking too much point me to a resource which tells me how. – Sohaib Oct 31 '13 at 18:23
For each branch, you decide how good the branch is by comparing all your classes. You could use the sum of all pairwise Bhattacharyya distances between classes for instance, or the sum of the Bhattacharyya distance between each class and the rest of the classes. – Arthur B. Oct 31 '13 at 18:25
Could you explain the latter point, the sum of the distance between each class and the rest of the classes? If I have 4 classes, that would be the sum of `1,2 1,3 and 1,4`, right? And then? Do I run the algo for each class separately and find the best set of features for each class? – Sohaib Oct 31 '13 at 18:34
Don't take it the wrong way, but you should probably go back to some basics and build up to this. – Arthur B. Oct 31 '13 at 18:36
Haha, I knew it would come to that. I have read and read; everywhere the filter and the search algorithm are explained separately, and I just don't seem to be able to put them together. That's why I asked for a resource. Thanks for all the help :) – Sohaib Oct 31 '13 at 18:39
