If I use a similarity-based algorithm such as the Pearson correlation score to compare two feature vectors, and I want to know which dimensions (feature fields) are most dissimilar across the feature set, what algorithm should I use? I am using Mahout, which is a machine learning library for Java.
This is not really a programming question, is it? – specialscope Mar 13 '12 at 15:57
I am using Mahout, which is a machine learning library in Java. – seahorse Mar 13 '12 at 16:01
If you want to get feedback here, you should mention that in your question and perhaps post the piece of code you are working on as well. – specialscope Mar 13 '12 at 16:04
What exactly do you want to do? Find similarities between the data in two vectors? – Adrian Mar 13 '12 at 16:19
@Adrian: I have explained in detail below, in my reply to Sean Owen. – seahorse Mar 13 '12 at 16:27
1 Answer
Well, it would just be the dimension in which the two vectors differ most, that is, the dimension where the absolute value of the difference between the two vectors' values is largest. Is that really all you mean, or are you looking for something subtler?
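A minimal sketch of the idea in this answer, assuming the two feature vectors are available as plain `double[]` arrays of equal length. The class and method names are hypothetical and not part of Mahout. It returns the index that maximizes the absolute difference; the same loop should carry over to a Mahout `Vector` by reading elements with `get(int)` and the length with `size()`.

```java
/**
 * Sketch: find the dimension in which two feature vectors differ most,
 * i.e. the index i that maximizes |a[i] - b[i]|.
 * Hypothetical helper working on plain double[] arrays (no Mahout API used).
 */
public class MostDissimilarDimension {

    /** Returns the index where |a[i] - b[i]| is largest (the first such index on ties). */
    public static int mostDissimilarDimension(double[] a, double[] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("vectors must be non-empty and the same length");
        }
        int best = 0;
        double bestDiff = Math.abs(a[0] - b[0]);
        for (int i = 1; i < a.length; i++) {
            double diff = Math.abs(a[i] - b[i]);
            if (diff > bestDiff) {
                bestDiff = diff;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] fv1 = {0.9, 0.1, 0.5, 0.7};
        double[] fv2 = {0.8, 0.9, 0.4, 0.7};
        // Dimension 1 differs by 0.8, more than any other dimension.
        System.out.println(mostDissimilarDimension(fv1, fv2)); // prints 1
    }
}
```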

Sean Owen
OK, say I have fv1, fv2, fv3, fv4 and fv5 as feature vectors that are supposed to be very "similar". Now, for one feature vector, say fv2, I need to find which dimensions are awkward, i.e. show a much larger dissimilarity than the other dimensions. To do this I want to compare fv2 with all the other feature vectors and then come up with the answer. So should I calculate the average absolute difference across all the vectors, or is there a better statistic? – seahorse Mar 13 '12 at 16:23
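A minimal sketch of the average absolute difference proposed in this comment, assuming the vectors are plain `double[]` arrays of equal length and that fv2 is compared against the other vectors only. `MeanAbsoluteDifference` and `meanAbsDiff` are hypothetical names, not Mahout API. For each dimension it averages |fv2[d] - other[d]| over the other vectors, so a dimension with an unusually large value is a candidate for being "awkward".

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: per-dimension mean absolute difference between one vector and a set of others. */
public class MeanAbsoluteDifference {

    /** For each dimension d, returns the mean of |target[d] - other[d]| over all vectors in others. */
    public static double[] meanAbsDiff(double[] target, List<double[]> others) {
        if (others.isEmpty()) {
            throw new IllegalArgumentException("need at least one other vector");
        }
        double[] result = new double[target.length];
        for (double[] other : others) {
            if (other.length != target.length) {
                throw new IllegalArgumentException("length mismatch");
            }
            // Accumulate the absolute difference in every dimension.
            for (int d = 0; d < target.length; d++) {
                result[d] += Math.abs(target[d] - other[d]);
            }
        }
        // Turn the sums into averages over the number of compared vectors.
        for (int d = 0; d < result.length; d++) {
            result[d] /= others.size();
        }
        return result;
    }

    public static void main(String[] args) {
        double[] fv2 = {1.0, 5.0, 2.0};
        List<double[]> others = Arrays.asList(
                new double[] {1.0, 1.0, 2.0},
                new double[] {2.0, 1.0, 3.5},
                new double[] {3.0, 1.0, 0.5});
        // Dimension 1 stands out with an average difference of 4.0.
        System.out.println(Arrays.toString(meanAbsDiff(fv2, others))); // prints [1.0, 4.0, 1.0]
    }
}
```

One caveat with this statistic is that it is not normalized: a dimension whose values naturally spread widely will show large differences even when fv2 is typical for it.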
Absolute difference from the average is reasonable; I might suggest something more normalized, like a z-value: just the number of standard deviations the value lies from the mean. – Sean Owen Mar 13 '12 at 16:37
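A minimal sketch of the z-value suggestion, under a few stated assumptions: the vectors are plain `double[]` arrays, the mean and standard deviation of each dimension are computed over the *other* vectors only (fv2 excluded), and the population standard deviation (dividing by n) is used; including fv2, or using the sample standard deviation, are equally plausible variants. The class and method names are hypothetical. A dimension where all the other vectors agree has zero spread, so its z-value is undefined and reported as `NaN`.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: per-dimension z-values of one vector relative to a set of others. */
public class DimensionZScores {

    /**
     * For each dimension d, returns (target[d] - mean_d) / sd_d, where mean_d and sd_d are the
     * mean and population standard deviation of dimension d across the other vectors.
     * Returns NaN for a dimension with zero standard deviation.
     */
    public static double[] zScores(double[] target, List<double[]> others) {
        int n = others.size();
        if (n == 0) {
            throw new IllegalArgumentException("need at least one other vector");
        }
        for (double[] v : others) {
            if (v.length != target.length) {
                throw new IllegalArgumentException("length mismatch");
            }
        }
        double[] z = new double[target.length];
        for (int d = 0; d < target.length; d++) {
            double sum = 0.0;
            for (double[] v : others) {
                sum += v[d];
            }
            double mean = sum / n;
            double squares = 0.0;
            for (double[] v : others) {
                double dev = v[d] - mean;
                squares += dev * dev;
            }
            double sd = Math.sqrt(squares / n);
            z[d] = (sd == 0.0) ? Double.NaN : (target[d] - mean) / sd;
        }
        return z;
    }

    /** Index of the dimension with the largest |z|, skipping NaN entries; -1 if there is none. */
    public static int mostUnusualDimension(double[] z) {
        int best = -1;
        for (int d = 0; d < z.length; d++) {
            if (Double.isNaN(z[d])) {
                continue;
            }
            if (best < 0 || Math.abs(z[d]) > Math.abs(z[best])) {
                best = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] fv2 = {1.0, 5.0, 2.0};
        List<double[]> others = Arrays.asList(
                new double[] {0.0, 1.0, 1.0},
                new double[] {2.0, 3.0, 3.0},
                new double[] {0.0, 1.0, 3.0},
                new double[] {2.0, 3.0, 1.0});
        double[] z = zScores(fv2, others);
        System.out.println(Arrays.toString(z));      // prints [0.0, 3.0, 0.0]
        System.out.println(mostUnusualDimension(z)); // prints 1
    }
}
```

Because each dimension is divided by its own spread, a z-value of 3 means the same thing in every dimension, which is what makes the dimensions comparable in a way the raw absolute differences are not.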