Finding dissimilar dimensions in a feature vector in Mahout

Question

If I use a similarity-based measure such as the Pearson correlation score to compare two feature vectors, and I want to know which dimensions (feature fields) are the most dissimilar across the feature set, what algorithm should I use? I am using Mahout, a machine learning library for Java.

I am using Mahout which is a machine learning library in Java — seahorse, Mar 13 '12 at 16:01
If you want to get feedback here, you should mention that in your question and perhaps post the piece of code you are working on as well. — specialscope, Mar 13 '12 at 16:04
What exactly do you want to do? Find similarities between the data in two vectors? — Adrian, Mar 13 '12 at 16:19

score 1 · Accepted Answer · answered Mar 13 '12 at 16:11


Well, it would just be the dimension in which the two vectors differed most -- in which the absolute value of the difference of the vectors' values in the dimension was largest. Is that really all you mean or are you looking for something subtler?
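A minimal sketch of that idea, assuming the two feature vectors are available as plain `double[]` arrays of equal length. The class and method names here are hypothetical and this does not use Mahout's own vector types; it only illustrates picking the dimension with the largest absolute difference.

```java
final class MaxDifferenceDimension {

    /**
     * Returns the index of the dimension in which a and b differ most,
     * i.e. where |a[i] - b[i]| is largest. Ties go to the lowest index.
     */
    static int mostDissimilarDimension(double[] a, double[] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("vectors must be non-empty and of equal length");
        }
        int best = 0;
        double bestDiff = Math.abs(a[0] - b[0]);
        for (int i = 1; i < a.length; i++) {
            double diff = Math.abs(a[i] - b[i]);
            if (diff > bestDiff) {
                bestDiff = diff;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] fv1 = {1.0, 5.0, 3.0};
        double[] fv2 = {1.5, 1.0, 3.2};
        // Absolute differences are 0.5, 4.0, 0.2, so dimension 1 differs most.
        System.out.println(mostDissimilarDimension(fv1, fv2)); // prints 1
    }
}
```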


Sean Owen


OK, say I have fv1, fv2, fv3, fv4 and fv5 as feature vectors that are supposed to be very "similar". Now for feature vector 2 (fv2, say) I need to find which dimensions are awkward, i.e. show much larger dissimilarity than the other dimensions. For this I want to compare fv2 with all the other feature vectors and then come up with the answer. So do I need to calculate the average absolute difference across all vectors, or is there some better statistic? – seahorse Mar 13 '12 at 16:23

Absolute difference from the average is reasonable; I might suggest something more normalized like a z-value -- just the number of standard deviations from the mean the value is. – Sean Owen Mar 13 '12 at 16:37
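The z-value suggestion could be sketched roughly as follows. This is only an illustration under several assumptions not stated in the thread: the vectors are plain `double[]` arrays (not Mahout types), the mean and standard deviation of each dimension are computed over the other vectors only (fv1, fv3, fv4, fv5 when inspecting fv2), the population standard deviation is used, and a dimension with zero spread gets a z-value of 0 if the target matches the mean and an infinite one otherwise. The class and method names are hypothetical.

```java
final class DimensionZScores {

    /**
     * For each dimension d, returns how many standard deviations target[d]
     * lies from the mean of others[*][d]. Large absolute values mark the
     * dimensions in which the target vector is unusual.
     */
    static double[] zScores(double[] target, double[][] others) {
        if (others.length == 0) {
            throw new IllegalArgumentException("need at least one reference vector");
        }
        int n = others.length;
        double[] z = new double[target.length];
        for (int d = 0; d < target.length; d++) {
            // Mean of dimension d over the reference vectors.
            double sum = 0.0;
            for (double[] v : others) {
                sum += v[d];
            }
            double mean = sum / n;

            // Population standard deviation of dimension d (assumption).
            double squares = 0.0;
            for (double[] v : others) {
                double diff = v[d] - mean;
                squares += diff * diff;
            }
            double std = Math.sqrt(squares / n);

            double deviation = target[d] - mean;
            if (std == 0.0) {
                // No spread among the reference vectors: any deviation is extreme.
                z[d] = (deviation == 0.0) ? 0.0 : Math.copySign(Double.POSITIVE_INFINITY, deviation);
            } else {
                z[d] = deviation / std;
            }
        }
        return z;
    }
}
```

Ranking the dimensions by the absolute value of the returned scores gives the ones in which fv2 stands out most; unlike the raw absolute difference, this accounts for dimensions that naturally vary on different scales.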
