Feature selection for sparse and unbalanced high dimensional data

Question

I have highly unbalanced data with very few positive labels. The data is very high-dimensional, and on top of that my features are also very sparse.

So what would be the best way to do feature selection in this case? I don't think a correlation measure such as Pearson or rank-based Spearman correlation would be a good choice. Because most of my labels and most of my feature values are zeros, a feature can look highly correlated with the label even though it is not really significant.

Any suggestions?

score 0 · Answer 1 · answered Jul 22 '14 at 18:13

SVMs work well for classification of sparse data. By examining the kernel matrix produced, you can identify which features were more important than others and use those for your feature selection.
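To make the idea concrete, here is a rough, dependency-free sketch. It swaps in a slightly different technique from the one described above: instead of inspecting the kernel matrix (which is indexed by samples, not features), it trains a linear SVM and ranks features by the absolute value of their learned weights. Everything here is an assumption for illustration: the function name `train_linear_svm`, the "balanced" class weighting to cope with the rare positives, the hyperparameters, and the toy data in which feature 0 is the only informative one.

```python
# Sketch only: class-weighted linear SVM (hinge loss + L2 penalty) trained
# with plain batch subgradient descent, on sparse rows stored as dicts.

def train_linear_svm(rows, labels, n_features, epochs=200, lr=0.05, lam=0.01):
    """rows: list of {feature_index: value} dicts; labels: +1 or -1."""
    n = len(rows)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    # Assumed "balanced" weighting so the few positives are not drowned out.
    class_weight = {1: n / (2.0 * n_pos), -1: n / (2.0 * n_neg)}
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of (lam / 2) * ||w||^2
        grad_b = 0.0
        for x, y in zip(rows, labels):
            margin = y * (sum(w[j] * v for j, v in x.items()) + b)
            if margin < 1.0:  # hinge loss is active for this sample
                c = class_weight[y] / n
                for j, v in x.items():
                    grad_w[j] -= c * y * v
                grad_b -= c * y
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy data (hypothetical): 200 samples, 10 positives, 20 sparse features.
# Feature 0 fires only on positives; features 1..19 are uninformative noise.
n_samples, n_features = 200, 20
rows, labels = [], []
for i in range(n_samples):
    x = {1 + (i % 19): 1.0}  # one noise feature per sample
    if i < 10:
        x[0] = 1.0           # the informative feature
        labels.append(1)
    else:
        labels.append(-1)
    rows.append(x)

w, b = train_linear_svm(rows, labels, n_features)

# Rank features by |weight|; larger magnitude = more influence on the decision.
ranked = sorted(range(n_features), key=lambda j: abs(w[j]), reverse=True)
print("top features:", ranked[:3])
```

On real data you would keep the top-k features from `ranked` (or those whose weight magnitude exceeds a threshold) and retrain on that subset. Weight magnitudes are only comparable when features are on similar scales, so some scaling of the nonzero values may be needed first.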

wckd

