As is commonly known, glmnet can be used as a feature selection tool. A toy example:
library(glmnet)
# Binomial dataset: the number of classes is 2
data(BinomialExample)
# Truncate to the first 10 columns, just to keep the example dataset small
x <- BinomialExample$x[, 1:10]
y <- BinomialExample$y
# Note: cv.glmnet assigns folds at random, so the coefficients below
# may vary slightly between runs
cvfit <- cv.glmnet(x, y, family = "binomial")
# coef() on a cv.glmnet object uses s = "lambda.1se" by default
coefs <- coef(cvfit)
The coefs variable shows which features have been selected (in the example, all features except V1 and V7). This result is clear and understandable.
> coefs
11 x 1 sparse Matrix of class "dgCMatrix"
                    s1
(Intercept)  0.1048257
V1           .
V2           0.5901863
V3          -0.4060696
V4          -0.9627180
V5          -0.1067188
V6          -0.7813739
V7           .
V8          -0.4106554
V9           0.5733065
V10         -1.0492793
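
For reference, the selected set can be read off programmatically. A minimal sketch (selected is just a local name, not part of glmnet):

# Features with nonzero coefficients, excluding the intercept
selected <- setdiff(rownames(coefs)[which(coefs[, 1] != 0)], "(Intercept)")
selected
# [1] "V2"  "V3"  "V4"  "V5"  "V6"  "V8"  "V9"  "V10"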
The problem is how to interpret the output when the number of classes is more than two. A toy example:
# Multinomial dataset: the number of classes is 3
data(MultinomialExample)
x <- MultinomialExample$x[, 1:10]
y <- MultinomialExample$y
cvfit <- cv.glmnet(x, y, family = "multinomial")
coefs <- coef(cvfit)
Now coefs stores three proposed sets of features, one per class. Question: which set should be used as the best set of features? In other words: is it possible to use glmnet as a feature selection tool at all when we have more than two classes?
> coefs
$`1`
11 x 1 sparse Matrix of class "dgCMatrix"
                      1
(Intercept) -0.03279324
V1           .
V2          -0.08585827
V3           0.40882396
V4          -0.08639670
V5          -0.15763031
V6           0.22513768
V7           .
V8           0.17657623
V9           .
V10          .

$`2`
11 x 1 sparse Matrix of class "dgCMatrix"
                      1
(Intercept)  0.01255996
V1          -0.21913800
V2           .
V3           .
V4           .
V5           0.41329881
V6           .
V7           .
V8           .
V9          -0.57131512
V10          0.52214739

$`3`
11 x 1 sparse Matrix of class "dgCMatrix"
                      1
(Intercept)  0.02023328
V1           0.09163282
V2           0.42655929
V3           .
V4           0.29403632
V5           .
V6          -0.12306560
V7           .
V8          -0.44815059
V9           0.88580234
V10         -0.20920812
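
To make the three "propositions" concrete, the per-class selections can be extracted the same way; the union step below is my own construction, not glmnet output:

# coefs is a list with one sparse coefficient matrix per class
per_class <- lapply(coefs, function(m) {
  setdiff(rownames(m)[which(m[, 1] != 0)], "(Intercept)")
})
per_class                 # class 1: V2-V6, V8; class 2: V1, V5, V9, V10; class 3: V1, V2, V4, V6, V8-V10
Reduce(union, per_class)  # one candidate set: the union, i.e. everything except V7

For completeness: glmnet also accepts type.multinomial = "grouped", which applies a grouped-lasso penalty so that each feature is kept or dropped for all classes simultaneously, and therefore yields a single selected set:

cvfit_grp <- cv.glmnet(x, y, family = "multinomial", type.multinomial = "grouped")

But it is not obvious whether that, or the union/intersection of the per-class sets, is the right way to treat glmnet as a feature selector here.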