Consider this example
library(dplyr)
library(tibble)
library(glmnet)
library(quanteda)
dtrain <- data_frame(text = c("Chinese Beijing Chinese",
"Chinese Chinese Shanghai",
"this is china",
"china is here",
'hello china',
"Chinese Beijing Chinese",
"Chinese Chinese Shanghai",
"this is china",
"china is here",
'hello china',
"Kyoto Japan",
"Tokyo Japan Chinese",
"Kyoto Japan",
"Tokyo Japan Chinese",
"Kyoto Japan",
"Tokyo Japan Chinese",
"Kyoto Japan",
"Tokyo Japan Chinese",
'japan'),
class = c(1, 1, 1, 1, 1,1,1,1,1,1,1,0,0,0,0,0,0,0,0))
I use quanteda
to get a document term matrix
from this dataframe
dtm <- quanteda::dfm(dtrain$text)
> dtm
Document-feature matrix of: 19 documents, 11 features (78.5% sparse).
19 x 11 sparse Matrix of class "dfm"
features
docs chinese beijing shanghai this is china here hello kyoto japan tokyo
text1 2 1 0 0 0 0 0 0 0 0 0
text2 2 0 1 0 0 0 0 0 0 0 0
text3 0 0 0 1 1 1 0 0 0 0 0
text4 0 0 0 0 1 1 1 0 0 0 0
text5 0 0 0 0 0 1 0 1 0 0 0
I can fit a lasso
regression using glmnet
easily:
fit <- glmnet(dtm, y = as.factor(dtrain$class), alpha = 1, family = 'binomial')
However, plotting the fit
does not show the labels of the dtm
matrix (and I only see three curves). What is wrong here?