I want to know whether the text2vec package can be used for multilabel classification, like Python's BinaryRelevance in skmultilearn.problem_transform. I'm currently following the pipeline documented at: http://text2vec.org/vectorization.html
You could use the `mlr` package, which provides a multilabel wrapper for classifiers – TMrtSmith Apr 05 '19 at 19:22
1 Answer
You can use text2vec to create a document-term matrix (dtm), following http://text2vec.org/vectorization.html. Once the dtm is ready, you can use it as the feature matrix for multi-label classification. For the classifier, xgboost is a good option; its multiclass setup is discussed in https://rpubs.com/mharris/multiclass_xgboost.
library(xgboost)
# dtm_train is the training matrix obtained by text2vec
# dtm_test is the testing matrix obtained by text2vec
# label_train holds the labels for dtm_train as a factor
# label_train <- factor(label_train, labels = classes)
nclass <- 3 # how many classes you have
# xgboost expects integer class ids in 0..(nclass - 1), not a factor
label_num <- as.integer(label_train) - 1
param <- list(
  "objective" = "multi:softmax",  # multiclass classification
  "num_class" = nclass,           # number of classes
  "eval_metric" = "mlogloss",     # evaluation metric
  "nthread" = 8,                  # number of threads to be used
  "max_depth" = 16,               # maximum depth of a tree
  "eta" = 0.3,                    # step size shrinkage
  "gamma" = 0,                    # minimum loss reduction
  "subsample" = 0.7,              # fraction of rows sampled per tree
  "colsample_bytree" = 1,         # fraction of columns sampled per tree
  "min_child_weight" = 12         # minimum sum of instance weight in a child
)
bst <- xgboost(
  params = param,
  data = as.matrix(dtm_train),
  label = label_num,
  nrounds = 200)
# Make predictions on the test data; multi:softmax returns 0-based class ids.
pred <- predict(bst, as.matrix(dtm_test))
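The model above assigns exactly one class per document. For the multi-label case in the question, where a document can carry several labels at once, a binary relevance setup trains one independent binary classifier per label. The following is only a minimal sketch of that idea, not a definitive implementation: the toy documents, the label matrix, and the names `docs`, `labels`, `models`, `prob` and `pred` are all made up for illustration, and the hyperparameters are arbitrary.

```r
library(text2vec)
library(xgboost)

# Toy corpus (hypothetical data, for illustration only)
docs <- c("cheap flights to paris", "paris hotel deals",
          "python code tutorial", "learn python and r code",
          "cheap code hosting deals", "hotel booking tutorial")

# One 0/1 column per label; a document may have several labels
labels <- cbind(travel = c(1, 1, 0, 0, 0, 1),
                tech   = c(0, 0, 1, 1, 1, 0),
                deals  = c(1, 1, 0, 0, 1, 0))

# Build the document-term matrix with text2vec
it <- itoken(docs, preprocessor = tolower,
             tokenizer = word_tokenizer, progressbar = FALSE)
vocab <- create_vocabulary(it)
vectorizer <- vocab_vectorizer(vocab)
dtm <- create_dtm(it, vectorizer)

# Binary relevance: fit one independent binary classifier per label
models <- lapply(colnames(labels), function(lab) {
  dtrain <- xgb.DMatrix(dtm, label = labels[, lab])
  xgb.train(
    params = list(objective = "binary:logistic",
                  max_depth = 3, eta = 0.3, nthread = 1,
                  min_child_weight = 0),  # relaxed because the toy data is tiny
    data = dtrain, nrounds = 20)
})
names(models) <- colnames(labels)

# One probability column per label, then threshold each label separately
prob <- sapply(models, function(m) predict(m, xgb.DMatrix(dtm)))
pred <- (prob > 0.5) * 1
```

Each row of `pred` is then a 0/1 vector over the labels. On real data you would build the test dtm with the same `vectorizer` and predict on that instead of on the training matrix.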
Hope this helps.
Please let me know if you need further explanation.

Sam S.