I have built a model that classifies emails into 10 different work categories (labels) based on keywords in the subject line (features). However, some emails should be assigned to more than one label.

For example, an email with the subject "Servicing certificates and Transfer them" should be classified into two labels: Servicing Worktype (label 1) and Transfer Worktype (label 2). My current program assigns it to label 1 only. Is there any way to classify the email into both labels using Spark ML in Java?

I have been following https://github.com/apache/spark/tree/master/examples/src/main/java/org/apache/spark/examples/ml for guidance, but there is nothing on multilabel classification.

Please let me know if you have any suggestions or documentation that could help with this. Thanks.

Giorgos Myrianthous

1 Answer

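Spark's RDD-based MLlib API offers LogisticRegressionWithLBFGS and LogisticRegressionWithSGD. Note that, as the documentation quoted here states, these cover binary classification and (for LBFGS) multinomial classification, where each email still receives exactly one label. Neither assigns several labels to one email directly. A common way to get multi-label output from them is to train one binary classifier per work category and assign every category whose classifier says yes: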

LogisticRegressionWithLBFGS

Train a classification model for Multinomial/Binary Logistic Regression using Limited-memory BFGS. Standard feature scaling and L2 regularization are used by default.

or

LogisticRegressionWithSGD

Train a classification model for Binary Logistic Regression using Stochastic Gradient Descent. By default L2 regularization is used, which can be changed via LogisticRegressionWithSGD.optimizer.

Using LogisticRegressionWithLBFGS is recommended over this.
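
Since the classifiers above predict a single class per example, one way to get several labels per email is to split the problem into one yes/no problem per work category (often called binary relevance). The following is only a rough sketch of that idea in plain Java so it runs without Spark: the class name, the helper methods, the training subjects and the toy "positive-only keyword" trainer are all made up for illustration. In a real pipeline each per-label step would presumably be a binary logistic regression fitted on your own features.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Binary relevance sketch: one independent yes/no model per label (hypothetical names). */
public class BinaryRelevanceSketch {

    /** Lower-cases a subject line and splits it into distinct words. */
    public static Set<String> tokens(String subject) {
        Set<String> out = new LinkedHashSet<>();
        for (String t : subject.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /**
     * Toy stand-in for training ONE binary model: keep the words that occur in
     * subjects carrying the label but never in subjects without it.
     */
    public static Set<String> trainOneLabel(Map<String, Set<String>> data, String label) {
        Set<String> positive = new LinkedHashSet<>();
        Set<String> negative = new LinkedHashSet<>();
        for (Map.Entry<String, Set<String>> e : data.entrySet()) {
            (e.getValue().contains(label) ? positive : negative).addAll(tokens(e.getKey()));
        }
        positive.removeAll(negative);
        return positive;
    }

    /** Trains one model per label; the same data is relabeled yes/no for each label. */
    public static Map<String, Set<String>> trainAll(Map<String, Set<String>> data, List<String> labels) {
        Map<String, Set<String>> models = new LinkedHashMap<>();
        for (String label : labels) models.put(label, trainOneLabel(data, label));
        return models;
    }

    /** Asks every per-label model independently and returns all labels that say yes. */
    public static Set<String> predict(Map<String, Set<String>> models, String subject) {
        Set<String> words = tokens(subject);
        Set<String> predicted = new LinkedHashSet<>();
        for (Map.Entry<String, Set<String>> m : models.entrySet()) {
            for (String keyword : m.getValue()) {
                if (words.contains(keyword)) {
                    predicted.add(m.getKey());
                    break;
                }
            }
        }
        return predicted;
    }

    /** Made-up training set: subject line mapped to its (possibly empty) label set. */
    public static Map<String, Set<String>> demoModels() {
        Map<String, Set<String>> data = new LinkedHashMap<>();
        data.put("servicing request for account", Set.of("Servicing"));
        data.put("transfer shares to new owner", Set.of("Transfer"));
        data.put("general question about account", Set.of());
        return trainAll(data, List.of("Servicing", "Transfer"));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> models = demoModels();
        // Prints [Servicing, Transfer]
        System.out.println(predict(models, "Servicing certificates and Transfer them"));
    }
}
```

Because each label is decided on its own, an email can end up with zero, one or several labels. The trade-off is that this approach ignores any correlation between labels and needs one model per category (10 in the question).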

Giorgos Myrianthous