
I want to do multi-label text classification on a big data set, and it seems that big data machine learning tools such as Apache Mahout or Spark MLlib do not currently support that. Has anyone done multi-label classification on big data sets before? Are there any plans to integrate multi-label classification into either Mahout or Spark in the near future?

HHH

2 Answers


This paper addresses the benefits you would get from multi-output forecasting, namely:

  1. The ability to account for multiple independent input parameters when making a prediction, rather than having to continuously update your metrics for each nth-index prediction you are trying to make within a given forecast.
  2. Increased computational speed.

Based on your need, I would recommend down-sampling to a smaller group for your current problem and then, if performance does not match what you are looking for, building multiple models around bespoke groups within your dataset.

I am still encountering this challenge myself (4 years since your post...).

Here is a list of helpful articles that I have collected while trying to address this:

shadow_dev

Can we first transform the labels into a single class and then, after prediction, transform it back to the original labels? For example, I have 3 labels to predict, [y1, y2, y3]. If [y1, y2, y3] = [1, 0, 1], then I give it the class label 101 (binary) = 5. During prediction, I compute the probability of y1 as follows: p(y1=1) = p(100) + p(101) + p(110) + p(111). In this way a multi-label problem becomes a multi-class problem.
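A minimal sketch of the encoding described above, in plain Python. The function names (`encode`, `decode`, `marginals`) are made up for illustration and do not come from Mahout, Spark, or any other library; the class probabilities are assumed to come from whatever multi-class classifier you train on the encoded labels.

```python
def encode(labels):
    """Pack a list of 0/1 labels into one class id (first label = most significant bit)."""
    class_id = 0
    for bit in labels:
        class_id = (class_id << 1) | bit
    return class_id


def decode(class_id, n_labels):
    """Unpack a class id back into its list of 0/1 labels."""
    return [(class_id >> (n_labels - 1 - i)) & 1 for i in range(n_labels)]


def marginals(class_probs, n_labels):
    """Sum class probabilities into per-label probabilities.

    class_probs[c] is the predicted probability of class id c,
    so it must have 2 ** n_labels entries.
    """
    result = [0.0] * n_labels
    for class_id, p in enumerate(class_probs):
        for i, bit in enumerate(decode(class_id, n_labels)):
            if bit:
                result[i] += p
    return result


# Hypothetical classifier output over the 8 classes for 3 labels:
# 75% on class 5 (labels [1, 0, 1]) and 25% on class 2 (labels [0, 1, 0]).
probs = [0.0] * 8
probs[5] = 0.75
probs[2] = 0.25
per_label = marginals(probs, 3)  # p(y1=1), p(y2=1), p(y3=1)
```

Note that the number of classes grows as 2^n with the number of labels n, so this only stays practical when n is small or when few label combinations actually occur in the data.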

I_love_vegetables
Hao He