In FastText, I have unbalanced labels. What is the best way to handle it?
This blog post, https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/, gives some general answers. Can you add some details about the specifics of your domain? – Veltzer Doron Jun 26 '18 at 15:55
I don't see any satisfactory answer. Is there a better resolution? – Anuj Gupta Sep 21 '18 at 04:46
2 Answers
In our case we have a very skewed dataset: 200+ classes, with 20% of the classes containing 80% of all the data.
Even with this skew, the texts within each category are clearly distinct from those in the other categories.
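To put a number on a skew like this, you can count the labels in the training file. The following is only a minimal sketch: it assumes the data uses FastText's default `__label__` prefix, and the function names and sample lines are made up for illustration.

```python
from collections import Counter

LABEL_PREFIX = "__label__"  # FastText's default label prefix (assumed here)

def count_labels(lines):
    """Count how often each label appears in FastText-formatted lines."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            if token.startswith(LABEL_PREFIX):
                counts[token[len(LABEL_PREFIX):]] += 1
    return counts

def top_share(counts, class_fraction=0.2):
    """Share of all label occurrences held by the most frequent classes.

    `class_fraction` is the fraction of classes to treat as "the top".
    """
    if not counts:
        return 0.0
    ordered = sorted(counts.values(), reverse=True)
    k = max(1, round(len(ordered) * class_fraction))
    return sum(ordered[:k]) / sum(ordered)

# Hypothetical sample lines, for illustration only.
sample = [
    "__label__it I need a computer and a mouse",
    "__label__it my laptop will not boot",
    "__label__it please reset my password",
    "__label__it the monitor is flickering",
    "__label__grocery eggs, lettuce, onions and milk",
]
counts = count_labels(sample)
print(counts.most_common())
print(top_share(counts, class_fraction=0.5))
```

On a real training file you would pass the open file object instead of `sample` and keep the default `class_fraction=0.2` to see how much of the data the top 20% of classes hold.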
Example: Text of the Majority Class: "Hey, I need a computer and a mouse to open the internet and post a programming answer in Stack Overflow"
Text of the Minority Class: "Hey, could please give me the following items: Eggs, lettuce, onions, tomatoes, milk and wheat?"
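FastText builds its features from word n-grams (the wordNgrams option) and can use a hierarchical softmax over the labels. If your categories are as well defined as in my case above, the imbalance is not a problem, because the distinctive vocabulary of each category gives the classifier a clear signal even for rare classes.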
Reference: Bag of Tricks for Efficient Text Classification - Armand Joulin, Edouard Grave, Piotr Bojanowski, Tomas Mikolov
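If you want to check whether the imbalance really is harmless on your own data, look at recall per class on a held-out set rather than overall accuracy, since a few large classes can hide poor results on the small ones. This is a rough sketch that assumes you already have the true and predicted labels as two lists (for example, collected from your model's predictions). The function name and the toy labels are hypothetical.

```python
from collections import defaultdict

def per_class_recall(true_labels, predicted_labels):
    """Return {label: recall} for every label that occurs in true_labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true_labels and predicted_labels must have the same length")
    total = defaultdict(int)    # examples per true label
    correct = defaultdict(int)  # correctly predicted examples per true label
    for truth, prediction in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == prediction:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}

# Toy labels, for illustration only.
y_true = ["it", "it", "it", "it", "grocery", "grocery"]
y_pred = ["it", "it", "it", "it", "grocery", "it"]

recall = per_class_recall(y_true, y_pred)
# Print the weakest classes first.
for label, value in sorted(recall.items(), key=lambda item: item[1]):
    print(f"{label}: {value:.2f}")
```

If the minority classes show recall close to that of the majority classes, the skew is probably not hurting you. If they lag far behind, the general resampling tactics mentioned in the comments may be worth trying.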
