
I have an enormous data set of texts, from which I have extracted the texts that contain particular keywords. Here is the data set with particular keywords. My next task is to classify this data set according to 8 emotions and 2 sentiments, so there will be 10 different classes in total. I got this idea from the NRC emotion lexicon, which holds 14182 different words with their emotion and sentiment classes. The main NRC work is at http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm. I know that Naive Bayes classification or clustering works well for binary classification (say, the two classes positive and negative sentiment). But for a 10-class problem, I have no idea how to proceed. I would really appreciate your suggestions. I am doing the assignment in R. The final result should look as below:

|==================================|====================================|
|   SentencesWithKeywords          |      emotion or sentiment class    |
|----------------------------------|------------------------------------|
|conflict need resolved turned     | anger/anticipation/disgust/fear/joy|
|conversation exchange ideas       |     negative/positive/sadness/     |
|richer environment                |            surprise/trust          |
|                                  |                                    |
|----------------------------------|------------------------------------|
|     sentence2                    |anger/anticipation/disgust/fear/joy |
|                                  |     negative/positive/sadness/     |
|                                  |           surprise/trust           |
|----------------------------------|------------------------------------|
  • Naive Bayes is inherently multi-class, and so are many clustering algorithms (each cluster is a kind of class). You should be able to find an implementation in `e1071` – CAFEBABE Feb 13 '16 at 21:47
  • Thank you CAFEBABE for your suggestion. It was also on my mind; I am trying to implement it. –  Feb 13 '16 at 23:17
  • But @CAFEBABE, one question: my text is unlabeled. Will NB work on it? I do not think so. –  Feb 14 '16 at 10:57
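
Since the text is unlabeled, one option that needs no training data is to tag each sentence directly from a word-to-class lexicon, which is the idea behind using the NRC lexicon in the question. The following is only a minimal sketch in base R. The `toy_lexicon` entries are made up for illustration and are not taken from the real NRC file, and `score_sentence` is a hypothetical helper name.

```r
# Toy lexicon in long format (one row per word/class pair).
# ASSUMPTION: these entries are invented for illustration and are NOT
# copied from the NRC Emotion Lexicon.
toy_lexicon <- data.frame(
  word  = c("conflict", "conflict", "resolved", "richer", "ideas"),
  class = c("anger", "negative", "positive", "positive", "anticipation"),
  stringsAsFactors = FALSE
)

# The 8 emotions and 2 sentiments named in the question.
classes <- c("anger", "anticipation", "disgust", "fear", "joy",
             "negative", "positive", "sadness", "surprise", "trust")

# Count, for one sentence, how many lexicon hits fall into each class.
# (Hypothetical helper, not part of any package.)
score_sentence <- function(sentence, lexicon, classes) {
  tokens <- strsplit(tolower(sentence), "[^a-z]+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  # Look up every token, so repeated words are counted repeatedly.
  hits <- unlist(lapply(tokens, function(tok) lexicon$class[lexicon$word == tok]))
  counts <- table(factor(hits, levels = classes))
  setNames(as.integer(counts), classes)
}

sentence <- "conflict need resolved turned conversation exchange ideas richer environment"
scores <- score_sentence(sentence, toy_lexicon, classes)

# Every class with at least one hit becomes a label for the sentence.
labels <- names(scores)[scores > 0]
print(scores)
print(paste(labels, collapse = "/"))
```

A sentence can receive several labels this way, which matches the slash-separated class column in the desired output. Whether raw hit counts are a good enough signal is a judgment call that depends on the data.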

1 Answer


You should check out the caret package (http://topepo.github.io/caret/index.html). What you are trying to do is two different classifications: one multi-class problem (the emotions) and one two-class problem (the sentiments). Represent each document as a term-frequency vector and run a classification algorithm of your choice. SVMs usually work well with bag-of-words approaches.
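
As a rough illustration of the term-frequency representation, here is a minimal sketch that builds the document-term matrix in base R and, if the `e1071` package is installed, fits a linear SVM on it. It calls `e1071::svm` directly rather than going through caret. The documents and labels are made-up toy data, and `tokenize` is a hypothetical helper; real use would need a labeled training set.

```r
# ASSUMPTION: toy documents and labels invented for illustration only.
docs <- c("conflict need resolved",
          "angry conflict again",
          "happy conversation today",
          "happy ideas exchange",
          "reliable partner reliable ideas",
          "reliable environment")
labels <- factor(c("anger", "anger", "joy", "joy", "trust", "trust"))

# Lower-case and split on anything that is not a letter (hypothetical helper).
tokenize <- function(x) {
  tok <- strsplit(tolower(x), "[^a-z]+")[[1]]
  tok[nzchar(tok)]
}

tokens <- lapply(docs, tokenize)
vocab  <- sort(unique(unlist(tokens)))

# One row per document, one column per vocabulary term, cells = raw counts.
dtm <- t(vapply(
  tokens,
  function(tok) as.numeric(table(factor(tok, levels = vocab))),
  numeric(length(vocab))
))
colnames(dtm) <- vocab

# Fit a multi-class linear SVM only when e1071 is available.
if (requireNamespace("e1071", quietly = TRUE)) {
  fit  <- e1071::svm(x = dtm, y = labels, kernel = "linear", scale = FALSE)
  pred <- predict(fit, dtm)
  # Confusion table on the training data; not an estimate of real accuracy.
  print(table(predicted = pred, actual = labels))
}
```

New documents have to be mapped onto the same `vocab` columns before calling `predict`. The two-class sentiment problem can be handled the same way with a second label vector.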

buechel
  • You would need some training data of course. Check out https://www.crowdflower.com/data-for-everyone/ for example. – buechel Aug 14 '16 at 08:41