
I'm quite new to text mining and am challenging myself to do sentiment analysis, but I've run into a problem. In my language, a single word can have several different meanings. For example, "setan" can mean 1) devil or 2) a curse word. How can I resolve this ambiguity in sentiment analysis? For reference, the algorithm I'm using is a Naive Bayes classifier, and my tool is RapidMiner. Any tips would be great. Thank you!

1 Answer


Training a Naive Bayes classifier on your data makes the model estimate, for every word, a probability under each class you are trying to predict. In your case, since it's sentiment analysis with Positive and Negative as the two classes, you would get one probability for "setan" under Positive and another under Negative.
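To make that concrete, here is a minimal sketch in plain Python (not RapidMiner) of how per-class word probabilities could be estimated. The toy sentences, the function name `train_word_likelihoods`, and the use of add-one (Laplace) smoothing are my own assumptions for illustration; RapidMiner's operator may estimate things differently.

```python
from collections import Counter, defaultdict

def train_word_likelihoods(labelled_docs, alpha=1.0):
    """Estimate P(word | class) with add-alpha smoothing (hypothetical helper)."""
    counts = defaultdict(Counter)  # class label -> word counts
    vocab = set()
    for text, label in labelled_docs:
        tokens = text.lower().split()
        counts[label].update(tokens)
        vocab.update(tokens)
    likelihoods = {}
    for label, counter in counts.items():
        # Denominator: tokens seen in this class plus alpha per vocabulary word.
        total = sum(counter.values()) + alpha * len(vocab)
        likelihoods[label] = {w: (counter[w] + alpha) / total for w in vocab}
    return likelihoods

# Made-up training examples, only for illustration.
docs = [
    ("good movie", "Positive"),
    ("great story", "Positive"),
    ("setan bad movie", "Negative"),
    ("setan awful", "Negative"),
]

likelihoods = train_word_likelihoods(docs)
print("P(setan | Positive) =", likelihoods["Positive"]["setan"])
print("P(setan | Negative) =", likelihoods["Negative"]["setan"])
```

With this toy data "setan" only appears in Negative examples, so its Negative probability comes out higher than its Positive one.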

Keeping this in mind, if a word has multiple meanings that could signal either positive or negative sentiment, make sure your training data includes both kinds of instances. That way the model learns a probability for the word under each class, and when classifying new text the other words in the sentence decide whether it ends up Positive or Negative.
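As a rough sketch of that idea, the snippet below trains a tiny Naive Bayes model in plain Python on made-up examples where one ambiguous word appears in both classes. I'm using the English slang word "sick" as a stand-in (it can be praise or a complaint); the data and the `train` / `classify` helpers are hypothetical, and the smoothing choice is an assumption.

```python
import math
from collections import Counter, defaultdict

def train(docs, alpha=1.0):
    """Collect the counts a multinomial Naive Bayes model needs (hypothetical helper)."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        class_counts[label] += 1
        vocab.update(tokens)
    return word_counts, class_counts, vocab, alpha

def classify(text, model):
    """Return the class with the highest log prior plus summed log likelihoods."""
    word_counts, class_counts, vocab, alpha = model
    n_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        score = math.log(class_counts[label] / n_docs)
        for token in text.lower().split():
            if token in vocab:  # skip words never seen during training
                score += math.log((word_counts[label][token] + alpha) / total)
        scores[label] = score
    return max(scores, key=scores.get)

# Made-up data: "sick" occurs equally often in both classes.
docs = [
    ("sick beat love it", "Positive"),
    ("sick guitar solo amazing", "Positive"),
    ("felt sick awful food", "Negative"),
    ("sick of this terrible service", "Negative"),
]

model = train(docs)
print(classify("sick amazing solo", model))
print(classify("sick awful service", model))
```

Because the ambiguous word is about equally likely under both classes here, it contributes almost nothing to the decision, and the surrounding words tip the result one way or the other.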

In your case, it seems that both meanings of "setan" have a negative connotation, so this really shouldn't be a problem. Words like "the" and "a", which appear in both Positive and Negative instances and are commonly called stopwords, should be removed since they contribute little to the classification.
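For illustration, stopword removal can be as simple as the sketch below. The tiny `STOPWORDS` set and the `remove_stopwords` function are placeholders I made up; for your language you would need a proper stopword list.

```python
# Hypothetical, deliberately tiny stopword list; use a real list for your language.
STOPWORDS = {"the", "a", "an", "is", "this"}

def remove_stopwords(text, stopwords=STOPWORDS):
    """Lowercase, split on whitespace, and drop any token in the stopword set."""
    return [token for token in text.lower().split() if token not in stopwords]

tokens = remove_stopwords("The movie is a setan disaster")
print(tokens)  # ['movie', 'setan', 'disaster']
```

You would apply this to every document before training and before classifying new text, so the model only sees the words that carry sentiment.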

If you want to train the model on the specific meanings of words, you can refer to this paper: https://pdfs.semanticscholar.org/fc01/b42df3077a512620456d8a2714951eccbd67.pdf

Rohan Ds