7

I tried a Naive Bayes classifier and it performs very badly. An SVM works a little better, but is still poor. Most of the papers I have read use SVM and Naive Bayes with some variations (n-grams, POS tags, etc.), but all of them give me results close to 50% (the authors of the articles report 80% and higher, but I cannot get the same accuracy on real data).

Are there any more powerful methods besides lexical analysis? SVM and Naive Bayes assume that words are independent. This approach is called "bag of words". What if we instead assume that words are associated?

For example: use the Apriori algorithm to detect that if a sentence contains "bad" and "horrible", then with 70% probability the sentence is negative. We could also use the distance between words, and so on.
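Roughly what I mean, as a toy sketch (this assumes the mlxtend library for Apriori and association rules; the sentences, labels and thresholds are made up):

    # Mine association rules between words and the sentence label (POS/NEG).
    # Toy sketch only; assumes mlxtend is installed (pip install mlxtend).
    import pandas as pd
    from mlxtend.frequent_patterns import apriori, association_rules
    from mlxtend.preprocessing import TransactionEncoder

    # Each "transaction" is the set of words in a sentence plus its label.
    sentences = [
        ["bad", "horrible", "movie", "NEG"],
        ["bad", "acting", "horrible", "plot", "NEG"],
        ["great", "wonderful", "film", "POS"],
        ["bad", "day", "great", "movie", "POS"],
    ]

    te = TransactionEncoder()
    onehot = pd.DataFrame(te.fit_transform(sentences), columns=te.columns_)

    frequent = apriori(onehot, min_support=0.4, use_colnames=True)
    rules = association_rules(frequent, metric="confidence", min_threshold=0.6)

    # Keep only rules whose consequent is a label,
    # e.g. {bad, horrible} -> {NEG} with confidence 1.0 on this toy data.
    is_label_rule = rules["consequents"].apply(lambda c: c in ({"NEG"}, {"POS"}))
    print(rules[is_label_rule][["antecedents", "consequents", "confidence"]])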

Is this a good idea, or am I just reinventing the wheel?

Ian Mercer
Neir0

4 Answers

6

You're confusing a couple of concepts here. Neither Naive Bayes nor SVMs are tied to the bag-of-words (BOW) approach, and neither SVMs nor BOW itself has an independence assumption between terms.

Here are some things you can try:

  • include punctuation marks in your bags of words; esp. ! and ? can be helpful for sentiment analysis, while many feature extractors geared toward document classification throw them away
  • same for stop words: words like "I" and "my" may be indicative of subjective text
  • build a two-stage classifier; first determine whether any opinion is expressed, then whether it's positive or negative
  • try a quadratic kernel SVM instead of a linear one to capture interactions between features (a minimal sketch follows this list).
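A minimal sketch of how the punctuation, stop-word and kernel suggestions could fit together, assuming scikit-learn and toy labeled data (not a complete system):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    train_texts = [  # toy examples; replace with real labeled sentences
        "I love it !", "my day was great", "this is horrible !", "bad , just bad",
    ]
    train_labels = ["pos", "pos", "neg", "neg"]

    vectorizer = CountVectorizer(
        token_pattern=r"(?u)\b\w+\b|[!?]",  # keep one-letter words and ! / ?
        stop_words=None,                    # keep "I", "my", etc.
    )
    classifier = SVC(kernel="poly", degree=2, coef0=1)  # quadratic kernel: pairwise feature interactions
    model = make_pipeline(vectorizer, classifier)
    model.fit(train_texts, train_labels)

    print(model.predict(["I love my phone !"]))  # likely ['pos'] on this toy data

With only a handful of examples this will not generalize, of course; the point is just where the punctuation handling and the kernel choice plug in.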
Fred Foo
  • What do you think about the Apriori algorithm and associations between words? – Neir0 Jun 11 '12 at 14:18
  • @Neir0: I don't immediately see how you'd want to apply it. I've also never seen attempts to do sentiment analysis with it. I know that some people use it to construct approximations to the quadratic kernel (roughly what you call "word associations"), but then I'd try a vanilla kernel SVM first. – Fred Foo Jun 11 '12 at 14:25
  • A straightforward way is to input the tokens together with a neg or pos label. For example: "pos i love my mom". As output I get something like "if we have 'love' and 'mom' in a sentence, then with 70% probability it has the pos label". Of course we can modify this approach for better results. – Neir0 Jun 11 '12 at 14:32
  • @Neir0: sure, that's an approach you could try. It does seem overkill, though -- IIUC, Apriori is intended to find arbitrary associations between items in its input, while this is a classification task, where you *know* which property of the input you want to predict (polarity); it seems like you're throwing away knowledge about the task. – Fred Foo Jun 11 '12 at 14:39
5

Algorithms like SVM, Naive Bayes and maximum entropy are supervised machine learning algorithms, and the output of your program depends on the training set you provide. For large-scale sentiment analysis I prefer an unsupervised learning approach, in which you determine the sentiment of adjectives by clustering documents into same-oriented parts and labelling the clusters positive or negative. More information can be found in this paper: http://icwsm.org/papers/3--Godbole-Srinivasaiah-Skiena.pdf
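A rough toy sketch of the cluster-then-label idea (this is not the method from the linked paper; it assumes scikit-learn and a few hand-picked seed adjectives):

    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = [  # toy corpus; replace with your documents
        "the movie was great and wonderful",
        "i love this phone , excellent battery",
        "horrible plot and bad acting",
        "terrible service and awful food",
    ]
    pos_seeds = {"great", "wonderful", "excellent", "love"}
    neg_seeds = {"horrible", "bad", "terrible", "awful"}

    X = TfidfVectorizer().fit_transform(docs)
    clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

    # Name each cluster by which seed set its documents mention more often.
    for c in (0, 1):
        words = " ".join(d for d, l in zip(docs, clusters) if l == c).split()
        pos = sum(w in pos_seeds for w in words)
        neg = sum(w in neg_seeds for w in words)
        print("cluster", c, "->", "positive" if pos >= neg else "negative")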

Hope this helps you in your work :)

Aravind Asok
2

You can find some useful material on sentiment analysis using Python. This presentation summarizes sentiment analysis as three simple steps (a minimal sketch follows the list):

  • Labeling data
  • Preprocessing
  • Model Learning
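A minimal end-to-end sketch of those three steps, assuming scikit-learn; the labeled examples are toy data:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # 1. Labeling data: each example is (text, label).
    train = [
        ("i love this movie", "pos"),
        ("what a great phone", "pos"),
        ("this is a horrible film", "neg"),
        ("bad and boring", "neg"),
    ]
    texts, labels = zip(*train)

    # 2. Preprocessing: lowercasing, tokenization and tf-idf weighting.
    # 3. Model learning: a Naive Bayes classifier on top of those features.
    model = make_pipeline(TfidfVectorizer(lowercase=True), MultinomialNB())
    model.fit(texts, labels)

    print(model.predict(["horrible and boring"]))  # -> ['neg']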
Nitin Pawar
0

Sentiment analysis is an area of ongoing research, and a lot of work is being published right now. For an overview of the most recent and most successful approaches, I would generally advise you to have a look at the SemEval shared tasks; they usually run a competition on sentiment analysis in Twitter every year. You can find the paper describing the task and the results for 2016 here (it might be a bit technical, though): http://alt.qcri.org/semeval2016/task4/data/uploads/semeval2016_task4_report.pdf

Starting from there, you can have a look at the papers describing the individual systems (as referenced there).

buechel