
I'm playing around with sentiment analysis, and I'm looking for some seed data. Is there a free dictionary around?

It can be really simple: 3 sets of texts/sentences, for "positive", "negative", "neutral". It doesn't have to be huge.

Eventually I'll probably generate my own seed data for my specific use case, but it would be great to have something to play with now while I'm building the thing.

Brian Tompsett - 汤莱恩
Ken
  • I have the Bing Liu and Minqing Hu dataset (about 7,000 reviews of about 9 products on amazon.com). I put them in an Excel sheet with the combined average score of each one. I also added the scores from 3 different free sentiment analysis APIs on the web (ViralHeat, AlchemyAPI, Repustate API). If you want that Excel sheet, I can give it to you. – smohamed Oct 31 '11 at 07:45
  • http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html#lexicon – zengr Mar 05 '13 at 10:05
  • @SherifMaherEaid: How do you create your own dictionary from articles? – user123 Aug 20 '13 at 17:40
  • @user123 Probably he categorizes the words and phrases used in different reviews, which can be good, bad, or neutral. – Bhargav Nanekalva Nov 04 '14 at 15:23
  • +1 Thanks for asking the question :) – Bhargav Nanekalva Nov 04 '14 at 15:26

4 Answers


Bing Liu and Minqing Hu from UIC have a number of datasets:

Bo Pang from Cornell has some more.

Oren Trutner

If you're interested in sentiment dictionaries, many authors have presented work based on manually built lists, as well as semi-automated methods for obtaining lists of opinionated terms. One good approach is to derive such a list from the WordNet database, by extending a core set of positive/negative words through relationships such as synonymy.
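As a rough illustration of the seed-expansion idea, here is a minimal Python sketch. It assumes a tiny hand-made synonym/antonym graph in place of real WordNet data (the `SYNONYMS` and `ANTONYMS` maps and the `expand_seeds` helper are hypothetical): synonyms inherit the polarity of the word they were reached from, antonyms get the opposite sign, and the search stops after `max_depth` hops because polarity can drift the further you get from the seeds.

```python
from collections import deque

# Hypothetical toy relations standing in for WordNet (illustrative, not real WordNet data).
SYNONYMS = {
    "good": ["nice", "fine"],
    "nice": ["pleasant"],
    "bad": ["poor", "awful"],
    "poor": ["inferior"],
}
ANTONYMS = {
    "good": ["bad"],
    "pleasant": ["unpleasant"],
}


def expand_seeds(seeds, synonyms, antonyms, max_depth=2):
    """Breadth-first polarity propagation from seed words (+1 positive, -1 negative).

    Synonyms inherit the polarity of the word they were reached from, antonyms get
    the opposite sign. A word keeps the first polarity it is assigned.
    """
    polarity = dict(seeds)
    queue = deque((word, 0) for word in seeds)
    while queue:
        word, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand further than max_depth hops from a seed
        sign = polarity[word]
        neighbours = [(w, sign) for w in synonyms.get(word, [])]
        neighbours += [(w, -sign) for w in antonyms.get(word, [])]
        for other, other_sign in neighbours:
            if other not in polarity:
                polarity[other] = other_sign
                queue.append((other, depth + 1))
    return polarity


lexicon = expand_seeds({"good": 1, "bad": -1}, SYNONYMS, ANTONYMS)
print(sorted(lexicon.items()))
```

With real data, the two maps could be filled from NLTK's WordNet interface (for example `wordnet.synsets(word)` and `Lemma.antonyms()`); how to handle word senses and parts of speech is left open in this sketch.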

A good example of a manually built list is the General Inquirer.

For a semi-automated method that derives lists, check out SentiWordNet from Esuli and Sebastiani.

I believe these are generally available for research, but you may need to contact the authors about using these resources for non-research purposes.
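To turn SentiWordNet into a plain word list, something like the following might work. It assumes the SentiWordNet 3.0 text layout (tab-separated `POS`, `ID`, `PosScore`, `NegScore`, `SynsetTerms`, `Gloss`, with `#` comment lines and terms written as `word#sense`); the sample lines are made up to show that layout and are not real entries, and `parse_sentiwordnet` is a hypothetical helper. Averaging `PosScore - NegScore` over all senses of a word is just one simple choice.

```python
# Made-up lines in the assumed SentiWordNet 3.0 layout (not real entries):
# POS <tab> ID <tab> PosScore <tab> NegScore <tab> SynsetTerms <tab> Gloss
SAMPLE_LINES = [
    "# POS\tID\tPosScore\tNegScore\tSynsetTerms\tGloss",
    "a\t00000001\t0.75\t0\tgood#1 fine#3\tillustrative gloss",
    "a\t00000002\t0.25\t0.25\tgood#2\tillustrative gloss",
    "a\t00000003\t0\t0.625\tbad#1\tillustrative gloss",
]


def parse_sentiwordnet(lines):
    """Map (word, POS) to the mean of PosScore - NegScore over the word's senses."""
    scores = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip comment/header lines and blank lines
        pos, _synset_id, pos_score, neg_score, terms = line.split("\t")[:5]
        polarity = float(pos_score) - float(neg_score)
        for term in terms.split():
            word = term.rsplit("#", 1)[0]  # "good#1" -> "good"
            scores.setdefault((word, pos), []).append(polarity)
    return {key: sum(vals) / len(vals) for key, vals in scores.items()}


lexicon = parse_sentiwordnet(SAMPLE_LINES)
print(lexicon[("good", "a")])  # 0.375
```

For the real file, the lines of an opened text file could be passed to `parse_sentiwordnet` instead of `SAMPLE_LINES`; weighting a word's first-listed sense more heavily is another option worth trying.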

B.

bohana

You can use the AFINN word list here:

http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=6010

AFINN is a list of English words rated for valence with an integer between minus five (negative) and plus five (positive). The words have been manually labeled by Finn Årup Nielsen in 2009-2011. The file is tab-separated. There are two versions:

AFINN-111: Newest version with 2477 words and phrases.

AFINN-96: 1468 unique words and phrases on 1480 lines, as some words are listed twice. The word list is not entirely in alphabetical order.
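A minimal Python sketch of using a list in this format: it parses tab-separated `term<TAB>score` lines and sums the scores of the words in a sentence, mapping the sum onto the question's three classes (positive above zero, negative below, neutral at zero). The inline excerpt is invented for illustration (real AFINN values may differ), and `load_afinn`, `score` and `label` are hypothetical helpers. Multi-word phrases in the list would be loaded but never matched by this word-by-word lookup.

```python
import re

# Invented excerpt in AFINN's layout (term, a tab, an integer from -5 to +5);
# the values are illustrative, not copied from the real list.
# "good" appears twice, as some words do in AFINN-96.
SAMPLE_AFINN = "good\t3\nbad\t-3\nhappy\t3\ngood\t3\n"


def load_afinn(text):
    """Parse tab-separated 'term<TAB>score' lines into a dict."""
    lexicon = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        term, value = line.rsplit("\t", 1)
        lexicon[term] = int(value)  # a repeated term simply overwrites the earlier entry
    return lexicon


def score(text, lexicon):
    """Sum the valence of each word; unknown words count as 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


def label(text, lexicon):
    """Map the summed score onto positive / negative / neutral."""
    total = score(text, lexicon)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


lexicon = load_afinn(SAMPLE_AFINN)
print(label("What a good, happy day!", lexicon))  # positive
print(label("Bad service.", lexicon))             # negative
print(label("The table is brown.", lexicon))      # neutral
```

To use the real list, the downloaded file's text could be read and passed to `load_afinn` in place of `SAMPLE_AFINN`.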

Clay

I maintain a list of corpora and word lists for sentiment analysis (my AFINN list is one of them):

http://neuro.compute.dtu.dk/wiki/Sentiment_analysis#Corpora

http://neuro.compute.dtu.dk/wiki/Sentiment_analysis#Affective_word_lists

Finn Årup Nielsen