I've been using many different corpora for natural language processing, and I'm looking for a corpus that has been annotated with WordNet word senses.

I understand that there probably isn't a big corpus with this information, since it has to be built up manually, but there has to be something to go off of.

Also, if no such corpus exists, is there at least a sense-annotated n-gram database? That is, something giving the percentage of the time a word is used in each of its senses, or a numerical count for each WordNet sense that reflects how common that sense is.

cardine

3 Answers

Three prominent corpora annotated for WordNet:

cyborg
  • SemCor was by far the best one out of all the ones linked. It looks like there are not many high-quality WordNet-annotated corpora available right now. – cardine Jan 22 '12 at 08:21
  • @cardine and cyborg, sorry for the comment, but I couldn't find your contact info. Could you email me at info @ panabee.com? Based on your NLP interests, I have a small project you might be interested in. Thanks. – Crashalot Apr 02 '13 at 21:12

Some of the Senseval (now SemEval) data is annotated with WordNet senses.

Fabian Steeg

You can use Senseval-2 (it is available in SemCor format, and for Java there is the jSemcor API) and also Senseval-3. These two corpora are used for word sense disambiguation.
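For the sense-frequency part of the question, here is a minimal sketch, assuming Python with NLTK and its WordNet data installed (`nltk.download('wordnet')`). WordNet itself ships per-sense counts taken from its sense-tagged semantic concordance (SemCor), and NLTK exposes them as `Lemma.count()`. The helper names `wordnet_sense_counts` and `sense_percentages` are hypothetical, written only for this illustration.

```python
def sense_percentages(counts):
    """Convert a mapping of sense -> raw count into sense -> percentage.

    Senses with a zero count are kept (as 0.0). If every count is zero,
    all percentages are 0.0 instead of raising ZeroDivisionError.
    """
    total = sum(counts.values())
    if total == 0:
        return {sense: 0.0 for sense in counts}
    return {sense: 100.0 * n / total for sense, n in counts.items()}


def wordnet_sense_counts(word, pos=None):
    """Return {synset_name: count} for each WordNet sense of a single word.

    Hypothetical helper. Requires NLTK plus its WordNet data.
    Lemma.count() reports how often that sense was tagged in WordNet's
    semantic concordance, so many rare senses come back as 0.
    """
    # Imported lazily so sense_percentages() works without NLTK installed.
    from nltk.corpus import wordnet as wn

    target = word.lower()
    counts = {}
    for synset in wn.synsets(word, pos=pos):
        for lemma in synset.lemmas():
            # Only count the lemma that matches the queried word,
            # not its synonyms in the same synset.
            if lemma.name().lower() == target:
                counts[synset.name()] = lemma.count()
    return counts
```

For example, `sense_percentages(wordnet_sense_counts("bank", pos="n"))` maps each noun sense of *bank* (keys such as `bank.n.01`) to its share of the tagged occurrences. Because the counts come from a fairly small hand-tagged corpus, treat the percentages as rough priors rather than reliable usage statistics.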