I am working on a polysemy disambiguation project, and for that I am trying to find the polysemous words in an input query. This is how I am doing it:

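#!/usr/bin/env python3
from nltk.corpus import stopwords
from nltk.corpus import wordnet as wn

# Build the stopword set once; set membership tests are fast.
stop = set(stopwords.words('english'))
query = input("enter input query: ")
# Drop stopwords; lowercase for the comparison because the NLTK list is lowercase.
words = [w for w in query.split() if w.lower() not in stop]
# Keep every word that WordNet lists with more than one synset.
a = []
for w in words:
    if len(wn.synsets(w)) > 1:
        a.append(w)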

Here the list a contains the polysemous words. But with this method almost every word is treated as polysemous. For example, if my input query is "milk is white in colour", it stores ('milk','white','colour') as polysemous words.

Madhusudan
  • That's because all of those words have more than one possible meaning. Your script seems to be working correctly. [Take a look at WordNet](http://wordnetweb.princeton.edu/perl/webwn). You'll see that 'milk', 'white', and 'colour' are all polysemous. – tsroten Feb 25 '14 at 07:42
  • We can't call "white" a polysemous word, because all of its senses in WordNet are related to color only. In the case of "bank", some senses relate to the financial sector and some to a river bank; that's why it is considered polysemous. – Madhusudan Feb 25 '14 at 07:51
  • *(adj) white (benevolent; without malicious intent) "that's white of you"* -- that one is not related to color. To me, it looks like you are getting the correct values for your code. – tsroten Feb 25 '14 at 07:54
  • Hello, thanks for your replies. As the answer below says, not all senses are distinct; some are related. Consider the word "british", which is not polysemous because all of its WordNet senses relate to British people, yet it is still reported as a polysemous word. – Madhusudan Feb 25 '14 at 12:04

1 Answer

WordNet is known to be very fine-grained, and it sometimes distinguishes between senses so subtly different that you and I might think they are the same. There have been attempts to make WordNet coarser; search for "Automatic Generation of a Coarse Grained WordNet". I am not sure whether the results of that paper are available for download, but you can always contact the authors.

Alternatively, change your working definition of polysemy: if the most frequent sense of a word accounts for more than 80% of its uses in a large corpus, treat the word as not polysemous. You will have to obtain frequency counts for the different senses of as many words as possible. Start your research here and here.
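The frequency-based definition above could be sketched roughly as follows. This is only an illustration, not a tested recipe: the function names `dominant_sense_share` and `is_polysemous` are made up for this example, and the sense counts are invented numbers rather than real corpus data. With NLTK, per-sense counts might be taken from `lemma.count()` on the lemmas returned by `wn.lemmas(word)`, but those counts come from a fairly small tagged corpus and are zero for many words, so treat that as an assumption to check.

```python
def dominant_sense_share(sense_counts):
    """Return the fraction of all uses that belong to the most frequent sense."""
    total = sum(sense_counts)
    if total == 0:
        return None  # no frequency data, so no decision is possible
    return max(sense_counts) / total


def is_polysemous(sense_counts, threshold=0.8):
    """Treat a word as polysemous unless one sense exceeds `threshold` of its uses."""
    if len(sense_counts) < 2:
        return False  # a single sense can never be polysemous
    share = dominant_sense_share(sense_counts)
    if share is None:
        return None  # undecided: every sense has a zero count
    return share <= threshold


# Invented counts, one number per sense, for illustration only.
example_counts = {
    "bank": [20, 25, 5],   # uses spread over several senses
    "british": [48, 2],    # one sense dominates
}

for word, counts in example_counts.items():
    print(word, is_polysemous(counts))
```

The 0.8 threshold is the figure suggested above; it is exposed as a parameter because the right cut-off will depend on the corpus and the task.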

mbatchkarov