First, my problem: for a project I have to classify 8000 questions into 7 categories (constitution, sports, geography, history, science, education and tech). Because the questions are very short, SVMs don't make much sense, so I simply created a list of words for every category. To improve accuracy I have to expand these lists so that unlabeled strings can be assigned to a category. I read online that WordNet can return synonyms of a word, which is what I want, because I need as many synonyms for my words as possible. But here is the problem: with the following code, WordNet returns

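```python
from nltk.corpus import wordnet as wn

word = "capital"  # example word discussed in the question
# Print every lemma of every synset (sense) that contains the word
for synset in wn.synsets(word):
    for lemma in synset.lemmas():
        print(lemma.name())
```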

all the related words. Take the word capital as an example: I only mean capital in the sense of the capital city of a country, but WordNet returns capital, working, capital letter, upper case, upper-case, majuscule and Capital Washington. Obviously, I don't need upper-case in a bag of words for geography. So my question is: is there a way to restrict WordNet to a single meaning of a word, or is there an alternative I can use?

Sincerely, James

James No
  • Might [this](https://stackoverflow.com/questions/42038337/what-is-the-connection-or-difference-between-lemma-and-synset-in-wordnet) be relevant? – SwiftsNamesake Aug 19 '17 at 20:48
  • @SwiftsNamesake yes, thank you so much! I didn't know the difference between the two and searched for lemmas across every synset, which of course gives me all related words (not only synonyms). Another thing that helped me was [this link](https://pypi.python.org/pypi/PyDictionary/1.3.8): PyDictionary gives you synonyms from a thesaurus, and they are pretty accurate. – James No Aug 19 '17 at 21:17

1 Answer

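You need the synonyms for one specific sense of the word rather than for the word as a whole. In WordNet each synset stands for a single meaning, and the lemmas of that synset are the words that share it, so looping over every synset of "capital" mixes all of its meanings together. I'll simply include [the link I posted in the comments](https://stackoverflow.com/questions/42038337/what-is-the-connection-or-difference-between-lemma-and-synset-in-wordnet), and wish you good luck.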
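As a rough illustration of restricting the lookup to one sense, here is a minimal sketch. The helper names (`pick_synsets`, `lemma_names`, `synonyms_for_sense`) are my own and not part of NLTK, and choosing a sense by looking for a keyword in the synset's definition is only one assumed heuristic; you could just as well inspect the definitions by hand and pick a synset yourself.

```python
def pick_synsets(synsets, keyword, pos="n"):
    """Keep only synsets of the given part of speech whose definition mentions `keyword`."""
    return [
        s for s in synsets
        if s.pos() == pos and keyword.lower() in s.definition().lower()
    ]


def lemma_names(synsets):
    """Collect the lemma names (the synonyms) of the given synsets, without duplicates."""
    names = []
    for s in synsets:
        for name in s.lemma_names():
            name = name.replace("_", " ")  # WordNet joins multi-word lemmas with "_"
            if name not in names:
                names.append(name)
    return names


def synonyms_for_sense(word, keyword, pos="n"):
    """Hypothetical convenience wrapper: synonyms of `word` for senses matching `keyword`."""
    # Imported lazily so the two helpers above can be used without NLTK installed.
    # The WordNet corpus must have been fetched once with nltk.download("wordnet").
    from nltk.corpus import wordnet as wn
    return lemma_names(pick_synsets(wn.synsets(word), keyword, pos))
```

Something like `synonyms_for_sense("capital", "government")` would then only return lemmas from senses whose definition mentions "government". Which senses match depends on how WordNet words its definitions, so it is worth printing `synset.definition()` for each candidate before trusting the keyword.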

SwiftsNamesake