
I would like to know if there is a way to analyze the nouns in a list. For example, is there an algorithm that discerns different categories, such as whether a noun belongs to the category "animal", "plants", "nature", and so on? I thought it was possible to achieve this with WordNet, but, if I am not mistaken, all the nouns in WordNet are ultimately categorized as "entity". Here is the script of my WordNet analysis:

from nltk.corpus import wordnet as wn

lemmas = ['dog', 'cat', 'garden', 'ocean', 'death', 'joy']

hypernyms = []
for i in lemmas:
    # take the first (most frequent) synset for each lemma
    synset = wn.synsets(i)[0]
    # collect the lemma names of its root hypernyms
    hypernyms_list = [lemma.name() for s in synset.root_hypernyms() for lemma in s.lemmas()]
    # deduplicate and store the result
    hypernyms.append(list(set(hypernyms_list)))
hypernyms

And the result is: [['entity'], ['entity'], ['entity'], ['entity'], ['entity'], ['entity']].
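For context, root_hypernyms() always climbs to the very top of WordNet's noun hierarchy, which is the single synset entity.n.01, so this output is expected. A minimal sketch, assuming NLTK's WordNet corpus is installed, of how the immediate hypernyms differ from the root hypernyms:

from nltk.corpus import wordnet as wn

# root_hypernyms() walks all the way to the top of the noun hierarchy
print(wn.synsets('dog')[0].root_hypernyms())
# [Synset('entity.n.01')]

# hypernyms() returns only the immediate parents, which are much more specific
print(wn.synsets('dog')[0].hypernyms())
# [Synset('canine.n.02'), Synset('domestic_animal.n.01')]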

Can anybody suggest some techniques to retrieve the categories the nouns belong to, if anything like that is available? Thanks in advance.

user9355680

1 Answer


One approach I can suggest is using Google's NLP API, which can identify the part of speech of each token as part of its syntax analysis. Please refer to the documentation here: Google's NLP API - Syntax Analysis.

Another option is Stanford's NLP API. Here are the reference docs: Stanford's NLP API.
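A minimal sketch of the Google approach, assuming the google-cloud-language Python client is installed and credentials are configured (the exact call signature can vary between client library versions):

from google.cloud import language_v1

client = language_v1.LanguageServiceClient()
document = language_v1.Document(
    content="The dog ran through the garden.",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)

# analyze_syntax tokenizes the text and tags each token's part of speech
response = client.analyze_syntax(request={"document": document})
for token in response.tokens:
    tag = language_v1.PartOfSpeech.Tag(token.part_of_speech.tag).name
    print(token.text.content, tag)

Note that this gives grammatical categories (NOUN, VERB, ...), not semantic categories such as "animal" or "plant".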

aks
Thank you for your comment, these two links are very interesting, although they are not exactly what I am looking for. I found a way to achieve my objective, I think! I computed hypernym_paths() (WordNet) between my target word and the highest level of the hierarchy, which for all nouns is 'entity', then extracted the second, third and fourth highest hypernyms and computed some statistics over my list! – user9355680 Aug 04 '18 at 08:39
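A minimal sketch of the approach described in that comment, assuming NLTK's WordNet interface; the slice path[1:4] picks the second, third and fourth highest levels of the first hypernym path, and the variable names are illustrative:

from nltk.corpus import wordnet as wn

lemmas = ['dog', 'cat', 'garden', 'ocean', 'death', 'joy']

for word in lemmas:
    synset = wn.synsets(word)[0]
    # hypernym_paths() returns one or more paths from the root ('entity') down to the synset
    path = synset.hypernym_paths()[0]
    # skip the root and keep the next three levels as broad categories
    broad_categories = [s.name() for s in path[1:4]]
    print(word, broad_categories)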