How to keep only the noun words in a wordlist? python NLTK

Question

I have a wordlist that consists of many subjects. The subjects were automatically extracted from sentences. I would like to keep only the nouns from the subjects. As you can see, some of the subjects contain adjectives, which I want to delete.

from nltk.corpus import wordnet as wn

wordlist=['country','all','middle','various drinks','few people','its reputation','German Embassy','many elections']
returnlist=[]
for word in wordlist:
    x=wn.synsets(word)
    for syn in x:
        # keep the entry as soon as one of its synsets is a noun
        if syn.pos() == 'n':
            returnlist.append(word)
            break
print(returnlist)

The result of the above is:

['country','it',  'middle']

However, I want the result to look like this:

   wordlist=['country','it', 'middle','drinks','people','reputation','German Embassy','elections']

How can I do that?

Not really. As long as I can get the ideal result, any method is acceptable. — bob90937, Oct 21 '16 at 02:59
"German Embassy" is a noun phrase, not a noun. If that's what you want, look for NP extraction tools. — tripleee, Oct 21 '16 at 03:58
@HishamKaram "The middle" is a noun, although the predominant meaning is of course adjectival. — tripleee, Oct 21 '16 at 04:07

Hisham Karam · Accepted Answer · 2016-10-21T04:05:17.360

3

First, your list is the result of poorly tokenized text, so I tokenized it again and then looked up the POS tag of every word, keeping the words whose tag is one of the noun tags (NN, NNS, NNP, NNPS):

>>> import nltk
>>> text = ' '.join(wordlist).lower()
>>> tokens = nltk.word_tokenize(text)
>>> tags = nltk.pos_tag(tokens)
>>> # keep only tokens tagged as nouns (singular, plural, proper)
>>> nouns = [word for word, pos in tags if pos in ('NN', 'NNS', 'NNP', 'NNPS')]
>>> nouns
['country', 'drinks', 'people', 'Embassy', 'elections']
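If multi-word entries such as 'German Embassy' should stay together rather than be flattened into one token stream, one possible variation is to tag each phrase on its own and join the nouns it contains. The sketch below is only an illustration under stated assumptions: `keep_nouns` and `nltk_tagger` are hypothetical helper names, phrases are split on whitespace, and the tagger is passed in as a function so it can be swapped out. Tagging short phrases without sentence context can be less reliable, so the actual tags may differ from what you expect.

```python
def keep_nouns(phrases, tagger):
    """For each phrase, keep only the tokens tagged as nouns.

    `tagger` is any callable that maps a list of tokens to a list of
    (token, tag) pairs using Penn Treebank tags (hypothetical interface).
    """
    result = []
    for phrase in phrases:
        tokens = phrase.split()  # assumption: simple whitespace tokenization
        tagged = tagger(tokens)
        # NN, NNS, NNP and NNPS all start with 'NN'
        nouns = [tok for tok, tag in tagged if tag.startswith('NN')]
        if nouns:  # drop phrases that contain no noun at all
            result.append(' '.join(nouns))
    return result


def nltk_tagger(tokens):
    """Thin wrapper around nltk.pos_tag; NLTK is imported only when called."""
    import nltk
    return nltk.pos_tag(tokens)
```

It could be called as `keep_nouns(wordlist, nltk_tagger)`, which requires the NLTK tagger model to be downloaded. Whether 'German' survives depends on whether the tagger labels it as a proper noun or as an adjective.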

edited Oct 21 '16 at 04:05

answered Oct 21 '16 at 03:18


you return adjectives – Loïc Oct 21 '16 at 03:19

score 0 · Answer 2 · answered Oct 21 '16 at 03:14


# Words to strip from each phrase (extend as needed)
adjectives = {'many', 'any', 'few', 'some', 'various'} # ...
wordlist = ['country','all','middle','various drinks','few people','its reputation','German Embassy','many elections']
returnlist = []
for phrase in wordlist:
    # Compare whole words, so that e.g. 'any' is not cut out of 'company'
    kept = [w for w in phrase.lower().split() if w not in adjectives]
    returnlist.append(' '.join(kept))
print(returnlist)


Loïc


you return adjectives and pronoun `['country', 'all', 'middle', 'drinks', 'people', 'its reputation', 'german embassy', 'elections']` – Hisham Karam Oct 21 '16 at 03:35
