I was learning NLP with NLTK 3, and while practicing one of the examples I got stuck counting the named entities in a sentence. Apparently NLTK has been updated and `.node` has been removed from the tree structure. Here is my code:
```python
import sys
f=open('nyt.txt','r')
news_content=f.read()
import nltk
results=[]
for sent_no,sent in enumerate(nltk.sent_tokenize(news_content)):
    tokens=nltk.word_tokenize(sent)
    no_of_tokens=len(tokens)
    tagged=nltk.pos_tag(tokens)
    nouns=len([word for word,pos in tagged if pos in ["NN","NNP"]])
    ners=nltk.ne_chunk(tagged,binary=True)
    no_of_ners=len([chunk for chunk in ners if hasattr(chunk,'node')])
    score=(nouns+no_of_ners)/float(no_of_tokens)
    results.append((sent_no,no_of_tokens,no_of_ners,nouns,score,sent))
results.sort(key=lambda x:x[4])
print(results[5])
```
When I run it, I get this error:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\ssisharm\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)
  File "C:\Users\ssisharm\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/ssisharm/Documents/Python Scripts/news_summary.py", line 19, in <module>
    no_of_ners=len([chunk for chunk in ners if hasattr(chunk,'node')])
  File "C:/Users/ssisharm/Documents/Python Scripts/news_summary.py", line 19, in <listcomp>
    no_of_ners=len([chunk for chunk in ners if hasattr(chunk,'node')])
  File "C:\Users\ssisharm\Anaconda3\lib\site-packages\nltk\tree.py", line 202, in _get_node
    raise NotImplementedError("Use label() to access a node label.")
NotImplementedError: Use label() to access a node label.
```
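The reason `hasattr()` does not simply return `False` here is that, in Python 3, `hasattr()` only treats `AttributeError` as "attribute missing"; any other exception raised while looking up the attribute propagates. The traceback shows the `node` lookup in this NLTK version raising `NotImplementedError`, so it escapes the list comprehension. A minimal sketch of that behaviour, using a hypothetical `FakeTree` class that only mimics the property from the traceback (it is not NLTK's code):

```python
class FakeTree:
    """Hypothetical stand-in mimicking the 'node' property seen in the traceback."""

    @property
    def node(self):
        raise NotImplementedError("Use label() to access a node label.")


def has_node(obj):
    # hasattr() swallows only AttributeError; other exceptions propagate
    return hasattr(obj, "node")


try:
    has_node(FakeTree())
    outcome = "returned"
except NotImplementedError as err:
    outcome = f"raised: {err}"

print(outcome)  # raised: Use label() to access a node label.
```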
I need to access the named entities and count them. Could someone please help?
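A minimal sketch of counting entities without `.node`, assuming NLTK 3's `Tree.label()` and `Tree.leaves()`: the top-level children of an `ne_chunk` result are either `Tree` objects (the entity chunks) or plain `(word, tag)` tuples, so `isinstance(chunk, Tree)` can stand in for `hasattr(chunk, 'node')`. The helper names `count_named_entities` and `named_entities` are hypothetical, and the tree is built by hand (with the `NE` labels I'd expect from `binary=True`) so the example runs without downloading NLTK models:

```python
from nltk.tree import Tree


def count_named_entities(chunked):
    """Count the chunk subtrees at the top level of an ne_chunk-style tree."""
    # Entity chunks are Tree objects; ordinary tokens are (word, tag) tuples
    return sum(1 for chunk in chunked if isinstance(chunk, Tree))


def named_entities(chunked):
    """Return (label, text) pairs for each entity chunk, read via label()."""
    return [
        (chunk.label(), " ".join(word for word, _tag in chunk.leaves()))
        for chunk in chunked
        if isinstance(chunk, Tree)
    ]


# Hand-built tree shaped like ne_chunk(tagged, binary=True) output
sample = Tree("S", [
    Tree("NE", [("Barack", "NNP"), ("Obama", "NNP")]),
    ("visited", "VBD"),
    Tree("NE", [("Paris", "NNP")]),
    (".", "."),
])

print(count_named_entities(sample))  # 2
print(named_entities(sample))  # [('NE', 'Barack Obama'), ('NE', 'Paris')]
```

In the loop above, the failing line would then become something like `no_of_ners=len([chunk for chunk in ners if isinstance(chunk, nltk.Tree)])`. Because it checks the type instead of the label, the same count should also work with `binary=False`, where chunks carry labels such as `PERSON` or `GPE`.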