I have a function based on `nltk.pos_tag` that filters bigram collocations from a text, keeping only adjective (JJ) + noun (NN) pairs:
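```python
import nltk
import pandas as pd
from nltk.tokenize import word_tokenize

f1 = u'this is my random text'
tokens = word_tokenize(f1)

# Count every adjacent word pair (bigram) in the token stream
bigramFinder = nltk.collocations.BigramCollocationFinder.from_words(tokens)
bigram_freq = bigramFinder.ngram_fd.items()
bigramFreqTable = pd.DataFrame(list(bigram_freq), columns=['bigram', 'freq']).sort_values(by='freq', ascending=False)
print(bigramFreqTable)

def rightTypes(ngram):
    """Return True if the bigram is an adjective (JJ) followed by a noun (NN)."""
    first_type = ('JJ',)   # trailing comma makes these tuples, not plain strings
    second_type = ('NN',)
    tags = nltk.pos_tag(ngram)
    return tags[0][1] in first_type and tags[1][1] in second_type

# Keep only the rows whose bigram passes the JJ + NN test
filtered_bi = bigramFreqTable[bigramFreqTable.bigram.map(rightTypes)]
print(filtered_bi)
```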
I would like to use spaCy instead of `nltk.pos_tag`. Below is example code from the spaCy documentation:
```python
import spacy
from spacy.lang.en.examples import sentences

nlp = spacy.load('en_core_web_sm')
doc = nlp(sentences[0])
print(doc.text)
for token in doc:
    print(token.text, token.pos_)
```
I tried different solutions, for example:

```python
tags = [(X.text, X.tag_) for Y in nlp(ngram).ents for X in Y]
```

but I get errors. Could you please help me use spaCy instead of nltk?
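A minimal sketch, assuming spaCy v3 and the `en_core_web_sm` model; it is not a tested drop-in replacement. The attempt above passes a tuple of words to `nlp()`, which expects a string, and iterates `.ents`, which yields only named-entity spans rather than every token. The sketch instead builds a `Doc` directly from the two words (so spaCy does not re-tokenize them), runs the loaded pipeline components over it in order, and checks `token.tag_`, which holds Penn Treebank tags such as `JJ` and `NN` (`token.pos_` holds coarse Universal tags such as `ADJ` and `NOUN`). The names `right_types` and `tag_bigram` are hypothetical.

```python
import spacy
from spacy.tokens import Doc


def right_types(doc):
    """Return True if a two-token Doc is an adjective (JJ) followed by a noun (NN).

    Uses token.tag_ (fine-grained Penn Treebank tags), which matches the
    'JJ'/'NN' tags produced by nltk.pos_tag; token.pos_ would give ADJ/NOUN.
    """
    return len(doc) == 2 and doc[0].tag_ == 'JJ' and doc[1].tag_ == 'NN'


def tag_bigram(nlp, ngram):
    """Tag an already-tokenized bigram (a tuple of words) with a loaded pipeline.

    Building the Doc from `words` keeps the tokens exactly as NLTK split them;
    each pipeline component (tok2vec, tagger, ...) is then applied in order.
    """
    doc = Doc(nlp.vocab, words=list(ngram))
    for _, component in nlp.pipeline:
        doc = component(doc)
    return doc
```

With the question's `bigramFreqTable` and `nlp = spacy.load('en_core_web_sm')`, the filter would become `bigramFreqTable[bigramFreqTable.bigram.map(lambda bg: right_types(tag_bigram(nlp, bg)))]`. Like the `nltk.pos_tag` version, this tags each pair without its surrounding sentence, so the tags may differ from those obtained by running `nlp` over the full text.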