First, I must admit that I am new to both Python and R.
I am trying to create a file that lists bigrams (2-grams) along with their POS tags (NN, VB, etc.). The goal is to make it easy to identify meaningful bigrams and their POS tag combinations.
For example, the bigram ('Gross', 'Profit') has the POS tag combination JJ + NN, whereas the bigram ('quarter', 'of') has NN + IN. With this I can look for meaningful POS combinations. It may not be perfectly accurate, and that is fine; I just want to experiment with it.
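To make the idea concrete, here is a minimal sketch of grouping tagged bigrams by their tag combination with `collections.Counter`. The tagged sentence is a hand-written toy input in the `(word, tag)` shape that `nltk.pos_tag` returns (the tags are illustrative, not real tagger output), and `KEEP_PATTERNS` is a hypothetical whitelist of combinations treated as "meaningful".

```python
from collections import Counter

# Hand-written toy input; the tags are illustrative, not actual tagger output.
tagged = [
    ("Gross", "JJ"), ("Profit", "NN"), ("for", "IN"), ("the", "DT"),
    ("quarter", "NN"), ("of", "IN"), ("sales", "NNS"),
]

# Hypothetical whitelist of tag combinations treated as "meaningful".
KEEP_PATTERNS = {("JJ", "NN"), ("NN", "NN")}

# Pair each tagged token with the next one: ((w1, t1), (w2, t2)).
tagged_bigrams = list(zip(tagged, tagged[1:]))

# Count how often each tag combination occurs.
pattern_counts = Counter((t1, t2) for (_, t1), (_, t2) in tagged_bigrams)

# Keep only the bigrams whose tag combination is whitelisted.
meaningful = [
    (w1, w2) for (w1, t1), (w2, t2) in tagged_bigrams if (t1, t2) in KEEP_PATTERNS
]

print(pattern_counts.most_common(3))
print(meaningful)  # [('Gross', 'Profit')]
```

Counting first and filtering second keeps the two steps separate, so the whitelist can be tuned after inspecting which combinations actually occur.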
For reference, please see the section "2-gram Results" on this page. My requirement is similar, but that analysis was done in R, so it was not directly useful to me.
From what I have found, both POS tagging and bigram creation can be done in Python with the NLTK or TextBlob packages. However, I cannot work out how to assign POS tags to the bigrams I generate. Please see my code below.
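One possible approach, sketched here as an assumption rather than a known answer: tag the whole token list first, then build bigrams from the *tagged* list instead of from the raw tokens, so each bigram carries its two tags. Both `nltk.pos_tag(tokens)` and TextBlob's `.tags` return a list of `(word, tag)` tuples, so the same pairing works for either, and calling `nltk.bigrams` on that tagged list would yield the same adjacent pairs as the `zip` used below. The helper name `tag_bigrams` is hypothetical and the sample tags are hand-written.

```python
from typing import List, Tuple

Tagged = Tuple[str, str]  # (word, POS tag)


def tag_bigrams(tagged: List[Tagged]) -> List[Tuple[Tuple[str, str], Tuple[str, str]]]:
    """Turn a tagged token list into bigrams that keep their tags.

    Returns ((word1, word2), (tag1, tag2)) for each adjacent pair.
    """
    return [((w1, w2), (t1, t2)) for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])]


# Hand-written (word, tag) list; in practice this would come from
# nltk.pos_tag(word_tokenize(text)) or TextBlob(text).tags.
sample = [("Gross", "JJ"), ("Profit", "NN"), ("for", "IN"), ("the", "DT"), ("quarter", "NN")]

for words, tags in tag_bigrams(sample):
    print(words, tags)
```

Tagging the full sentence before pairing matters because POS taggers use surrounding context; tagging each two-word bigram in isolation would usually give less reliable tags.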
```python
import nltk
from textblob import TextBlob
from nltk import word_tokenize
from nltk import bigrams

################# Code snippet using TextBlob Package #######################
text1 = """This is an example for using TextBlob Package"""
blobs = TextBlob(text1)  ### Converting str to a TextBlob object
blob_tags = blobs.tags  ### POS-tagging the words: list of (word, tag) tuples
print(blob_tags)
blob_bigrams = blobs.ngrams(n=2)  ### Creating bigrams from the words (tags are not included)
print(blob_bigrams)

################# Code snippet using NLTK Package #######################
text2 = """This is an example for using NLTK Package"""
tokens = word_tokenize(text2)  ### Tokenizing the string into a list of words
nltk_tags = nltk.pos_tag(tokens)  ### POS-tagging the tokens: list of (word, tag) tuples
print(nltk_tags)
nltk_bigrams = bigrams(tokens)  ### Creating bigrams from the tokens (a generator of word pairs)
print(list(nltk_bigrams))
```
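For the file itself, a minimal sketch that writes one row per tagged bigram as tab-separated values using the standard `csv` module. The column layout (word1, word2, tag1, tag2, pattern) and the function name `write_tagged_bigrams` are my own assumptions. The input is expected in the `(word, tag)` shape of `nltk_tags` or `blob_tags` from the code above; the demo uses a hand-written list and an in-memory buffer, and a hypothetical `open("bigrams.tsv", "w", newline="")` file object would work the same way.

```python
import csv
import io
from typing import List, TextIO, Tuple


def write_tagged_bigrams(tagged: List[Tuple[str, str]], out: TextIO) -> int:
    """Write each adjacent (word, tag) pair as one TSV row; return the row count."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["word1", "word2", "tag1", "tag2", "pattern"])
    rows = 0
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        # "pattern" joins the two tags (e.g. "JJ NN") for easy grouping later.
        writer.writerow([w1, w2, t1, t2, f"{t1} {t2}"])
        rows += 1
    return rows


# Hand-written example input; in practice pass nltk_tags or blob_tags.
sample = [("Gross", "JJ"), ("Profit", "NN"), ("for", "IN"), ("the", "DT"), ("quarter", "NN")]

buffer = io.StringIO()  # swap for open("bigrams.tsv", "w", newline="") to write a real file
n_rows = write_tagged_bigrams(sample, buffer)
print(buffer.getvalue())
```

A TSV file with a combined `pattern` column can then be sorted or filtered in a spreadsheet, or loaded back with `csv.reader`, to look for frequent tag combinations.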
Any help is much appreciated. Thanks in advance.