
First, I must admit that I am new to both Python and R.

I am trying to create a file that lists bigrams (2-grams) along with their POS tags (NN, VB, etc.). The goal is to make it easy to identify meaningful bigrams and their POS tag combinations.

For example, the bigram 'Gross' 'Profit' has the POS tag combination JJ & NN, whereas the bigram 'quarter' 'of' has the combination NN & IN. With this information I can find which POS combinations tend to be meaningful. The result may not be accurate, and that is fine; I only want to use it for research.

For reference, please check the section "2-gram Results" in this page. My requirement is something like that, but it was done in R, so it was not useful to me.

From what I have found, both POS tagging and bigram creation can be done in Python with the NLTK or TextBlob packages. However, I cannot work out how to assign POS tags to the generated bigrams. My code is below; each snippet prints the POS tags and the bigrams separately.

import nltk
from textblob import TextBlob
from nltk import word_tokenize
from nltk import bigrams

################# Code snippet using TextBlob Package #######################
text1 = """This is an example for using TextBlob Package"""
blobs = TextBlob(text1)             ### Converting str to textblob object
blob_tags = blobs.tags              ### Assigning POS tags to the word blobs
print(blob_tags)
blob_bigrams = blobs.ngrams(n=2)    ### Creating bi-grams from word blobs
print(blob_bigrams)

################# Code snippet using NLTK Package #######################
text2 = """This is an example for using NLTK Package"""
tokens = word_tokenize(text2)       ### Converting str object to List object
nltk_tags = nltk.pos_tag(tokens)    ### Assigning POS tags to the word tokens
print(nltk_tags)
nltk_bigrams = bigrams(tokens)      ### Creating bi-grams from word tokens
print(list(nltk_bigrams))

Any help is much appreciated. Thanks in advance.

JKC
  • See https://stackoverflow.com/questions/14732465/nltk-tagging-spanish-words-using-a-corpus and https://stackoverflow.com/questions/40212895/nltk-tag-dutch-sentence – alvas Jul 21 '17 at 16:40
  • Thank you so much @alvas. Could you please explain the logic with sample output, similar to the output shown in the "2-gram results" section of this R link - https://datascience.stackexchange.com/questions/5316/general-approach-to-extract-key-text-from-sentence-nlp – JKC Jul 25 '17 at 09:52
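
One possible approach, offered only as a minimal sketch: tag the tokens first, then build the bigrams from the (word, tag) pairs rather than from the bare words, so every bigram carries its tags with it. The sketch below uses only the standard library. The `tagged` list is hand-written and merely stands in for the output of `nltk.pos_tag(tokens)`; its sentence and tags are illustrative assumptions, not real tagger output. The helper name `tagged_bigrams` is also hypothetical.

```python
from collections import Counter

# Hand-written (word, tag) pairs standing in for nltk.pos_tag(tokens).
# The sentence and its tags are illustrative assumptions, not tagger output.
tagged = [
    ("Gross", "JJ"), ("Profit", "NN"), ("rose", "VBD"), ("in", "IN"),
    ("the", "DT"), ("first", "JJ"), ("quarter", "NN"), ("of", "IN"),
    ("2017", "CD"),
]


def tagged_bigrams(tagged_tokens):
    """Pair each (word, tag) item with the item that follows it."""
    return list(zip(tagged_tokens, tagged_tokens[1:]))


# Split every bigram into its word pair and its tag pair.
rows = [((w1, w2), (t1, t2)) for (w1, t1), (w2, t2) in tagged_bigrams(tagged)]

# Count how often each POS tag combination occurs.
tag_counts = Counter(tags for _, tags in rows)

for (w1, w2), (t1, t2) in rows:
    print(f"{w1} {w2}\t{t1} {t2}")
```

With real data, the same pairing should be obtainable by passing the tagged list to `nltk.bigrams`, i.e. `list(nltk.bigrams(nltk_tags))`, since `bigrams` accepts any sequence. The `rows` list can then be written to a file, and `tag_counts` gives a rough view of which tag combinations are frequent.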
