I am new to NLTK and am using Python. I pass a string as input to `bigrams`, but when I print the result, each character is treated as a word.

import nltk
string = "Batman Superman"
bigram = nltk.bigrams(string)
# bigrams returns a generator, so materialize it before printing
print(list(bigram))
[('B', 'a'), ('a', 't'), ('t', 'm'), ('m', 'a'), ('a', 'n'), ('n', ' '), (' ', 'S'),
('S', 'u'), ('u', 'p'), ('p', 'e'), ('e', 'r'), ('r', 'm'), ('m', 'a'), ('a', 'n')]

But I want the output to be [('Batman','Superman')]. How can I get this output while passing only a string, not a list, to the `bigrams` function?

Vini.g.fer
Python
  • You have to tokenize your string first. Refer to [here](http://stackoverflow.com/questions/24347029/python-nltk-bigrams-trigrams-fourgrams) – Scratch'N'Purr Jun 07 '16 at 19:22

1 Answer

What is happening here is that the `bigrams` function expects a tokenized version of your corpus, that is, a list of words in order.

When you pass it a string, nltk iterates over it the way Python iterates over any string, one character at a time, and then produces the bigrams of that sequence, which are pairs of characters.
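To see why a string produces character pairs, here is a minimal sketch that mimics the pairing in plain Python. It assumes only that a bigram is each item paired with its successor; `adjacent_pairs` is a hypothetical helper written for illustration, not part of nltk.

```python
def adjacent_pairs(sequence):
    """Pair each item with its successor (hypothetical helper, not an nltk function)."""
    items = list(sequence)  # iterating a str yields single characters
    return list(zip(items, items[1:]))

# A string is iterated character by character, so the pairs are characters.
char_pairs = adjacent_pairs("Batman Superman")

# A list of words is iterated word by word, so the pairs are words.
word_pairs = adjacent_pairs(["Batman", "Superman"])

print(char_pairs[:3])
print(word_pairs)
```

The only difference between the two calls is what the input yields when iterated, which is why tokenizing first changes the result.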

If you want to get word-chunk bigrams, you will need to tokenize your input sentence like so:

>>> import nltk
>>> string = "Batman Superman"
>>> tokenized = string.split(" ")
>>> list(nltk.bigrams(tokenized))
[('Batman', 'Superman')]
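If the goal is to call something with a plain string, one option is a small wrapper that tokenizes internally. The following is only a sketch, assuming whitespace tokenization is good enough for your text. `bigrams_from_text` is a hypothetical name, and it pairs adjacent tokens with `zip` so that it runs without nltk installed.

```python
def bigrams_from_text(text):
    """Return word bigrams for a raw string (hypothetical wrapper, not an nltk function)."""
    # split() with no argument splits on any run of whitespace,
    # so repeated spaces do not produce empty tokens
    tokens = text.split()
    # Pair each token with the one that follows it
    return list(zip(tokens, tokens[1:]))

print(bigrams_from_text("Batman Superman"))
print(bigrams_from_text("Batman  v  Superman"))
```

With nltk available, the `zip(...)` expression can be swapped for `nltk.bigrams(tokens)`, which produces the same pairs for a list of tokens.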
evan.oman
  • But I want [('Batman','Superman')] as output even when passing a string as input. Is this possible? – Python Jun 07 '16 at 19:48
  • No, because the `bigrams` function expects a list of tokens to work with. If you want it to be shorter, you could do `nltk.bigrams(string.split(" "))`. – evan.oman Jun 07 '16 at 19:51
  • A tokenized version of the string will represent the same data, just in the proper format for `nltk` to work with. – evan.oman Jun 07 '16 at 19:52
  • If I convert the string into tokens and pass them to the `bigrams` function, it will pair up every consecutive pair of words (tokens). How do I get only the meaningful bigrams? – Python Jun 07 '16 at 19:58
  • Please define "meaningful bigram." If you want to ask about semantic parsing, that should be a separate question. – evan.oman Jun 07 '16 at 19:59
  • No problem! If this answer has been helpful, please [upvote/accept it](http://stackoverflow.com/help/someone-answers) – evan.oman Jun 07 '16 at 20:05