
My input is "I like to play basketball", and the output I am looking for is "I like", "like to", "to play", "play basketball". I have used NLTK's word_tokenize, but that gives single tokens only. I have statements like this in a huge database, and this pairwise tokenization has to be run on an entire column.

Saurabh

2 Answers


You can use a list comprehension for that:

>>> a =  "I like to play basketball"
>>> b = a.split()
>>> c = [" ".join([b[i],b[i+1]]) for i in range(len(b)-1)]
>>> c
['I like', 'like to', 'to play', 'play basketball']
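
The same idea can be written without index arithmetic by zipping the word list against shifted copies of itself, which also extends to phrases of any length. This is a minimal sketch; the helper name word_ngrams and its n parameter are hypothetical, chosen for illustration:

```python
def word_ngrams(text, n=2):
    """Return the consecutive n-word phrases in text, joined with spaces."""
    words = text.split()
    # zip() the list against n copies shifted by 0..n-1 positions;
    # zip stops at the shortest copy, so nothing runs past the last word
    return [" ".join(gram) for gram in zip(*(words[i:] for i in range(n)))]


print(word_ngrams("I like to play basketball"))
# ['I like', 'like to', 'to play', 'play basketball']
print(word_ngrams("I like to play basketball", n=3))
# ['I like to', 'like to play', 'to play basketball']
```

A one-word or empty string yields an empty list, matching the behaviour of the range(len(b)-1) version.
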
assli100

You could do it like this:

s = 'I like to play basketball'
t = s.split()
for i in range(len(t)-1):
    print(' '.join(t[i:i+2]))
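
Because the question mentions running this over an entire database column, here is a minimal sketch that wraps the pairing in a function and applies it row by row with a generator, so the column never has to be held in memory all at once. The names bigrams and bigram_rows, the sample column list, and the treatment of NULL values (None becomes an empty list) are assumptions for illustration; in practice the rows might come from a DB-API cursor.

```python
def bigrams(text):
    """Return adjacent word pairs, e.g. 'I like to' -> ['I like', 'like to']."""
    # Assumption: NULL cells arrive as None and should yield no pairs
    words = (text or "").split()
    return [" ".join(words[i:i + 2]) for i in range(len(words) - 1)]


def bigram_rows(rows):
    """Lazily yield (text, pairs) for each row of an iterable of strings."""
    for text in rows:
        yield text, bigrams(text)


# Hypothetical stand-in for the database column
column = ["I like to play basketball", "Python is fun", "hello"]

for text, pairs in bigram_rows(column):
    print(text, "->", pairs)
```

If the column is already loaded into a pandas DataFrame, the same function can be passed to Series.apply, for example df["text"].apply(bigrams), where df and "text" are placeholder names.
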