
I need to split a string into a list whose elements each contain x words, with each element repeating the last x-1 words of the previous one.

line = "Lorem ipsum dolor sit amet consectetur."

If x = 2, the output should be:

['Lorem ipsum', 'ipsum dolor', 'dolor sit', 'sit amet', 'amet consectetur']

If x = 3, the output should be:

['Lorem ipsum dolor', 'ipsum dolor sit', 'dolor sit amet', 'sit amet consectetur']

As per the question "Split string into list of two words, repeating the last word", the following code successfully splits the string into 2-word pairs:

words = line.split()
print(list(map(' '.join, zip(words[:-1], words[1:]))))
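To make the mechanism visible, here is a small sketch that prints the tuples zip builds before ' '.join merges them. It reuses the question's sample line; the name pairs is only for illustration.

```python
line = "Lorem ipsum dolor sit amet consectetur."
words = line.split()

# words[:-1] drops the last word and words[1:] drops the first, so zip
# lines each word up with its successor; both slices hold len(words) - 1 items.
pairs = list(zip(words[:-1], words[1:]))
print(pairs)

# ' '.join turns each (word, next_word) tuple into a single string.
print(list(map(' '.join, pairs)))
```

Note that the period stays attached to the last word, since str.split() only splits on whitespace.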

However, instead of hard-coding the number of words as 2, I would like to specify the number of words x, for example:

number_of_words = 3  # for example
def generate_list(x):
    ...  # should return the list of x-word chunks of line

I have tried changing the integers in print(list(map(' '.join, zip(words[:-1], words[1:])))), but they only seem to affect which words end up next to each other, not how many words each element contains.
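A quick sketch of the behaviour described above: changing the slice offsets changes the distance between the paired words, but every tuple still has exactly two entries because zip receives exactly two arguments. The variable name skip_one is purely illustrative.

```python
words = "Lorem ipsum dolor sit amet consectetur.".split()

# Offsets 0 and 2 instead of 0 and 1: each word is now paired with the
# word two positions later, but each tuple still holds only two words.
skip_one = list(zip(words[:-2], words[2:]))
print(skip_one)
```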

I imagine I could write separate functions for the 2-word, 3-word and 4-word cases, but ideally I'd like a single function that handles any number of words x.

Mad Physicist
Alan

3 Answers


For a rolling window of 3 words, you can simply pass one more sliced list to zip:

list(map(' '.join, zip(words, words[1:], words[2:])))
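To see why this yields 3-word windows, the sketch below (variable names are illustrative) prints the intermediate tuples. Each argument to zip contributes one word to every tuple, and zip stops as soon as its shortest argument, words[2:], runs out.

```python
words = 'Lorem ipsum dolor sit amet consectetur'.split()

# Three shifted views of the same list: offsets 0, 1 and 2.
# zip stops at the shortest one (words[2:], 4 items), so the last two
# words never start a window of their own.
triples = list(zip(words, words[1:], words[2:]))
print(triples)
print(list(map(' '.join, triples)))
```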

You can therefore generalize the above expression by building the sliced lists with a generator expression and unpacking them into zip:

def rolling_window(words, number_of_words):
    return list(map(' '.join, zip(*(words[i:] for i in range(number_of_words)))))
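A short sketch checking the generalized version, with the rolling_window definition copied from this answer so the block runs on its own. It confirms that the unpacked generator is equivalent to the hand-written three-slice zip and shows two edge cases.

```python
def rolling_window(words, number_of_words):
    return list(map(' '.join, zip(*(words[i:] for i in range(number_of_words)))))

words = 'Lorem ipsum dolor sit amet consectetur'.split()

# The generator yields words[0:], words[1:], ..., words[n-1:]; the *
# unpacks them as separate arguments, exactly like writing
# zip(words, words[1:], words[2:]) by hand for n = 3.
assert rolling_window(words, 3) == list(map(' '.join, zip(words, words[1:], words[2:])))

# If the window is longer than the list, the later slices are empty and
# zip yields nothing.
print(rolling_window(words, 10))  # []

# A window of 1 just returns the words themselves.
print(rolling_window(words, 1))
```

With number_of_words = 0 the generator is empty, zip() is called with no arguments, and the result is also an empty list.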

so that:

rolling_window('Lorem ipsum dolor sit amet consectetur'.split(), 3)

returns:

['Lorem ipsum dolor', 'ipsum dolor sit', 'dolor sit amet', 'sit amet consectetur']
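If the words come from an iterator rather than a list (for example, a generator reading a large file), slicing is not available. As an alternative technique, the sketch below follows the sliding_window recipe from the Python itertools documentation, which keeps the last n items in a collections.deque. It assumes n >= 1.

```python
from collections import deque
from itertools import islice


def sliding_window(iterable, n):
    """Yield overlapping n-item tuples from any iterable (modelled on the
    sliding_window recipe in the itertools documentation)."""
    iterator = iter(iterable)
    # Pre-fill with the first n - 1 items; maxlen=n makes the deque drop
    # its oldest item automatically each time a new one is appended.
    window = deque(islice(iterator, n - 1), maxlen=n)
    for item in iterator:
        window.append(item)
        yield tuple(window)


line = 'Lorem ipsum dolor sit amet consectetur'
# str.split() returns a list here, but sliding_window would accept a
# generator just as well because it never slices its input.
print([' '.join(w) for w in sliding_window(line.split(), 3)])
```

Unlike the slicing approach, this makes a single pass over the input and holds only n items at a time.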
blhsing

Each element must repeat the last x - 1 words of the previous one, so you can take a slice of x words starting at every index from 0 to len(words) - x:

def combinate(sentence, x):
    words = sentence.split()
    return [' '.join(words[i:i+x]) for i in range(len(words) - x + 1)]
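One detail worth guarding: with x = 0 the comprehension above returns a list of empty strings, because words[i:i + 0] is always empty. The sketch below adds a check; combinate_checked is a hypothetical name, not part of the answer.

```python
def combinate_checked(sentence, x):
    """Hypothetical variant of combinate() that rejects non-positive x."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    words = sentence.split()
    # There are len(words) - x + 1 start positions; when x exceeds the
    # word count this is zero or negative, range() is empty, and the
    # result is [].
    return [' '.join(words[i:i + x]) for i in range(len(words) - x + 1)]


print(combinate_checked('Lorem ipsum dolor sit amet consectetur', 2))
```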


Mad Physicist

You can try it this way:

def generate_list(x):
    line = "Lorem ipsum dolor sit amet consectetur."
    words = line.split()
    final_list = []
    for i in range(len(words) - x + 1):
        final_list.append(' '.join(words[i:i + x]))
    return final_list

number_of_words = 3
print(generate_list(number_of_words))

Output: ['Lorem ipsum dolor', 'ipsum dolor sit', 'dolor sit amet', 'sit amet consectetur.']
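The question's expected output ends in 'consectetur', while splitting the sample line keeps the trailing period ('consectetur.'). If the punctuation should go, one option is to strip it from each word first. This sketch assumes only leading and trailing ASCII punctuation (string.punctuation) matters, and uses the hypothetical name generate_list_from so it does not clash with the function above.

```python
import string


def generate_list_from(line, x):
    """Hypothetical variant: takes the text as a parameter and strips
    leading/trailing punctuation from each word before windowing."""
    words = [w.strip(string.punctuation) for w in line.split()]
    words = [w for w in words if w]  # drop tokens that were pure punctuation
    return [' '.join(words[i:i + x]) for i in range(len(words) - x + 1)]


print(generate_list_from("Lorem ipsum dolor sit amet consectetur.", 3))
```

Inner punctuation such as the apostrophe in "don't" is left alone, because str.strip only removes characters from the ends.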

R.A.Munna