I have a list of strings and I would like to untokenize some specific strings. Given the following list, I would like to join the words "my" and "apple" only if they appear next to each other, in that order. I was thinking of using the detokenize
function from this Python Untokenize a sentence question. Here is some reproducible code:
target = "my apple"
words = ['this', 'is', 'my', 'apple', 'and', 'this', 'is', 'not', 'your', 'apple']
Using the detokenizer:
from nltk.tokenize.treebank import TreebankWordDetokenizer
TreebankWordDetokenizer().detokenize(['my', 'apple'])
'my apple'
But I am not sure how to apply this to a longer list of strings so that only the tokens matching a given target are joined. Here is the desired output:
target_output = ['this', 'is', 'my apple', 'and', 'this', 'is', 'not', 'your', 'apple']
['this', 'is', 'my apple', 'and', 'this', 'is', 'not', 'your', 'apple']
So I was wondering if anyone knows how to detokenize specific words only when they appear next to each other, in order, in a list?
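To make the desired behavior concrete, here is a minimal sketch of what I have in mind. It is only an assumption about how this could work, not a tested solution: `merge_target` is a hypothetical helper name, the target is split on whitespace, and a plain `" ".join` stands in for the detokenizer so the snippet runs without nltk.

```python
def merge_target(words, target):
    """Join adjacent tokens that spell out `target`; leave all others alone.

    Hypothetical helper: assumes `target` can be split on whitespace
    into the same tokens that appear in `words`.
    """
    target_tokens = target.split()
    n = len(target_tokens)
    if n == 0:
        return list(words)

    merged = []
    i = 0
    while i < len(words):
        # Compare a window of n tokens starting at position i with the target.
        if words[i:i + n] == target_tokens:
            # " ".join stands in for TreebankWordDetokenizer().detokenize here.
            merged.append(" ".join(target_tokens))
            i += n
        else:
            merged.append(words[i])
            i += 1
    return merged


words = ['this', 'is', 'my', 'apple', 'and', 'this', 'is', 'not', 'your', 'apple']
result = merge_target(words, "my apple")
print(result)
```

With the list above, only the adjacent 'my', 'apple' pair is joined; the 'apple' after 'your' stays a separate token.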