I have a character vector. For each of its elements, I am 100% sure there is a repetition that is always located at the start of the text.
A simplified example of a repeated sentence:
Hello. Hello. How are you?
What I aim for is just: Hello. How are you?
Another example:
Hello I am Joe. Hello I am Joe. How are you?
In this case I would aim for: Hello I am Joe. How are you?
Another example, where the repeated part is cut off:
Hello I a Hello I am Joe. How are you?
Another example of repetition:
Hello I am Jo Hello I am Joe. How are you?
In these cases, the desired output is still: Hello I am Joe. How are you?
Another example is the following:
Hello I am J Hello I am Joe. Joe is indeed my name
In this case, the desired output is:
Hello I am Joe. Joe is indeed my name
Notice that the repetition always happens before the desired output, never in the middle or at the end.
In my data, I am sure that each text is at least 440 characters long and that the repeated text at the beginning has a random length, averaging about 220 characters.
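
To make the rule behind the examples concrete, here is a minimal sketch in Python (used only for illustration, the data itself may live elsewhere). It assumes that the repeated part is a possibly truncated copy of the start of the real text, optionally followed by whitespace. The function name `strip_leading_repeat` and the `min_len` parameter are hypothetical, not part of any library.

```python
def strip_leading_repeat(text: str, min_len: int = 1) -> str:
    """Remove a (possibly truncated) repetition from the start of text.

    Assumption: text looks like P + optional whitespace + S, where P is a
    prefix of S. The longest such P is dropped and S is returned.
    """
    # Try split points from right to left so the longest repeated prefix wins.
    for k in range(len(text) - 1, 0, -1):
        prefix = text[:k].rstrip()   # candidate repeated part
        rest = text[k:].lstrip()     # candidate real text
        if len(prefix) >= min_len and rest.startswith(prefix):
            return rest
    # No repetition found: return the text unchanged.
    return text


examples = [
    "Hello. Hello. How are you?",
    "Hello I am Joe. Hello I am Joe. How are you?",
    "Hello I a Hello I am Joe. How are you?",
    "Hello I am J Hello I am Joe. Joe is indeed my name",
]
cleaned = [strip_leading_repeat(t) for t in examples]
```

This scan is quadratic in the text length, which should be acceptable for texts of a few hundred characters. Raising `min_len` (a hypothetical safeguard) would reduce the risk of a short accidental match, such as a single repeated letter, being treated as the repetition.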