I have a string that is a mix of a title and a regular sentence, with no separator between the two.
text = "Read more: Indonesia to Get Moderna Vaccines Before the pandemic began, a lot of people were...."
The title actually ends at the word "Vaccines"; "Before the pandemic..." is another sentence, completely separate from the title.
How do I remove the substring up to and including the word "Vaccines"? My idea was to remove everything from "Read more:" through the words after it that start with a capital letter, stopping one word early (before "Before"). But I don't know what to do when it meets a conjunction or preposition that doesn't need to be capitalized in a title, like the word "the".
I know Python has the str.title() method to convert a string to title case, but is there any function that can detect whether a substring is a title?
I have tried the following, using a regular expression:
import re
text = "Read more: Indonesia to Get Moderna Vaccines Before the pandemic began, a lot of people were...."
res = re.sub(r"\s*[A-Z]\s*", " ", text)
res
But it just replaced every capital letter (together with any whitespace around it) with a space, instead of removing the title.
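For reference, the idea described earlier (skip "Read more:", walk through the capitalized words, and keep the last one as the start of the sentence) could be sketched roughly like this. The names strip_title and SMALL_WORDS are hypothetical, and the list of words allowed to stay lowercase in a title is my own assumption and certainly not complete.

```python
# Assumption: a hand-picked set of words that may stay lowercase inside a title.
SMALL_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for", "in",
               "nor", "of", "on", "or", "the", "to", "with"}

def strip_title(text, prefix="Read more:"):
    """Drop `prefix` and the title-like run of words that follows it."""
    if not text.startswith(prefix):
        return text
    words = text[len(prefix):].split()
    last_cap = None  # index of the last capitalized word in the title-like run
    for i, word in enumerate(words):
        if word[0].isupper():
            last_cap = i
        elif word.lower() not in SMALL_WORDS:
            break  # an ordinary lowercase word: the title-like run is over
    if last_cap is None:
        return " ".join(words)
    # Assumption: the last capitalized word of the run starts the sentence.
    return " ".join(words[last_cap:])

text = "Read more: Indonesia to Get Moderna Vaccines Before the pandemic began, a lot of people were...."
print(strip_title(text))
# Before the pandemic began, a lot of people were....
```

This works for the example, but it looks fragile: it would cut in the wrong place if the second word of the sentence were also capitalized (a name, for instance), or if the title contained a lowercase word that is not in SMALL_WORDS.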