First, let me apologize for how unhelpful the Python 3 regular expressions documentation is. All the information needed to answer this question can technically be found there, but you already need to know a bit about how re works to make sense of it. That being said, hopefully this will give you a leg up:
A simple answer
Here's some code you could try:
import re
data = ["Fred is Deputy Manager. He is working for MNC.", "Rita is another employee in AC Corp."]
matcher = re.compile(r"(?<![.])[ ][A-Z][A-Za-z]*")
print([matcher.sub("", d) for d in data])
# prints: ['Fred is. He is working for.', 'Rita is another employee in.']
Basically, this compiles a regular expression that matches capitalized words which do not directly follow a period:
(?<![.])
-> a negative lookbehind: don't match if the preceding character is a period
[ ][A-Z][A-Za-z]*
-> any capitalized word with a leading space (requiring the space makes sure it never matches the first word in the string)
Then, it applies that regular expression to each string in your list and replaces the matches with the empty string: ""
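If it helps to see what the pattern actually grabs before anything is deleted, here is a small sketch using findall on the first example string. The variable names are just for illustration, and I've written the word tail as [A-Za-z], since [A-z] would also cover a few punctuation characters such as _ that sit between Z and a in ASCII.

```python
import re

# Raw string so the backslash-free pattern reads the same as in the explanation.
# [A-Za-z] is used for the rest of the word; [A-z] would also match "[", "\", "]", "^", "_" and "`".
matcher = re.compile(r"(?<![.])[ ][A-Z][A-Za-z]*")

text = "Fred is Deputy Manager. He is working for MNC."

# findall returns every non-overlapping match, leading space included.
matches = matcher.findall(text)
print(matches)

# sub replaces each of those matches with the empty string.
cleaned = matcher.sub("", text)
print(cleaned)
```

Note that " He" is absent from the matches: its leading space comes right after a period, so the lookbehind rejects it.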
Some Limitations
If your strings ever contain double spaces or other whitespace characters (like tabs or carriage returns), the pattern above will miss those words. You can fix that by instead using:
matcher = re.compile(r"(?<![.])\s+[A-Z][A-Za-z]*")
where \s+
will match one or more whitespace characters. The r prefix makes it a raw string, so Python passes the backslash through to re untouched.
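One caveat with the \s+ version, shown in the sketch below: if two spaces follow a period, the lookbehind only protects the first of them, so the match can start at the second space and the sentence's first word gets removed anyway. The stricter pattern here is my own suggestion rather than part of the answer above; it also forbids whitespace just before the match, so a match has to begin at the start of a whitespace run.

```python
import re

# The whitespace-tolerant variant: one or more whitespace characters before the word.
loose = re.compile(r"(?<![.])\s+[A-Z][A-Za-z]*")
# Assumption (not in the original answer): also reject a preceding whitespace
# character, so a match cannot start in the middle of a whitespace run.
strict = re.compile(r"(?<![.\s])\s+[A-Z][A-Za-z]*")

tabs = "Fred is  Deputy\tManager. He is working for MNC."
double_after_period = "Fred is Deputy Manager.  He is working for MNC."

# Double spaces and tabs inside a sentence are handled by the loose pattern.
loose_tabs = loose.sub("", tabs)
# Two spaces after the period: the loose pattern removes "He" as well.
loose_double = loose.sub("", double_after_period)
# The strict pattern leaves the sentence-initial "He" alone.
strict_double = strict.sub("", double_after_period)

print(loose_tabs)
print(loose_double)
print(strict_double)
```

Whether you need the stricter form depends on how clean your input is; if periods are always followed by exactly one space, the simpler pattern is enough.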
Also, if your strings ever start with whitespace, the first word will be treated like any other capitalized word and removed. You can fix that by using:
print([matcher.sub("", d.strip()) for d in data])
to remove any leading or trailing whitespace characters from each string first.
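Putting the pieces together, something like the following could work as a reusable helper. The function name remove_capitalized and the sample strings with stray whitespace are made up for illustration; the pattern is the whitespace-tolerant one from this section.

```python
import re

# Compiled once at module level; hypothetical name, same pattern as above.
_CAPITALIZED = re.compile(r"(?<![.])\s+[A-Z][A-Za-z]*")


def remove_capitalized(text):
    """Strip surrounding whitespace, then drop capitalized words
    that do not directly follow a period."""
    return _CAPITALIZED.sub("", text.strip())


# Sample data with leading spaces and a trailing newline added for the demo.
data = [
    "  Fred is Deputy Manager. He is working for MNC.",
    "Rita is another employee in AC Corp.\n",
]
result = [remove_capitalized(d) for d in data]
print(result)

# Without the strip, leading whitespace lets the first word match too.
unstripped = _CAPITALIZED.sub("", "  Fred is here")
print(repr(unstripped))
```

Calling strip() with no argument removes spaces, tabs and newlines alike, which lines up with what \s matches at the edges of the string.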