import re
input_text = "Había... ; Martín Zázza no se trata de un nombre" #example 1
input_text = "asasjhsah; Carolina María Sol no se trataría de un nombre" #example 2
input_text = "Isaías no se trataría de un nombre" #example 3
word = ""
name_capture_pattern_01 = r"([A-Z][a-z]+(?:\s*[A-Z][a-z]+)*)"
regex_pattern_01 = r"(?:^|[.;,]\s*)" + name_capture_pattern_01 + r"\s*(?i:no)\s*(?i:se\s*tratar[íi]a\s*de\s*un\s*nombre|se\s*trata\s*de\s*un\s*nombre|(?:ser[íi]a|es)\s*un\s*nombre)"
n1 = re.search(regex_pattern_01, input_text)
if n1 and word == "":
    word, = n1.groups()
    word = word.strip()
print(repr(word)) #print the captured substring
How can I add the accented vowels, the letter u with diaeresis and the letter ñ, that is [áéíóúüñ], to the search pattern [A-Z][a-z]+ ?
The pattern should still capture strings in which every word starts with a capital letter and the words are separated by spaces, but those words should now be allowed to contain these additional characters. In other words, the goal is to accept these characters without otherwise changing the behavior of the existing capture group.
This is the part of the pattern that I need to extend so that it accepts substrings containing [áéíóúüñ]:
name_capture_pattern_01 = r"([A-Z][a-z]+(?:\s*[A-Z][a-z]+)*)"
If possible, the change should be made only in this part, leaving the rest of the regex unmodified.
The expected outputs are the names captured by the extended capture group, one per example:
Martín Zázza
Carolina María Sol
Isaías
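
One possible way to get these outputs, as a minimal sketch rather than a definitive solution, is to widen the two character classes inside name_capture_pattern_01 and leave regex_pattern_01 as it is. The names UPPER, LOWER and extract_name are hypothetical helpers introduced only for this illustration. Allowing accented capitals (for a name such as "Ángel") is an assumption that goes beyond the [áéíóúüñ] list in the question.

```python
import re

# Assumption: accented capitals may also start a word; drop them from UPPER if not wanted.
UPPER = "A-ZÁÉÍÓÚÜÑ"
LOWER = "a-záéíóúüñ"

# Same structure as the original capture group, only the character classes are wider.
name_capture_pattern_01 = rf"([{UPPER}][{LOWER}]+(?:\s*[{UPPER}][{LOWER}]+)*)"

# The rest of the regex is unchanged from the question.
regex_pattern_01 = (
    r"(?:^|[.;,]\s*)" + name_capture_pattern_01
    + r"\s*(?i:no)\s*(?i:se\s*tratar[íi]a\s*de\s*un\s*nombre"
    r"|se\s*trata\s*de\s*un\s*nombre|(?:ser[íi]a|es)\s*un\s*nombre)"
)

def extract_name(input_text):
    """Return the captured name, or an empty string when nothing matches."""
    n1 = re.search(regex_pattern_01, input_text)
    return n1.group(1).strip() if n1 else ""

examples = [
    "Había... ; Martín Zázza no se trata de un nombre",           # example 1
    "asasjhsah; Carolina María Sol no se trataría de un nombre",  # example 2
    "Isaías no se trataría de un nombre",                         # example 3
]

for text in examples:
    print(repr(extract_name(text)))
# 'Martín Zázza'
# 'Carolina María Sol'
# 'Isaías'
```

This sketch assumes the input uses precomposed characters (NFC form). If the text may contain combining accents instead, normalizing it first with unicodedata.normalize("NFC", input_text) should make the character classes apply.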