Add letters with diacritics, including u with a diaeresis, to a regex pattern that captures capitalized names

Question

import re

input_text = "Había... ; Martín Zázza no se trata de un nombre" #example 1
input_text = "asasjhsah; Carolina María Sol no se trataría de un nombre" #example 2
input_text = "Isaías no se trataría de un nombre" #example 3

word = ""

name_capture_pattern_01 = r"([A-Z][a-z]+(?:\s*[A-Z][a-z]+)*)"

regex_pattern_01 = r"(?:^|[.;,]\s*)" + name_capture_pattern_01 + r"\s*(?i:no)\s*(?i:se\s*tratar[íi]a\s*de\s*un\s*nombre|se\s*trata\s*de\s*un\s*nombre|(?:ser[íi]a|es)\s*un\s*nombre)"

n1 = re.search(regex_pattern_01, input_text)
if n1 and word == "":
    word, = n1.groups()
    word = word.strip()

print(repr(word)) #print the captured substring

How can I add the accented vowels, the letter u with a diaeresis, and ñ, that is [áéíóúüñ], to the search pattern defined by [A-Z][a-z]+?

That way, the pattern would still capture strings in which each word starts with a capital letter, with spaces in between, while also accepting those additional characters. In other words, the goal is to add those characters without changing the behavior of the capture group this regex already defines.

This is the part of the capture pattern that I need to expand: name_capture_pattern_01 = r"([A-Z][a-z]+(?:\s*[A-Z][a-z]+)*)", so that it accepts substrings that include the characters [áéíóúüñ]. If possible, I would like to make the change in this part of the regex only, without modifying the rest.

The output should be the substrings (names) obtained by the expanded capture group:

Martín Zázza
Carolina María Sol
Isaías

Maybe I'm missing something? Just add them to the character set that you want them to be a part of. `[a-záéíóúüñ]` — CAustin, Jan 16 '23 at 05:42
So I could change this `r"([A-Z][a-z]+(?:\s*[A-Z][a-z]+)*)"` with this `r"([A-ZÁÉÍÓÚÜÑ][a-záéíóúüñ]+(?:\s*[A-ZÁÉÍÓÚÜÑ][a-záéíóúüñ]+)*)"` , adding these extra characters but not altering the general operation of the capturing group? — Matt095, Jan 16 '23 at 05:50
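
For reference, here is a minimal sketch of the change proposed in the comments, run against the three example inputs from the question. The names `UPPER`, `LOWER`, and `capture_name` are introduced only for this illustration and are not part of the original code; the uppercase set ÁÉÍÓÚÜÑ is assumed to mirror the lowercase set given in the question. The rest of the regex is copied unchanged.

```python
import re

# Character ranges extended with the Spanish letters from the question.
# UPPER and LOWER are illustrative names, not part of the original code.
UPPER = "A-ZÁÉÍÓÚÜÑ"
LOWER = "a-záéíóúüñ"

# Same structure as before: one capitalized word, optionally followed by
# more capitalized words. Only the two character classes were widened.
name_capture_pattern_01 = rf"([{UPPER}][{LOWER}]+(?:\s*[{UPPER}][{LOWER}]+)*)"

# The remainder of the regex is exactly the one from the question.
regex_pattern_01 = (
    r"(?:^|[.;,]\s*)"
    + name_capture_pattern_01
    + r"\s*(?i:no)\s*(?i:se\s*tratar[íi]a\s*de\s*un\s*nombre"
    + r"|se\s*trata\s*de\s*un\s*nombre|(?:ser[íi]a|es)\s*un\s*nombre)"
)


def capture_name(text):
    """Return the captured name, or an empty string when nothing matches."""
    match = re.search(regex_pattern_01, text)
    return match.group(1).strip() if match else ""


examples = [
    "Había... ; Martín Zázza no se trata de un nombre",
    "asasjhsah; Carolina María Sol no se trataría de un nombre",
    "Isaías no se trataría de un nombre",
]

for text in examples:
    print(repr(capture_name(text)))
# 'Martín Zázza'
# 'Carolina María Sol'
# 'Isaías'
```

The literal character classes match precomposed letters such as `í` (a single code point). If the input might contain decomposed forms (a base letter followed by a combining accent), normalizing it first with `unicodedata.normalize("NFC", text)` should make the same pattern apply.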
