import re
input_text = "del 2065 de 42 52 de 200 de 2222 25 de 25 del 26. o del 8" #example input
num_pattern = r"(\d{1,2})"
identification_regex = r"(?:del|de[\s|]*el|de|)[\s|]*" + num_pattern
input_text = re.sub(identification_regex, "AA", input_text)
print(repr(input_text)) # --> output
This is the wrong output I get when I use this regex to identify the numeric values in the input string:
'AAAA AAAA AAAA AAAAAA AA AA. o AA'
Since this is a simplified version of a larger program, the replacement logic is reduced to substituting "AA". This is the output I should get instead. Note that when a number has more than 2 consecutive digits, no replacement is performed:
"del 2065 AA AA de 200 de 2222 AA AA AA. o AA"
The problem with the search pattern r"(\d{1,2})" is that, although it correctly captures numbers of 1 or 2 digits, it also matches inside longer numbers (more than 2 consecutive digits), consuming them up to two digits at a time.
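To make the problem concrete, here is a minimal check of the number pattern on its own, separate from the "de"/"del" prefix. The variable names `chunks_2065`, `chunks_200` and `chunks_short` are only illustrative:

```python
import re

num_pattern = r"(\d{1,2})"

# Nothing in the pattern says "no digit before or after", so a long
# run of digits is simply split into pieces of at most two digits.
chunks_2065 = re.findall(num_pattern, "2065")
chunks_200 = re.findall(num_pattern, "200")
chunks_short = re.findall(num_pattern, "42 8")

print(chunks_2065)   # ['20', '65']
print(chunks_200)    # ['20', '0']
print(chunks_short)  # ['42', '8']
```

Each of those pieces is then replaced by "AA", which is why "2065" turns into "AAAA" in the wrong output.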
What additional constraint do I need to add to the search pattern so that the match-and-replace works correctly?
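One direction that might work, offered as a sketch and not a definitive answer, is to surround the digits with negative lookarounds so that a match cannot touch another digit on either side. This sketch also assumes the "de"/"del" prefix should be optional as a whole, so that a bare number such as " 52" keeps the space in front of it. The name `result` is only illustrative:

```python
import re

input_text = "del 2065 de 42 52 de 200 de 2222 25 de 25 del 26. o del 8"

# (?<!\d) rejects a match preceded by a digit, (?!\d) rejects one
# followed by a digit, so only standalone 1-2 digit numbers match.
num_pattern = r"(?<!\d)(\d{1,2})(?!\d)"

# Assumption: prefix and its trailing separators are grouped and made
# optional together, instead of using an empty alternative.
identification_regex = r"(?:(?:del|de[\s|]*el|de)[\s|]*)?" + num_pattern

result = re.sub(identification_regex, "AA", input_text)
print(repr(result))  # 'del 2065 AA AA de 200 de 2222 AA AA AA. o AA'
```

With this input the result matches the expected output above. Whether the optional-prefix grouping is right for the full program depends on inputs not shown here.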