import re

input_text = "del 2065 de 42 52 de 200 de 2222 25 de 25 del 26. o del 8"  #example input

num_pattern = r"(\d{1,2})"
identification_regex = r"(?:del|de[\s|]*el|de|)[\s|]*" + num_pattern

input_text = re.sub(identification_regex, "AA", input_text)

print(repr(input_text)) # --> output

This is the incorrect output I get when I use this regex pattern to identify the numeric values in the input string:

'AAAA AAAA AAAA AAAAAA AA AA. o AA'

This is a simplified version of a larger program, so the replacement logic has been reduced to substituting "AA". The output below is what I should get. Note that when a number has more than 2 consecutive digits, no replacement is performed:

"del 2065 AA AA de 200 de 2222 AA AA AA. o AA"

The problem with the search pattern r"(\d{1,2})" is that, although it correctly captures numbers of 1 or 2 digits, it also matches inside longer numbers such as 200 or 2065, taking up to 2 digits at a time out of a longer run of consecutive digits.
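The behaviour described above can be reproduced in isolation. This is only a minimal sketch; the variable names are made up for illustration and are not part of the original program. It shows that an unconstrained `\d{1,2}` consumes a longer number piece by piece:

```python
import re

# An unconstrained \d{1,2} walks through a long number two digits at a time.
matches_four_digits = re.findall(r"\d{1,2}", "2065")
matches_three_digits = re.findall(r"\d{1,2}", "200")
matches_two_digits = re.findall(r"\d{1,2}", "42")

print(matches_four_digits)   # ['20', '65']
print(matches_three_digits)  # ['20', '0']
print(matches_two_digits)    # ['42']
```

Nothing in the pattern says what may or may not come before or after the digits, so every chunk of a longer number qualifies as a match.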

What additional constraint do I need to add to the search pattern so that the match and replace works correctly?

Matt095
  • Don't use `[\s|]*` instead of `\s*`. This has been pointed out before. And to limit the numbers, you need a lookahead, `identification_regex = r"(?:del|de\s*el|de|)\s*" + num_pattern + r'(?!\d)'`. Or, just use a word boundary. – Wiktor Stribiżew Dec 20 '22 at 21:56
  • What is this part of the regex for? `(?:del|de[\s|]*el|de|)[\s|]*` The desired output you listed still has `del` and `de` in it, which is found and replaced by the regex above. Also, if the input has the same structure as the question, why not just search for `\s\d{1,2}\s`? – dc-ddfe Dec 20 '22 at 22:00
  • This single regex should give you the result you need: `(del?\s+|)\b\d{1,2}\b`. `re.sub(r"(del?\s+|)\b\d{1,2}\b","AA",input_text)` gives the result `'del 2065 AA AA de 200 de 2222 AA AA AA. o AA'` – SR3142 Dec 20 '22 at 22:23
  • @SR3142 Thanks a lot for the help! It worked perfectly for me. My only question: is it the `\b` that sets the constraint? – Matt095 Dec 20 '22 at 23:40
  • @WiktorStribiżew Thanks a lot for the help. It helped me add the pattern r'(?!\d)' to establish the restriction. – Matt095 Dec 20 '22 at 23:42
  • The `\b` restricts the match to groups of 1 or 2 digits that have a boundary on both sides, so it will not match two digits in 2065, but it will also not match the 21 in foo21bar. https://regex101.com/ is very useful for testing and playing with regular expressions. – SR3142 Dec 21 '22 at 09:47
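The suggestions in the comments can be compared side by side. The following is a minimal sketch rather than a definitive solution. The variable names are illustrative, the prefix is written as `(?:del?\s+)?` (equivalent in effect to `(del?\s+|)` from the comment), and the lookbehind `(?<!\d)` is an assumption added here: as far as I can tell, a trailing `(?!\d)` on its own still lets a match start in the middle of a longer number, so the left side needs a guard as well.

```python
import re

input_text = "del 2065 de 42 52 de 200 de 2222 25 de 25 del 26. o del 8"
expected = "del 2065 AA AA de 200 de 2222 AA AA AA. o AA"

# Optional "de"/"del" prefix followed by whitespace.
prefix = r"(?:del?\s+)?"

# Variant 1: word boundaries on both sides of the number.
with_boundaries = prefix + r"\b\d{1,2}\b"

# Variant 2 (assumption, not from the comments): digit lookarounds on both sides.
with_lookarounds = prefix + r"(?<!\d)\d{1,2}(?!\d)"

# Variant 3: trailing lookahead only, shown for comparison.
lookahead_only = prefix + r"\d{1,2}(?!\d)"

result_boundaries = re.sub(with_boundaries, "AA", input_text)
result_lookarounds = re.sub(with_lookarounds, "AA", input_text)
result_lookahead_only = re.sub(lookahead_only, "AA", input_text)

print(result_boundaries == expected)      # True
print(result_lookarounds == expected)     # True
print(result_lookahead_only == expected)  # False: the "65" of "2065" is replaced

# The two working variants differ when digits touch letters.
print(re.sub(with_boundaries, "AA", "foo21bar"))   # foo21bar
print(re.sub(with_lookarounds, "AA", "foo21bar"))  # fooAAbar
```

Which variant fits depends on whether digits glued to letters, as in foo21bar, should count as standalone numbers: `\b` leaves them alone, while the digit lookarounds only care about neighbouring digits.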
