import re, datetime

#example
input_text = "Alrededor de las 00:16 am o las 23:30 pm 2022_-_02_-_18 , quizas cerca del 2022_-_02_-_18 llega el avion, pero no a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)"

#identification pattern
input_date_structure = r"(?P<year>\d*)_-_(?P<month>\d{2})_-_(?P<startDay>\d{2})"
identify_only_date_regex_00 = input_date_structure + r"[\s|]*" + r"(\d{2}:\d{2}[\s|]*[ap]m)?" #to identify if there is a time after  the date
identify_only_date_regex_01 = r"(\d{2}:\d{2}[\s|]*[ap]m)?" + r"[\s|]*" + input_date_structure #to identify if there is a time before the date

#replacement structure
date_restructuring_structure = r"\g<year>_-_\g<month>_-_\g<startDay>"
restructuring_only_date = lambda x: x.group() if x.group(1) else " (" + date_restructuring_structure + " 00:00 am)"

#do the replace with re.sub() method and the regex patterns instructions
input_text = re.sub(identify_only_date_regex_00, restructuring_only_date, input_text)
input_text = re.sub(identify_only_date_regex_01, restructuring_only_date, input_text)

#print output
print(repr(input_text)) # --> output

The wrong output that I get:

'Alrededor de las 00:16 am o las 23:30 pm 2022_-_02_-_18 , quizas cerca del (\\g<year>_-_\\g<month>_-_\\g<startDay> 00:00 am) llega el avion, pero no a las ( (\\g<year>_-_\\g<month>_-_\\g<startDay> 00:00 am) 00:16 am), de esos hay dos (22)'

The correct output, in which ONLY the dates that are neither preceded nor followed by a time indication (hh:mm am or pm, matched by r"(\d{2}:\d{2}[\s|]*[ap]m)?") are modified:

"Alrededor de las 00:16 am o las 23:30 pm 2022_-_02_-_18 , quizas cerca del (2022_-_02_-_18 00:00 am) llega el avion, pero no a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)"

This example shows the three possible cases: a date preceded by a time, a date on its own, and a date followed by a time. The replacement should happen only when the date is on its own (with no hh:mm am or pm time indication).
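The three cases described above can be handled in a single pass instead of two. The following is only a minimal sketch under some assumptions: the names `TIME`, `DATE`, `PATTERN`, and `add_default_time` are illustrative, the year is assumed to have exactly four digits (the original pattern uses `\d*`), and plain `\s*` is used in place of `[\s|]*`. The pattern captures an optional time on each side of the date, and the callback rewrites the match only when both are absent.

```python
import re

# Illustrative building blocks (assumptions, not the original patterns).
TIME = r"\d{2}:\d{2}\s*[ap]m"
DATE = r"(?P<year>\d{4})_-_(?P<month>\d{2})_-_(?P<day>\d{2})"

# Optional time before the date, the date itself, optional time after it.
PATTERN = re.compile(rf"(?P<before>{TIME}\s*)?{DATE}(?P<after>\s*{TIME})?")

def add_default_time(match):
    # A time on either side means the text is returned unchanged.
    if match.group("before") or match.group("after"):
        return match.group(0)
    # Lone date: build the replacement from the named groups.
    return "({year}_-_{month}_-_{day} 00:00 am)".format(**match.groupdict())

text = (
    "Alrededor de las 00:16 am o las 23:30 pm 2022_-_02_-_18 , "
    "quizas cerca del 2022_-_02_-_18 llega el avion, pero no a las "
    "(2022_-_02_-_18 00:16 am), de esos hay dos (22)"
)
result = PATTERN.sub(add_default_time, text)
print(repr(result))
```

Because the replacement is built from `match.groupdict()` inside the function, no `\g<name>` template needs to be expanded at all.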

Matt095
  • It is just a string inside a lambda expression, so no expansion takes place. The correct syntax has already been provided in the previous answer. – Wiktor Stribiżew Nov 20 '22 at 21:36
  • The problem is how to do this when it is a compound string: how to mark it as raw with **r**, since it needs to be a string whose replacement labels are interpreted and, at the same time, a concatenation of more than one string – Matt095 Nov 20 '22 at 21:39
  • There is no "compound string" notion. Do this all in the lambda expression. Maybe you need `x.expand(date_restructuring_structure)` though. See [this Python demo](https://ideone.com/9lo0QB). – Wiktor Stribiżew Nov 20 '22 at 21:59
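The `x.expand(...)` suggestion in the last comment can be illustrated with a small sketch. It assumes a four-digit year and a shortened sample sentence; the names `date_pattern`, `template`, `literal`, and `expanded` are illustrative. When a replacement function returns a string, `re.sub` inserts it verbatim, so `\g<year>` stays as literal text unless `Match.expand()` is called on it. The sketch also shows that named groups count toward positional numbering: in a pattern that starts with the date structure, `group(1)` is the year, not the optional time.

```python
import re

date_pattern = r"(?P<year>\d{4})_-_(?P<month>\d{2})_-_(?P<startDay>\d{2})"
template = r"\g<year>_-_\g<month>_-_\g<startDay>"

m = re.search(date_pattern, "cerca del 2022_-_02_-_18 llega")

# Plain concatenation: the group references remain literal text.
literal = "(" + template + " 00:00 am)"

# Match.expand() substitutes the references, as re.sub does for string templates.
expanded = "(" + m.expand(template) + " 00:00 am)"

# Named groups are numbered too, so group(1) here is the year.
first_group = m.group(1)

# The same expansion used inside a replacement callback.
fixed = re.sub(
    date_pattern,
    lambda x: "(" + x.expand(template) + " 00:00 am)",
    "cerca del 2022_-_02_-_18 llega",
)
print(literal, expanded, first_group, fixed, sep="\n")
```

Referring to the optional time by name, for example `(?P<time>...)` with `x.group("time")`, avoids depending on where the group falls in the numbering.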

0 Answers