import re, datetime

input_text = "hhhh ((44_-_44)) ggj ((2022_-_02_-_18 20:00 pm)) ((((2022_-_02_-_18 20:00 pm))) (2022_-_02_-_18 00:00 am)"

identify_dates_regex_00 = r"(?P<year>\d*)_-_(?P<month>\d{2})_-_(?P<startDay>\d{2})"
identify_time_regex = r"(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>(?:am|pm))"

restructuring_structure_00 = "(" + r"\g<year>_-_\g<month>_-_\g<startDay>" + r" \g<hh>:\g<mm> \g<am_or_pm>" + ")"

input_text = re.sub(r"\(" + identify_dates_regex_00 + " " + identify_time_regex + r"\)", restructuring_structure_00, input_text)

print(repr(input_text)) # --> output

This is the wrong output that I get:

'hhhh ((44_-_44)) ggj ((2022_-_02_-_18 20:00 pm)) ((((2022_-_02_-_18 20:00 pm))) (2022_-_02_-_18 00:00 am)'

This is the correct output, without the extra parentheses, that I need:

'hhhh ((44_-_44)) ggj (2022_-_02_-_18 20:00 pm) (2022_-_02_-_18 20:00 pm) (2022_-_02_-_18 00:00 am)'

I need to remove the redundant parentheses only when they enclose the structure year_-_month_-_day hour:minute am/pm. With named capture groups, that structure can be written as r"(?P<year>\d*)_-_(?P<month>\d{2})_-_(?P<startDay>\d{2})" followed by a space and r"(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>(?:am|pm))". Without capture groups it could be written as the simpler regex r"\d*_-_\d{2}_-_\d{2} \d{2}:\d{2}[\s|]*[ap]m", although that loses the ability to capture the data.

Matt095
  • And what about `((44_-_44))`? Do you want to remove extra parentheses even here? – PieCot Nov 22 '22 at 07:22
  • @PieCot No, that example should NOT be replaced, since `"44_-_44"` does not match the detection pattern `"(?P<year>\d*)_-_(?P<month>\d{2})_-_(?P<startDay>\d{2})"` followed by `"(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>(?:am|pm))"`. To be replaced, it would have to be something like `"((1920_-_01_-_15 15:30 pm))"`. In other words, the replacement (removing the extra parentheses) should happen only when a date with a time appears between the `"(("` and `"))"` – Matt095 Nov 22 '22 at 07:30
  • Ok, then I think you should fix the example of the correct output in the question – PieCot Nov 22 '22 at 07:31
  • since **to be correct** the date and time should be enclosed only within a single pair of parentheses, like this `"(1920_-_01_-_15 15:30 pm)"` and not something like this `"((1920_-_01_-_15 15:30 pm))"` or this `"(((((1920_-_01_-_15 15:30 pm)))))"` – Matt095 Nov 22 '22 at 07:32
  • @PieCot You are right, I have corrected the example there. Sorry, I typed it wrong. – Matt095 Nov 22 '22 at 07:34

1 Answer


You can use a single capture group to capture the date and time, together with one pair of parentheses, and then match (and so remove) any additional surrounding parentheses outside that group.

To do the replacement, you don't need the named capture groups.

In the replacement use capture group 1.

\(*(\(\d{4}_-_\d{2}_-_\d{2} \d{2}:\d{2}[\s|]*[ap]m\))\)*
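To make the structure of that pattern easier to follow, here is a sketch that spells it out with `re.VERBOSE` and a comment per part. The `sample` string and the variable names are made up for illustration; the pattern itself is intended to be equivalent to the one above.

```python
import re

# Same pattern as above, split over several lines so each part can be commented.
pattern = re.compile(
    r"""
    \(*                         # extra opening parentheses, outside group 1
    (                           # group 1: the part that is kept
        \(                      #   exactly one opening parenthesis
        \d{4}_-_\d{2}_-_\d{2}   #   date: year_-_month_-_day
        [ ]                     #   one literal space (a class, since verbose mode ignores bare spaces)
        \d{2}:\d{2}             #   time: hh:mm
        [\s|]*                  #   optional whitespace or pipe characters
        [ap]m                   #   am or pm
        \)                      #   exactly one closing parenthesis
    )
    \)*                         # extra closing parentheses, outside group 1
    """,
    re.VERBOSE,
)

# Illustrative input (not from the question): one nested date, one non-date.
sample = "x (((1920_-_01_-_15 15:30 pm))) y ((44_-_44))"
result = pattern.sub(r"\1", sample)
print(result)
```

Because `\(*` and `\)*` are matched independently, the pattern does not require the extra parentheses to be balanced, which is why an input such as `((((...)))` is also reduced to a single pair.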


Example code:

import re

input_text = "hhhh ((44_-_44)) ggj ((2022_-_02_-_18 20:00 pm)) ((((2022_-_02_-_18 20:00 pm))) (2022_-_02_-_18 00:00 am)"
pattern = r"\(*(\(\d{4}_-_\d{2}_-_\d{2} \d{2}:\d{2}[\s|]*[ap]m\))\)*"
print(re.sub(pattern, r"\1", input_text))

Output

hhhh ((44_-_44)) ggj (2022_-_02_-_18 20:00 pm) (2022_-_02_-_18 20:00 pm) (2022_-_02_-_18 00:00 am)
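If the captured values are needed after all, one possible variant keeps the named groups from the question and rebuilds the text in a replacement function. This is only a sketch: `named_pattern`, `keep_single_pair`, and `found` are hypothetical names, the year is assumed to have four digits as in the pattern above, and the separator before am/pm is normalized to a single space.

```python
import re

input_text = "hhhh ((44_-_44)) ggj ((2022_-_02_-_18 20:00 pm)) ((((2022_-_02_-_18 20:00 pm))) (2022_-_02_-_18 00:00 am)"

# One or more parentheses on each side; the date and time parts are named groups.
named_pattern = re.compile(
    r"\(+(?P<year>\d{4})_-_(?P<month>\d{2})_-_(?P<day>\d{2}) "
    r"(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>[ap]m)\)+"
)

found = []  # collects the captured fields of every match


def keep_single_pair(match):
    """Record the captured fields and rebuild the text with one pair of parentheses."""
    fields = match.groupdict()
    found.append(fields)
    return "({year}_-_{month}_-_{day} {hh}:{mm} {am_or_pm})".format(**fields)


cleaned = named_pattern.sub(keep_single_pair, input_text)
print(cleaned)
print(found[0])
```

Passing a function to `re.sub` avoids a second pass over the text: each match is both recorded and rewritten in one step.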
The fourth bird