import re, datetime  # the code below calls re.sub, so the standard-library re module is imported
input_text = "Alrededor de las 00:16 am del 2022_-_02_-_11 , quizas cerca de las 23:16 pm 2022_-_01_-_15" #example 1
input_text = "Alrededor de las 2022_-_02_-_18 00:16 am , quizas cerca de las 2022_-_12_-_01 a las 23:16 pm" #example 2
input_text = "Alrededor de las 00:16 am , quizas cerca del 2022_-_02_-_18 llega el avion" #example 3
For example 3, the algorithm also needs to fill in whichever part is missing:
- If the date is given but not the time, assume the time is "00:00 am" (exactly the beginning of that day).
- If the time is given but not the date, assume the current date, datetime.datetime.today().strftime('%Y_-_%m_-_%d').
With these rules, and taking 2022-11-09 as the current date, example 3 becomes: "Alrededor de las 2022_-_11_-_09 00:16 am , quizas cerca del 2022_-_02_-_18 00:00 am llega el avion" #example 3
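The two default rules above can be sketched as a small helper. This is only an illustration: the name `complete_moment` and its `today` parameter are hypothetical, and the `_-_` separator for the current date is assumed so that the result matches the date format used in the examples.

```python
import datetime


def complete_moment(date_str=None, time_str=None, today=None):
    """Return 'date time', filling in whichever part is missing."""
    if today is None:
        today = datetime.date.today()
    if date_str is None:
        # Time without a date: assume the current date, in the same _-_ format.
        date_str = today.strftime("%Y_-_%m_-_%d")
    if time_str is None:
        # Date without a time: assume the beginning of that day.
        time_str = "00:00 am"
    return f"{date_str} {time_str}"


fixed_day = datetime.date(2022, 11, 9)  # fixed date so the output is reproducible
print(complete_moment(time_str="00:16 am", today=fixed_day))  # 2022_-_11_-_09 00:16 am
print(complete_moment(date_str="2022_-_02_-_18"))             # 2022_-_02_-_18 00:00 am
```

Passing `today` explicitly is just a convenience for testing; leaving it out falls back to the real current date.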
#Regex that I need for the date (year_-_month_-_day)
identify_dates_regex = r"(?P<year>\d*)_-_(?P<month>\d{2})_-_(?P<startDay>\d{2})"
#Regex that I need for the time (hours:minutes)
identify_time_regex = r"(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?:am|pm)"
identify_regex_00 = identify_dates_regex + r"[\s|]*(?:por[\s|]*las|por[\s|]*la|a[\s|]*eso[\s|]*de[\s|]*las|a[\s|]*eso[\s|]*de[\s|]*la|a[\s|]*eso[\s|]*de|a[\s|]*las|a[\s|]*la|)[\s|]*" + identify_time_regex
identify_regex_01 = identify_time_regex + r"[\s|]*(?:del|de[\s|]*el|de|)[\s|]*" + identify_dates_regex
With the identifying patterns defined, the next step is to make the replacements. For these conditional replacements I am using re.sub(pattern, repl, string, count=0, flags=0).
This is where I am stuck: I do not know what to pass as the repl argument so that the replacements produce the outputs listed further down.
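For reference, `repl` in `re.sub` can be either a template string, in which `\g<name>` refers to a named group, or a function that receives the match object and returns the replacement text. The sketch below shows both forms on the time-before-date case. The patterns are simplified stand-ins for the ones above, and the `ampm` group is an assumption added so that "am"/"pm" can be reused in the replacement.

```python
import re

# Simplified patterns; the ampm group is captured (an addition to the original).
date_re = r"(?P<year>\d{4})_-_(?P<month>\d{2})_-_(?P<day>\d{2})"
time_re = r"(?P<hh>\d{2}):(?P<mm>\d{2})\s*(?P<ampm>am|pm)"
time_then_date = time_re + r"\s*(?:del|de\s*el|de)?\s*" + date_re

text = "Alrededor de las 00:16 am del 2022_-_02_-_11"

# Form 1: a template string; \g<name> inserts the text of a named group.
template = r"(\g<year>_-_\g<month>_-_\g<day> \g<hh>:\g<mm> \g<ampm>)"
print(re.sub(time_then_date, template, text))
# Alrededor de las (2022_-_02_-_11 00:16 am)


# Form 2: a function; it is called once per match and returns the new text.
def reorder(match):
    g = match.groupdict()
    return f"({g['year']}_-_{g['month']}_-_{g['day']} {g['hh']}:{g['mm']} {g['ampm']})"


print(re.sub(time_then_date, reorder, text))
# Alrededor de las (2022_-_02_-_11 00:16 am)
```

The function form is the more flexible of the two, since it can apply conditions such as supplying a default when a part is missing.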
input_text = re.sub(identify_regex_00, , input_text) #replacement when date is before time
input_text = re.sub(identify_regex_01, , input_text) #replacement when time is before date (the two parts need to be swapped to get date-then-time order)
input_text = re.sub(identify_dates_regex, , input_text) #replacement when only the date is present
#for this replacement I have to determine the current date and concatenate it in the replacement (same _-_ format as the other dates)
today_date = datetime.datetime.today().strftime('%Y_-_%m_-_%d')
input_text = re.sub(identify_time_regex, , input_text) #replacement when only the time is present
The goal is to put every date and time in the same order, YYYY_-_mm_-_dd hh:mm am or pm, and to wrap each pair in parentheses ( ) to indicate that the time and the date belong to the same moment.
Outputs that I need in each case:
"Alrededor (2022_-_02_-_11 00:16 am) , quizas cerca (2022_-_01_-_15 23:16 pm)" #for the example 1
"Alrededor (2022_-_02_-_18 00:16 am) , quizas cerca (2022_-_12_-_01 23:16 pm)" #for the example 2
"Alrededor (2022_-_11_-_09 00:16 am) , quizas cerca (2022_-_02_-_18 00:00 am) llega el avion" #for the example 3
How should I use the information captured by the regex groups to rearrange the parts and/or concatenate the missing content to get these results?
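One possible approach, offered as a sketch under stated assumptions rather than a definitive answer, is a single pass with one combined pattern and a replacement function, instead of four successive `re.sub` calls. The names `MOMENT`, `LEAD` and `normalize` are hypothetical. Dropping a leading connector such as "de las" or "del" is inferred from the desired outputs, and the time text is kept exactly as it was matched. Because Python's `re` does not allow a group name to be repeated inside one pattern, each alternative gets its own group names.

```python
import datetime
import re

DATE = r"\d{4}_-_\d{2}_-_\d{2}"
TIME = r"\d{2}:\d{2}\s*(?:am|pm)\b"
# Optional connector in front of the moment; it is matched so that it gets dropped.
LEAD = r"(?:\b(?:de\s+las|de\s+la|del|de\s+el|a\s+las|a\s+la)\s+)?"
# Optional connectors between the two parts, in either order.
DATE_TO_TIME = (r"\s*(?:por\s+las|por\s+la|a\s+eso\s+de\s+las|a\s+eso\s+de\s+la"
                r"|a\s+eso\s+de|a\s+las|a\s+la)?\s*")
TIME_TO_DATE = r"\s*(?:del|de\s+el|de)?\s*"

# Alternatives are tried in order: the two combined forms first, then date only, then time only.
MOMENT = re.compile(
    LEAD + "(?:"
    + rf"(?P<d1>{DATE}){DATE_TO_TIME}(?P<t1>{TIME})"
    + rf"|(?P<t2>{TIME}){TIME_TO_DATE}(?P<d2>{DATE})"
    + rf"|(?P<d3>{DATE})"
    + rf"|(?P<t3>{TIME})"
    + ")"
)


def normalize(text, today=None):
    """Rewrite every date/time mention as '(date time)', filling in missing parts."""
    today = today or datetime.date.today()
    today_str = today.strftime("%Y_-_%m_-_%d")

    def repl(m):
        # Groups from alternatives that did not take part in the match are None.
        date = m.group("d1") or m.group("d2") or m.group("d3") or today_str
        time = m.group("t1") or m.group("t2") or m.group("t3") or "00:00 am"
        return f"({date} {time})"

    return MOMENT.sub(repl, text)


fixed_day = datetime.date(2022, 11, 9)  # fixed date so example 3 is reproducible
print(normalize("Alrededor de las 00:16 am , quizas cerca del 2022_-_02_-_18 llega el avion", fixed_day))
# Alrededor (2022_-_11_-_09 00:16 am) , quizas cerca (2022_-_02_-_18 00:00 am) llega el avion
```

Doing everything in one pass avoids a later substitution matching again inside text that an earlier one already rewrote, which is a risk when the date-only and time-only replacements run after the combined ones.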