import re, datetime

input_text = "Alrededor de las 00:16 am del 2022_-_02_-_11 , quizas cerca de las 23:16 pm 2022_-_01_-_15" #example 1

input_text = "Alrededor de las 2022_-_02_-_18 00:16 am , quizas cerca de las 2022_-_12_-_01 a las 23:16 pm" #example 2

input_text = "Alrededor de las 00:16 am , quizas cerca del 2022_-_02_-_18 llega el avion" #example 3

In example 3, the algorithm also needs to fill in whatever is missing:

  • If the date is given but not the time, assume the time is "00:00 am" (exactly the beginning of that day).
  • If the time is given but not the date, assume the current date, datetime.datetime.today(), written in the same year_-_month_-_day format.

For example, example 3 would then become: "Alrededor de las 2022_-_11_-_09 00:16 am , quizas cerca del 2022_-_02_-_18 00:00 am llega el avion", where 2022_-_11_-_09 is the current date.

#Regex that I need for the date (year_-_month_-_day)
identify_dates_regex = r"(?P<year>\d{4})_-_(?P<month>\d{2})_-_(?P<startDay>\d{2})"

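#Regex that I need for the time (hours:minutes am|pm); the inputs use ':' between hours and minutes, and am/pm is captured so it can be reused in the replacement
identify_time_regex = r"(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_pm>am|pm)"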

identify_regex_00 = identify_dates_regex + r"[\s|]*(?:por[\s|]*las|por[\s|]*la|a[\s|]*eso[\s|]*de[\s|]*las|a[\s|]*eso[\s|]*de[\s|]*la|a[\s|]*eso[\s|]*de|a[\s|]*las|a[\s|]*la|)[\s|]*" + identify_time_regex

identify_regex_01 = identify_time_regex + r"[\s|]*(?:del|de[\s|]*el|de|)[\s|]*" + identify_dates_regex
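As a quick check of what these patterns capture, the sketch below runs the time-before-date combination over example 1 and prints each match's groupdict(). It assumes the time is written with ':' between hours and minutes, as in the inputs, and that am/pm is captured in a group named am_pm (a name introduced here for illustration); the connectors use \s* instead of [\s|]*.

```python
import re

date_re = r"(?P<year>\d{4})_-_(?P<month>\d{2})_-_(?P<startDay>\d{2})"
# Assumed: ':' between hours and minutes (as in the inputs), am/pm captured as am_pm
time_re = r"(?P<hh>\d{2}):(?P<mm>\d{2})\s*(?P<am_pm>am|pm)"
time_date_re = time_re + r"\s*(?:del|de\s*el|de|)\s*" + date_re

text = "Alrededor de las 00:16 am del 2022_-_02_-_11 , quizas cerca de las 23:16 pm 2022_-_01_-_15"
for m in re.finditer(time_date_re, text):
    # groupdict() maps each named group to the text it captured
    print(m.groupdict())
# {'hh': '00', 'mm': '16', 'am_pm': 'am', 'year': '2022', 'month': '02', 'startDay': '11'}
# {'hh': '23', 'mm': '16', 'am_pm': 'pm', 'year': '2022', 'month': '01', 'startDay': '15'}
```

These captured values can then be referenced by name in a replacement, either as \g<year> inside a template string or as m["year"] inside a function.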

After defining the regex patterns that identify dates and times, I make the replacements. To implement these conditional replacements I use re.sub(pattern, repl, string, count=0, flags=0).

This is where I am stuck: I do not know what to pass as the repl argument so that each replacement produces the outputs shown below.

input_text = re.sub(identify_regex_00, , input_text) #replacement when date is before time

input_text = re.sub(identify_regex_01, , input_text) #replacement when time is before date (the captured parts must be reordered to get date-then-time order)

input_text = re.sub(identify_dates_regex, , input_text) #replacement when only the date is present

#for this replacement I need the current date, in the same year_-_month_-_day format, to concatenate it into the replacement
today_date = datetime.datetime.today().strftime('%Y_-_%m_-_%d')
input_text = re.sub(identify_time_regex, , input_text) #replacement when only the time is present

The goal is for every date and time to end up in datetime order, YYYY_-_mm_-_dd hh:mm am/pm, enclosed in parentheses ( ) to indicate that the date and the time belong to the same moment. These are the outputs I need in each case:

"Alrededor (2022_-_02_-_11 00:16 am), quizas cerca (2022_-_01_-_15 23:16 pm)" #for the example 1

"Alrededor (2022_-_02_-_18 00:16 am) , quizas cerca (2022_-_12_-_01 23:16 pm)" #for the example 2

"Alrededor (2022_-_11_-_09 00:16 am) , quizas cerca (2022_-_02_-_18 00:00 am) llega el avion" #for the example 3

How can I use the information extracted by the regex capture groups to rearrange it and/or concatenate extra content so as to get these results?
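An alternative sketch does everything in a single re.sub pass with a function as repl: re.sub calls it with each match object and inserts whatever string it returns. The four cases become alternatives of one pattern, tried in the order date+time, time+date, date only, time only, so no text is rewritten twice, and the function fills in 00:00 am or the current date for the missing part. Two assumptions go beyond the question: a leading "de las", "de la", "del" or "de" is consumed (inferred from the desired outputs), and the names MOMENT_RE and normalize_moments are hypothetical. The dates and times are captured whole rather than split into year/month/day, since they are already in the target format.

```python
import datetime
import re

DATE = r"\d{4}_-_\d{2}_-_\d{2}"
TIME = r"\d{2}:\d{2}\s*(?:am|pm)"
# Connectors from the question's patterns, written with \s* instead of [\s|]*
DATE_TIME_CONN = r"(?:por\s*las|por\s*la|a\s*eso\s*de\s*las|a\s*eso\s*de\s*la|a\s*eso\s*de|a\s*las|a\s*la)?"
TIME_DATE_CONN = r"(?:del|de\s*el|de)?"
# Assumption: a leading "de las" / "de la" / "del" / "de" is dropped
LEAD = r"(?:\b(?:de\s+las|de\s+la|del|de)\s+)?"

MOMENT_RE = re.compile(
    LEAD
    + "(?:"
    + rf"(?P<d1>{DATE})\s*{DATE_TIME_CONN}\s*(?P<t1>{TIME})"   # date, then time
    + rf"|(?P<t2>{TIME})\s*{TIME_DATE_CONN}\s*(?P<d2>{DATE})"  # time, then date
    + rf"|(?P<d3>{DATE})"                                      # date only
    + rf"|(?P<t3>{TIME})"                                      # time only
    + ")"
)


def normalize_moments(text, today=None):
    """Rewrite every date/time mention as (YYYY_-_mm_-_dd hh:mm am|pm)."""
    if today is None:
        today = datetime.date.today()
    today_str = today.strftime("%Y_-_%m_-_%d")

    def repl(m):
        # Only one alternative matched; the groups of the others are None
        date = m["d1"] or m["d2"] or m["d3"] or today_str   # no date: today
        time = m["t1"] or m["t2"] or m["t3"] or "00:00 am"  # no time: start of day
        time = re.sub(r"\s*(am|pm)$", r" \1", time)          # "00:16am" -> "00:16 am"
        return f"({date} {time})"

    return MOMENT_RE.sub(repl, text)


print(normalize_moments("Alrededor de las 00:16 am , quizas cerca del 2022_-_02_-_18 llega el avion"))
```

With today=datetime.date(2022, 11, 9) this reproduces the desired outputs for examples 2 and 3. For example 1 it keeps the space before the comma, as the outputs for examples 2 and 3 do, whereas the output listed for example 1 drops it.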

Matt095
