I'm supposed to normalize time statements in input text, converting them to a standard format. Each statement contains an hour, optionally minutes, and a part of day (morning or evening). The part of day can be expressed in several ways, and the hour may be given on a 12-hour clock.
The output must use a 24-hour clock and "am" or "pm" for the part of day. Extra characters (such as spaces) in the time statement should be kept. Minutes must not be added: if the original statement doesn't include minutes, they shouldn't appear in the result.
Sample data
# input examples:
inputs = [
    "6 de la manana hdhd",               # example 1
    "hdhhd 06: de la manana hdhd",       # example 2
    "hd:00 06 : de la manana hdhd",      # example 3
    "hdhhd 6 de la manana hdhd",         # example 4
    "hdhhd 06:00 de la manana hdhd",     # example 5
    "hdhhd 06 : 18 de la manana hdhd",   # example 6
    "hdhhd 18 de la manana hdhd",        # example 7
    "hdhhd 18:18 de la manana hdhd",     # example 8
    "hdhhd 18 : 00 de la manana hdhd",   # example 9
    "hdhhd 19 : 19 de la noche hdhd",    # example 10
    "hdhhd 6 de la noche hdhd",          # example 11
]
There are two cases where the hour or the part of day might need to be changed.
- The input might contain a mistake, where the hour is in the evening but the part of day is given as the morning (example 7). In this case, the part of day should be changed to match the hour.
- The input might also use a 12-hour clock, where the hour is <= 12 and the part of day is in the evening (example 11). In this case, the hour should be changed to match the part of day.
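The two rules above can be sketched as a small function. This is only a minimal sketch: the name `reconcile` and its `"am"`/`"pm"` argument are hypothetical, and it follows the `hour < 12` check used in my code, so an evening hour of exactly 12 is left unchanged (an assumption, since 12 + 12 would not be a valid hour).

```python
def reconcile(hour, part):
    """Return (hour, suffix) after applying the two correction rules.

    `part` is "am" or "pm", as implied by the part-of-day word (hypothetical input).
    """
    if part == "am":
        # Rule 1: an evening hour labelled as morning keeps its hour; the label flips.
        return (hour, "pm") if hour >= 12 else (hour, "am")
    # Rule 2: a 12-hour-clock hour in the evening is shifted onto the 24-hour clock.
    # Assumption: 12 itself is left as it is.
    return (hour + 12, "pm") if hour < 12 else (hour, "pm")


print(reconcile(18, "am"))  # example 7
print(reconcile(6, "pm"))   # example 11
```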
This is my code so far. I have the structure of the replacements in place, but I have not yet managed to extract the data they need. The unfinished parts are written as pseudocode:
import re  # library for using regular expressions

am_list = ["manana", "mañana", "mediodia", "medio dia", "madrugada", "amanecer"]
pm_list = ["atardecer", "tarde", "ocaso", "noche", "anochecer"]

def fix_time(input_text):
    is_am_time, is_pm_time = False, False
    hour_number_fixed, civil_time_fixed = "", ""

    # hour, optional separator and minutes, connector, then one of the am words
    re_pattern_for_am = r"(\d{1,2})[\s:]*(\d{0,2})\s*(?:de la |de el )(" + "|".join(am_list) + ")"
    if (identification condition for am):  # pseudocode
        # extract with re.group()
        hour_number = int()  # <--- \d{1,2}
        am_or_pm = str()     # <--- am_list

    # same pattern, with the pm words
    re_pattern_for_pm = r"(\d{1,2})[\s:]*(\d{0,2})\s*(?:de la |de el )(" + "|".join(pm_list) + ")"
    if (identification condition for pm):  # pseudocode
        # extract with re.group()
        hour_number = int()  # <--- \d{1,2}
        am_or_pm = str()     # <--- pm_list

    if am_or_pm in am_list:
        is_am_time = True
    elif am_or_pm in pm_list:
        is_pm_time = True

    if is_am_time:
        if hour_number >= 12:
            civil_time_fixed = "pm"  # evening hour labelled as morning
        else:
            civil_time_fixed = "am"
        hour_number_fixed = str(hour_number)
    elif is_pm_time:
        if hour_number < 12:
            hour_number_fixed = str(hour_number + 12)  # 12-hour clock to 24-hour clock
        else:
            hour_number_fixed = str(hour_number)
        civil_time_fixed = "pm"

    # replacement process
    input_text = input_text.replace(str(hour_number), hour_number_fixed, 1)
    input_text = input_text.replace(am_or_pm, civil_time_fixed, 1)
    return input_text
I need the program to decide on and correct the times, using the data (hour_number and am_or_pm) that it must extract from input_text with re.group(). This is what is giving me the most trouble. How can I get the regexes to capture the hour and part of day?
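To make the question concrete, here is a minimal sketch of the kind of capture I mean, using named groups with `re.search`. The names `TIME_RE` and `extract` are hypothetical, and the connector alternatives (`de la`, `de el`, `del`) are an assumption about what the input may contain. The two word lists are joined with `|` into a single alternation so that one pattern covers both the am and the pm words.

```python
import re

am_list = ["manana", "mañana", "mediodia", "medio dia", "madrugada", "amanecer"]
pm_list = ["atardecer", "tarde", "ocaso", "noche", "anochecer"]

TIME_RE = re.compile(
    r"(?<!\d)(?P<hour>\d{1,2})"        # hour: one or two digits, not the tail of a longer number
    r"\s*:?\s*"                        # optional colon, with optional spaces around it
    r"(?P<minute>\d{1,2})?"            # optional minutes
    r"\s*(?:de la|de el|del)\s+"       # connector (assumed alternatives)
    r"(?P<part>" + "|".join(am_list + pm_list) + r")"  # part-of-day word
)


def extract(text):
    """Return (hour, minutes, part_word) for the first time statement, or None."""
    m = TIME_RE.search(text)
    if m is None:
        return None
    # group() returns the captured text; minutes are None when absent.
    return int(m.group("hour")), m.group("minute"), m.group("part")


print(extract("hdhhd 06 : 18 de la manana hdhd"))
print(extract("hdhhd 6 de la noche hdhd"))
```

With a match object `m`, `m.group("hour")` and `m.group("part")` would supply the values for hour_number and am_or_pm; positional access such as `m.group(1)` also works.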
The correct output in each case:
"6 am hdhd" #for the example 1
"hdhhd 06: am hdhd" #for the example 2
"hd:00 06 : am hdhd" #for the example 3
"hdhhd 6 am hdhd" #for the example 4
"hdhhd 06:00 am hdhd" #for the example 5
"hdhhd 06 : 18 am hdhd" #for the example 6
"hdhhd 18 pm hdhd" #for the example 7
"hdhhd 18:18 pm hdhd" #for the example 8
"hdhhd 18 : 00 pm hdhd" #for the example 9
"hdhhd 19 : 19 pm hdhd" #for the example 10
"hdhhd 18 pm hdhd" #for the example 11
How do I do those data extractions with re.group() (or a similar method) in this code?
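For reference, one possible shape for the whole thing is `re.sub` with a function as the replacement, so that extraction and replacement happen in one pass. This is a sketch under assumptions rather than a finished solution: the helper `_fix`, the group names, and the connector alternatives are hypothetical, and everything between the hour and the connector (separator, minutes, spaces) is captured as one group and written back unchanged so that no characters are lost.

```python
import re

am_list = ["manana", "mañana", "mediodia", "medio dia", "madrugada", "amanecer"]
pm_list = ["atardecer", "tarde", "ocaso", "noche", "anochecer"]

TIME_RE = re.compile(
    r"(?<!\d)(?P<hour>\d{1,2})"               # hour
    r"(?P<rest>\s*:?\s*(?:\d{1,2})?\s*)"      # separator, optional minutes, spacing (kept verbatim)
    r"(?:de la|de el|del)\s+"                 # connector (assumed alternatives), dropped from output
    r"(?P<part>" + "|".join(am_list + pm_list) + r")"  # part-of-day word
)


def _fix(match):
    """Build the replacement text for one matched time statement."""
    hour_text = match.group("hour")
    hour = int(hour_text)
    is_pm = match.group("part") in pm_list
    if is_pm and hour < 12:
        # 12-hour clock in the evening: shift the hour onto the 24-hour clock.
        hour_text = str(hour + 12)
    # An hour of 12 or more is labelled "pm" even if the word said morning.
    suffix = "pm" if (is_pm or hour >= 12) else "am"
    return hour_text + match.group("rest") + suffix


def fix_time(input_text):
    # re.sub calls _fix once per match and inserts its return value.
    return TIME_RE.sub(_fix, input_text)


print(fix_time("hdhhd 06 : 18 de la manana hdhd"))
print(fix_time("hdhhd 6 de la noche hdhd"))
```

The hour text is only rewritten when it actually changes, which is why "06" stays "06" in the morning examples.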