import re

input_text = "entre las 15 : hs -- 16:10 "  #Example 1
input_text = "entre las 21 :  -- 22"  #Example 2
input_text = "entre la 1 30 -- 2 "  #Example 3
input_text = "entre la 1 09 h.s. -- 6 : hs."  #Example 4
input_text = "entre la 1:50 -- 6 :"  #Example 5
input_text = "entre la 7 59 -- 23 : "  #Example 6
input_text = "entre la 10: -- : 10"  #Example 7
print(repr(input_text)) #print the output string

The function fix_time_patterns_in_time_intervals() should look something like the code below, although it may need exception handling for possible index errors. It should only make the replacement if the hour (the first group) is at most 23, since valid hours run from 0 to 23. Likewise, it should only make the replacement if the minutes (the second group) are at most 59, since minute 60 already counts as minute 0 of the next hour. Because of this, the replacement should only happen when the conditionals inside the function are satisfied; otherwise the function should return the substring that was extracted from the original string, unchanged.

def fix_time_patterns_in_time_intervals(match_num_time):
    hour_exist = False
    if(int(match_num_time[1]) <= 23):
        #do the replacement process
        if(len(match_num_time[1]) == 1): match_num_time[1] = "0"+ str(match_num_time[1])
        elif(len(match_num_time[1]) == 0): match_num_time[1] = "00"
        hour_exist = True
    if(int(match_num_time[2]) <= 59):
        #do the replacement process
        if(len(match_num_time[2]) == 1): match_num_time[2] = "0"+ str(match_num_time[2])
        elif(len(match_num_time[2]) == 0): match_num_time[2] = "00"
    elif( (int(match_num_time[2]) == None) and (hour_exist == True) ):
        #do the replacement process
        match_num_time[2] = "00"

    return match_num_time #the extracted substring

I think I could use regex capturing groups, reading them with the match object's group() or groups() method, to extract the first time mentioned in the input string and then the second time that appears in it.

In the end, printing the fixed string should give these results (output) for each of the examples respectively:

"entre las 15:00 hs -- 16:10 "  #Example 1
"entre las 21:00 -- 22:00"  #Example 2
"entre la 01:30 -- 02:00 "  #Example 3
"entre la 01:09 h.s. -- 06:00 hs."  #Example 4
"entre la 01:50 -- 06:00"  #Example 5
"entre la 07:59 -- 23:00"  #Example 6
"entre la 10:00 -- 00:10"  #Example 7

Some additional examples of what the time (hours:minutes) conversions should look like:

"6 :"      --->     "06:00"
"6:"       --->     "06:00"
"6"        --->     "06:00"
": 6"      --->     "00:06"
":6"       --->     "00:06"
": 16"     --->     "00:16"
":16"      --->     "00:16"
" 6"       --->     "06:00"
"15 : 1"   --->     "15:01"
"15 1"     --->     "15:01"
": 15"     --->     "00:15"
"0 15"     --->     "00:15"

I am having trouble extracting the values matched by the regex so that they can be evaluated inside the function fix_time_patterns_in_time_intervals(). I hope you can help me with this.

Matt095
  • Don't forget about conditional regex replacement as described in https://www.regular-expressions.info/replaceconditional.html That would allow you to make the replacement pad the number with zeros so that it is always two digits. – Jerry Jeremiah Aug 28 '22 at 22:36
  • @JerryJeremiah Try something like this, the problem is that you have to pass the substrings to the `fix_time_patterns_in_time_intervals()` function so that it can compare them, and depending on the condition it meets, it will decide what to replace with. For example in a generic substring where the pattern `"XX:YY"` (or one of the other variations that the pattern accept) was matched, `hh = match[1] = "XX"` and `mm = match[2] = "YY"` – Matt095 Aug 28 '22 at 22:40
  • Do you have a regex that you are using to match these strings? – Nick Aug 29 '22 at 00:27
  • Maybe something like this `(\d{0,2})[\s|]*(?::|)[\s|]*(\d{1,2})` could work, my doubt is how to extract the values to send them to the function that evaluates them – Matt095 Aug 29 '22 at 00:39
  • @MatiasNicolasRodriguez This is what I was thinking of: https://regex101.com/r/cP2PRa/1 You could just use a regex to reformat the whole thing. You still need to validate the dates but the reformatting might help with that. – Jerry Jeremiah Aug 29 '22 at 23:19

1 Answer

You can use this regex to match your time values:

(?=[:\d])(?P<hour>\d+)? *:? *(?P<minute>\d+)?(?<! )

This matches:

  • (?=[:\d]) : assert that the next character is a digit or a : - this ensures that the match always starts with the hour group if it is present
  • (?P<hour>\d+)? : optional digits captured in the hour group
  •  *:? * : an optional : surrounded by optional spaces
  • (?P<minute>\d+)? : optional digits captured in the minute group
  • (?<! ) : assert that the match doesn't end in a space, so we don't chew up spaces used for formatting
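
To see what each named group captures before any replacement is done, a small sketch like the following may help. It runs the same regex with finditer() on Example 7 from the question; the `found` list is just an illustrative name, not part of the solution itself.

```python
import re

regex = re.compile(r'(?=[:\d])(?P<hour>\d+)? *:? *(?P<minute>\d+)?(?<! )')

text = "entre la 10: -- : 10"  # Example 7 from the question

# Collect (matched substring, hour group, minute group) for every match.
# A group that did not take part in the match is reported as None.
found = [(m.group(0), m.group('hour'), m.group('minute'))
         for m in regex.finditer(text)]

for whole, hour, minute in found:
    print(repr(whole), repr(hour), repr(minute))
```

A missing group comes back as None rather than an empty string, which is why a replacement function has to handle that case before calling int().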

Regex demo on regex101

You can then use this replacement function to check for the existence of the match groups and (if the values are valid) reformat them with leading 0's as required:

def fix_time_patterns_in_time_intervals(match_num_time):
    hour = int(match_num_time.group('hour') or '0')
    minute = int(match_num_time.group('minute') or '0')
    if hour > 23 or minute > 59:
        # invalid, don't convert
        return match_num_time.group(0)
    return f'{hour:02d}:{minute:02d}'

For your sample data (with a couple of invalid values):

times = [
    "entre las 15 : hs -- 16:10 ",
    "entre las 21 :  -- 22",
    "entre la 1 30 -- 2 ",
    "entre la 1 09 h.s. -- 6 : hs.",
    "entre la 25 0 -- 12:0",
    "entre las 13 64 -- 5",
    "entre la 1:50 -- 6 :",
    "entre la 7 59 -- 23 : ",
    "entre la 10: -- : 10"
]

regex = re.compile(r'(?=[:\d])(?P<hour>\d+)? *:? *(?P<minute>\d+)?(?<! )')

for time in times:
    print(regex.sub(fix_time_patterns_in_time_intervals, time))

Output:

entre las 15:00 hs -- 16:10
entre las 21:00 -- 22:00
entre la 01:30 -- 02:00
entre la 01:09 h.s. -- 06:00 hs.
entre la 25 0 -- 12:00
entre las 13 64 -- 05:00
entre la 01:50 -- 06:00
entre la 07:59 -- 23:00
entre la 10:00 -- 00:10
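
As a rough check against the standalone conversions listed in the question, the same regex and function can be applied to isolated fragments. This is only a sketch: the `expected` dictionary is copied from the question's table, and the `" 6"` case is left out because the regex does not consume a leading space, so that space would survive in the result.

```python
import re

regex = re.compile(r'(?=[:\d])(?P<hour>\d+)? *:? *(?P<minute>\d+)?(?<! )')

def fix_time_patterns_in_time_intervals(match_num_time):
    # A missing group is None, so fall back to '0' before converting
    hour = int(match_num_time.group('hour') or '0')
    minute = int(match_num_time.group('minute') or '0')
    if hour > 23 or minute > 59:
        # invalid, return the matched text unchanged
        return match_num_time.group(0)
    return f'{hour:02d}:{minute:02d}'

# Fragment -> conversion requested in the question
expected = {
    "6 :": "06:00",
    "6:": "06:00",
    "6": "06:00",
    ": 6": "00:06",
    ":6": "00:06",
    ": 16": "00:16",
    ":16": "00:16",
    "15 : 1": "15:01",
    "15 1": "15:01",
    ": 15": "00:15",
    "0 15": "00:15",
}

results = {raw: regex.sub(fix_time_patterns_in_time_intervals, raw)
           for raw in expected}

for raw, fixed in results.items():
    print(f'{raw!r:10} -> {fixed!r}')
```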
Nick
  • Thank you very much for the help, by any chance can you think of a way so that the cases `"entre las 21:00-- 22:00"` or `"entre la 10:00-- 00:10"` do not have the number stuck to the `"--"`? – Matt095 Aug 29 '22 at 02:58
  • @MatiasNicolasRodriguez sorry, hadn't noticed that... you could change the `return` to `return f'{hour:02d}:{minute:02d}' + ('' if match_num_time.group('minute') else ' ')` and that would add a space if the minutes were missing – Nick Aug 29 '22 at 03:03
  • @MatiasNicolasRodriguez no worries. I've actually made another edit with a change to the regex which I think is a better solution to the issue than the one in my previous comment. – Nick Aug 29 '22 at 03:16