
I am trying to write a regular expression that lets me parse and modify strings that may include an instruction to time a specific action, using capture groups to identify the "hour", "minute", and "second" values in the input string.

In Ruby, I have a regex that gets close to the matches and capture groups I need:

(?<hour_digit>\d+\s)[a-z]*\s?hour[s\b|b\s]|(?<minute_digit>\d+\s)[a-z]*\s?minute[s\b|b\s][a-z]*|(?<second_digit>\d+\s)[a-z]*\s?second[s\b|b]

I want an expression that captures several values as a single match instead of matching each one independently: "5 hours and 15 minutes" should be one match, and "30 minutes up to 1 hour" should be one match. The current regex matches each value separately.

Austin Meyers
  • Does https://regex101.com/r/NeGzGB/1 help? – Wiktor Stribiżew Jul 29 '21 at 18:44
  • @WiktorStribiżew thank you, that is helpful! This is a very powerful tool, thanks for sharing. Almost there, but that regex seems to match on punctuation marks. If the string had multiple sentences, e.g. "For the first step, wash the clothes for 15 minutes. While the clothes are washing, hold your breath for up to 5 minutes.", could I somehow avoid matching the commas and periods? – Austin Meyers Jul 29 '21 at 19:18
  • Right, we need word boundaries. Shall I post an answer with https://regex101.com/r/NeGzGB/2? – Wiktor Stribiżew Jul 29 '21 at 19:36
  • Yes, that should do it – Austin Meyers Jul 29 '21 at 19:43

1 Answer

You can use

(?<!\w)\b(?:(?<hour_digit>\d+)(?:\s*(?:more|another))?\s*hours?)?(?:(?:\s*(?:or|up to|and|to))*\s*(?<minute_digit>\d+)(?:\s*(?:more|another))?\s*minutes?)?(?:(?:\s*(?:or|up to|and|to))*\s*(?<second_digit>\d+)(?:\s*(?:more|another))?\s*seconds?)?\b(?!\w)

See the regex demo. Details:

  • (?<!\w)\b - a left-hand word boundary (some other regex flavors use [[:<:]], \< or \m for this)
  • (?:(?<hour_digit>\d+)(?:\s*(?:more|another))?\s*hours?)? - an optional occurrence of
    • (?<hour_digit>\d+) - Group "hour_digit": one or more digits
    • (?:\s*(?:more|another))? - an optional occurrence of zero or more whitespace characters followed by the word more or another
    • \s*hours? - zero or more whitespace characters, then hour or hours
  • (?:(?:\s*(?:or|up to|and|to))*\s*(?<minute_digit>\d+)(?:\s*(?:more|another))?\s*minutes?)? - an optional occurrence of
    • (?:\s*(?:or|up to|and|to))* - zero or more occurrences of zero or more whitespace characters followed by or, up to, and, or to
    • \s* - zero or more whitespace characters
    • (?<minute_digit>\d+) - Group "minute_digit": one or more digits
    • (?:\s*(?:more|another))? - an optional occurrence of zero or more whitespace characters followed by the word more or another
    • \s*minutes? - zero or more whitespace characters, then minute or minutes
  • (?:(?:\s*(?:or|up to|and|to))*\s*(?<second_digit>\d+)(?:\s*(?:more|another))?\s*seconds?)? - an optional occurrence of
    • (?:\s*(?:or|up to|and|to))* - zero or more occurrences of zero or more whitespace characters followed by or, up to, and, or to
    • \s* - zero or more whitespace characters
    • (?<second_digit>\d+) - Group "second_digit": one or more digits
    • (?:\s*(?:more|another))? - an optional occurrence of zero or more whitespace characters followed by the word more or another
    • \s*seconds? - zero or more whitespace characters, then second or seconds
  • \b(?!\w) - a right-hand word boundary (some other regex flavors use \M, \> or [[:>:]] for this).
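
As a rough illustration of how the pattern could be used from Ruby, here is a minimal sketch. The constant name DURATION and the helper durations are hypothetical names chosen for this example and are not part of the original answer. The pattern itself is copied unchanged from above.

```ruby
# The pattern from the answer, unchanged. DURATION is a hypothetical name.
DURATION = /(?<!\w)\b(?:(?<hour_digit>\d+)(?:\s*(?:more|another))?\s*hours?)?(?:(?:\s*(?:or|up to|and|to))*\s*(?<minute_digit>\d+)(?:\s*(?:more|another))?\s*minutes?)?(?:(?:\s*(?:or|up to|and|to))*\s*(?<second_digit>\d+)(?:\s*(?:more|another))?\s*seconds?)?\b(?!\w)/

# Hypothetical helper: walks the string and returns one hash per match,
# with the named groups converted to integers (nil when a group is absent).
def durations(text)
  results = []
  pos = 0
  while (m = DURATION.match(text, pos))
    results << {
      text:    m[0],
      hours:   m[:hour_digit]&.to_i,
      minutes: m[:minute_digit]&.to_i,
      seconds: m[:second_digit]&.to_i
    }
    # Continue after this match; the +1 guard avoids looping on an empty match.
    pos = [m.end(0), pos + 1].max
  end
  results
end

durations("5 hours and 15 minutes").each do |d|
  puts "#{d[:text].inspect}: h=#{d[:hours].inspect} m=#{d[:minutes].inspect} s=#{d[:seconds].inspect}"
end
```

With this sketch, "5 hours and 15 minutes" comes back as a single match with both hour_digit and minute_digit set, and the two-sentence example from the comments yields one match per sentence without picking up the commas or periods. Note that the groups are tried in hour, minute, second order, so an input such as "30 minutes up to 1 hour" is still returned as two separate matches ("30 minutes" and "1 hour"). Also note that a leading connector such as "up to" becomes part of the matched text.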
Wiktor Stribiżew