Regex capture groups for "timer" sentence pattern

Question

I am trying to write a regular expression that lets me parse and modify strings that may include an instruction to time a specific action, using capture groups to identify the "hour", "minute", and "second" values in the input string.

In Ruby, I have a regex that gets close to the matches and capture groups I need:

(?<hour_digit>\d+\s)[a-z]*\s?hour[s\b|b\s]|(?<minute_digit>\d+\s)[a-z]*\s?minute[s\b|b\s][a-z]*|(?<second_digit>\d+\s)[a-z]*\s?second[s\b|b]

I want to find an expression that captures strings where multiple values are matched together rather than independently: "5 hours and 15 minutes" should be one match, and "30 minutes up to 1 hour" should be one match. Visually, the current regex matches like so:

@WiktorStribiżew thank you, that is helpful! This is a very powerful tool, thanks for sharing. Almost there, but that regex seems to match on punctuation marks. If the string had multiple sentences; "For the first step, wash the clothes for 15 minutes. While the clothes are washing, hold your breath for up to 5 minutes." could I ignore matching the commas & periods somehow? — Austin Meyers, Jul 29 '21 at 19:18
Right, we need word boundaries. Shall I post an answer with https://regex101.com/r/NeGzGB/2? — Wiktor Stribiżew, Jul 29 '21 at 19:36

Wiktor Stribiżew · Accepted Answer · 2021-07-29T19:53:12.783

You can use

(?<!\w)\b(?:(?<hour_digit>\d+)(?:\s*(?:more|another))?\s*hours?)?(?:(?:\s*(?:or|up to|and|to))*\s*(?<minute_digit>\d+)(?:\s*(?:more|another))?\s*minutes?)?(?:(?:\s*(?:or|up to|and|to))*\s*(?<second_digit>\d+)(?:\s*(?:more|another))?\s*seconds?)?\b(?!\w)

See the regex demo. Details:

(?<!\w)\b - a left-hand side word boundary ([[:<:]] or \< or \m in some flavors does this)
(?:(?<hour_digit>\d+)(?:\s*(?:more|another))?\s*hours?)? - an optional occurrence of
- (?<hour_digit>\d+) - Group "hour_digit": one or more digits
- (?:\s*(?:more|another))? - an optional occurrence of zero or more whitespaces and then more or another word
- \s*hours? - zero or more whitespaces, hour or hours
(?:(?:\s*(?:or|up to|and|to))*\s*(?<minute_digit>\d+)(?:\s*(?:more|another))?\s*minutes?)? - an optional occurrence of
- (?:\s*(?:or|up to|and|to))* - zero or more occurrences of zero or more whitespaces followed by or, up to, and, or to
- \s* - zero or more whitespaces
- (?<minute_digit>\d+) - Group "minute_digit": one or more digits
- (?:\s*(?:more|another))? - an optional occurrence of zero or more whitespaces and then more or another word
- \s*minutes? - zero or more whitespaces, minute or minutes
(?:(?:\s*(?:or|up to|and|to))*\s*(?<second_digit>\d+)(?:\s*(?:more|another))?\s*seconds?)? - an optional occurrence of
- (?:\s*(?:or|up to|and|to))* - zero or more occurrences of zero or more whitespaces followed by or, up to, and, or to
- \s* - zero or more whitespaces
- (?<second_digit>\d+) - Group "second_digit": one or more digits
- (?:\s*(?:more|another))? - an optional occurrence of zero or more whitespaces and then more or another word
- \s*seconds? - zero or more whitespaces, second or seconds
\b(?!\w) - a right-hand side word boundary (in some other regex flavors, it is \M, \> or [[:>:]]).
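
For illustration, here is a minimal Ruby sketch of how the pattern above could be applied with String#scan to pull the named groups out of each match. The constant TIMER_RE and the helper extract_timers are names made up for this example; they are not part of the original answer. The regex itself is copied unchanged from above.

```ruby
# The pattern from the answer, copied verbatim.
TIMER_RE = /(?<!\w)\b(?:(?<hour_digit>\d+)(?:\s*(?:more|another))?\s*hours?)?(?:(?:\s*(?:or|up to|and|to))*\s*(?<minute_digit>\d+)(?:\s*(?:more|another))?\s*minutes?)?(?:(?:\s*(?:or|up to|and|to))*\s*(?<second_digit>\d+)(?:\s*(?:more|another))?\s*seconds?)?\b(?!\w)/

# Hypothetical helper: returns one hash per match, with the matched text
# and each captured value converted to an Integer (nil if the group did
# not participate in the match).
def extract_timers(text)
  timers = []
  text.scan(TIMER_RE) do
    m = Regexp.last_match # MatchData for the current match
    timers << {
      text: m[0],
      hours: m[:hour_digit]&.to_i,
      minutes: m[:minute_digit]&.to_i,
      seconds: m[:second_digit]&.to_i
    }
  end
  timers
end

timer = extract_timers("5 hours and 15 minutes").first
puts timer[:text]    # prints: 5 hours and 15 minutes
puts timer[:hours]   # prints: 5
puts timer[:minutes] # prints: 15
```

Since the connector words (or, up to, and, to) sit inside the optional minute and second groups, a match can begin with a connector when no hour value precedes it, so the matched text may need trimming before it is reused. The groups are also tried in hour, minute, second order, so input that lists the units in a different order is likely to produce separate matches.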
