I am writing lexer rules for a custom description language using pyLR1. The language shall include time literals such as:
10h30m # meaning 10 hours + 30 minutes
5m30s # meaning 5 minutes + 30 seconds
10h20m15s # meaning 10 hours + 20 minutes + 15 seconds
15.6s # meaning 15.6 seconds
The order of specification for hour, minute and second parts shall be fixed to h, m, s. To specify this in detail, I want the following valid combinations: hms, hm, h, ms, m and s (with numbers between the different segments of course).
As a bonus, the regex should check for decimal (i.e. non-integer) numbers in the segments and allow them only in the least significant segment that is present.
So I have for all but the last group a number match like:
([0-9]+)
And for the last group, which may also be a decimal:
([0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?) # to allow for .5 and 0.5 and 5.0 and 5
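To make the intent of that last-group pattern concrete, here is a small check using Python's re module. The names DECIMAL and is_decimal are just illustrative, not part of my lexer. Note that, as written, the pattern also accepts a trailing dot such as 5.

```python
import re

# The last-group pattern from above, without the outer capture group.
DECIMAL = r"[0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?"


def is_decimal(text):
    """Return True if the whole string matches the last-group pattern."""
    return re.fullmatch(DECIMAL, text) is not None


# The four forms mentioned in the comment above are accepted.
for literal in (".5", "0.5", "5.0", "5"):
    print(literal, is_decimal(literal))
```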
Going through all the combinations of h, m and s, a cute little Python script gives me the following regex:
(([0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?)h|([0-9]+)h([0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?)m|([0-9]+)h([0-9]+)m([0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?)s|([0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?)m|([0-9]+)m([0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?)s|([0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?)s)
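For reference, the generation can be sketched roughly like this. It is a reconstruction rather than my exact script, and the names INTEGER, DECIMAL, UNITS and build_time_regex are made up for this example. It assumes that the valid combinations are exactly the contiguous runs of h, m, s, which is what the list above amounts to.

```python
import re

INTEGER = r"([0-9]+)"
DECIMAL = r"([0-9]*\.[0-9]+|[0-9]+(\.[0-9]*)?)"
UNITS = "hms"


def build_time_regex():
    """Join one alternative per contiguous run of units (h, hm, hms, m, ms, s)."""
    alternatives = []
    for start in range(len(UNITS)):
        for end in range(start, len(UNITS)):
            units = UNITS[start:end + 1]
            # Every segment but the last takes an integer ...
            parts = [INTEGER + unit for unit in units[:-1]]
            # ... and only the least significant one may be a decimal.
            parts.append(DECIMAL + units[-1])
            alternatives.append("".join(parts))
    return "(" + "|".join(alternatives) + ")"


TIME_LITERAL = re.compile(build_time_regex())

# The examples from the top of the question should match in full.
for literal in ("10h30m", "5m30s", "10h20m15s", "15.6s"):
    print(literal, TIME_LITERAL.fullmatch(literal) is not None)
```

With re.fullmatch, something like 10.5h30m is rejected, because the decimal is not in the least significant segment.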
Obviously, this expression is a bit of a horror. Is there any way to simplify it? The answer must work with Python's re module, and I will also accept answers which do not work with pyLR1 if that is due to its restricted subset of regular expressions.