I am trying to come up with a PEG grammar that would parse a hostname according to the following BNF from RFC 2396:

```
hostname    = *( domainlabel "." ) toplabel [ "." ]
domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
toplabel    = alpha | alpha *( alphanum | "-" ) alphanum
```
There is no problem with `domainlabel` and `toplabel`.
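For reference, both label rules are regular, so each can be sketched as an anchored Python regular expression (an illustrative transcription, not part of any PEG tool). The samples also show the only difference between the two rules: a `toplabel` must start with a letter, so every `toplabel` is also a valid `domainlabel`.

```python
import re

# RFC 2396: domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
DOMAINLABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")
# RFC 2396: toplabel = alpha | alpha *( alphanum | "-" ) alphanum
TOPLABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_domainlabel(s: str) -> bool:
    # fullmatch: the whole string must be one label
    return DOMAINLABEL.fullmatch(s) is not None


def is_toplabel(s: str) -> bool:
    return TOPLABEL.fullmatch(s) is not None


for label in ["com", "x-1", "1ab", "a-", "-a"]:
    print(label, is_domainlabel(label), is_toplabel(label))
```

Here `1ab` prints `True False` (a digit may start a `domainlabel` but not a `toplabel`), while `a-` and `-a` are rejected by both rules.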
The rule for `hostname`, however, seems impossible to express in PEG. Here is why I think so:
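If we take the grammar as written in the BNF, then for a hostname ending in `.` the whole input is consumed by `*( domainlabel "." )`: PEG repetition is greedy and never gives back an iteration, and it doesn't know when to stop, since `toplabel [ "." ]` is indistinguishable from `domainlabel "."` (every `toplabel` is also a valid `domainlabel`). Nothing is then left for `toplabel`, and the match fails.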
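The same effect can be reproduced without a PEG tool. This is a hypothetical hand transcription in Python (the regexes and `naive_hostname` are illustrative names, not from any library) that mimics PEG semantics: the loop never gives back an iteration once it has succeeded, and the whole input must be consumed.

```python
import re

# Label rules from RFC 2396, transcribed as regexes (illustrative).
DOMAINLABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")
TOPLABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def naive_hostname(s: str) -> bool:
    i = 0
    # *( domainlabel "." ): greedy, and a completed iteration is never undone
    while True:
        m = DOMAINLABEL.match(s, i)
        if m is None or s[m.end():m.end() + 1] != ".":
            break  # this iteration failed; keep the position before it
        i = m.end() + 1
    # toplabel
    m = TOPLABEL.match(s, i)
    if m is None:
        return False
    i = m.end()
    # [ "." ]
    if s[i:i + 1] == ".":
        i += 1
    # the whole input must be consumed
    return i == len(s)


for host in ["example.com", "example.com.", "com", "com."]:
    print(host, naive_hostname(host))
```

`example.com` and `com` are accepted, but `example.com.` and `com.` are rejected, even though RFC 2396 allows the trailing dot: the loop has already eaten the last label and its dot.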
Simplified self-contained illustration:

```
h = (d '.')* t '.'?
d = [dt]
t = [t]
```
Matching against the whole input, this parses `t` and `d.d.t` and fails on `d.d.d`, which is totally expected, but it also fails to parse `t.` and `d.d.t.`, which are both valid cases.
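To make this concrete, here is a minimal hand-written Python sketch of the simplified grammar under PEG semantics (greedy `*` with no backtracking into it). The position-returning functions are an illustrative style, not any PEG library's API; each returns the new position on success or `None` on failure.

```python
# h = (d '.')* t '.'?    d = [dt]    t = [t]

def d(s, i):
    return i + 1 if i < len(s) and s[i] in "dt" else None


def t(s, i):
    return i + 1 if i < len(s) and s[i] == "t" else None


def lit(s, i, ch):
    return i + 1 if i < len(s) and s[i] == ch else None


def h(s, i):
    # (d '.')*: repeat while the whole sequence matches
    while True:
        j = d(s, i)
        if j is None:
            break
        j = lit(s, j, ".")
        if j is None:
            break  # partial match is discarded; i is unchanged
        i = j
    # t
    i = t(s, i)
    if i is None:
        return None
    # '.'?
    j = lit(s, i, ".")
    return j if j is not None else i


def accepts(s):
    # the whole input must be consumed
    return h(s, 0) == len(s)


for s in ["t", "d.d.t", "d.d.d", "t.", "d.d.t."]:
    print(repr(s), accepts(s))
```

The first two inputs print `True` and the last three print `False`: for `t.` and `d.d.t.` the loop consumes the final `t.`, so `t` has nothing left to match.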
If we add a negative lookahead, it accepts `t.` and `d.d.t.`, but fails on `d.t.t.`, which is also valid:
```
h = (!(t '.'?) d '.')* t '.'?
d = [dt]
t = [t]
```
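The lookahead variant can be sketched the same way (again an illustrative hand-written Python parser, not a PEG library; the helper `t_opt_dot` is a hypothetical name for `t '.'?`). The loop stops as soon as `t '.'?` could match, which is already true at the second label of `d.t.t.`.

```python
# h = (!(t '.'?) d '.')* t '.'?    d = [dt]    t = [t]

def d(s, i):
    return i + 1 if i < len(s) and s[i] in "dt" else None


def t(s, i):
    return i + 1 if i < len(s) and s[i] == "t" else None


def lit(s, i, ch):
    return i + 1 if i < len(s) and s[i] == ch else None


def t_opt_dot(s, i):
    # t '.'?
    i = t(s, i)
    if i is None:
        return None
    j = lit(s, i, ".")
    return j if j is not None else i


def h(s, i):
    while True:
        # !(t '.'?): negative lookahead, consumes nothing
        if t_opt_dot(s, i) is not None:
            break
        # d '.'
        j = d(s, i)
        if j is None:
            break
        j = lit(s, j, ".")
        if j is None:
            break
        i = j
    return t_opt_dot(s, i)


def accepts(s):
    # the whole input must be consumed
    return h(s, 0) == len(s)


for s in ["t.", "d.d.t.", "d.t.t."]:
    print(repr(s), accepts(s))
```

`t.` and `d.d.t.` print `True`, but `d.t.t.` prints `False`: after `d.` the lookahead sees `t.`, the loop stops, `t '.'?` consumes only the middle `t.`, and the final `t.` is left over.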
So I am out of ideas: is there a way to express this BNF in PEG?