I have a very interesting problem with parsing the following grammar (of Conventional Commits), which is a convention for how Git commit messages should be formatted.
```
<type>[optional scope]: <description>

[optional body]

[optional footer(s)]
```
- the body is free-form multi-line text where anything goes;
- the footer consists of key-value pairs in the `foobar: this is value` format, separated by newlines.
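To make the format concrete, here is a minimal Python sketch that recognizes a header line and a single footer line with regular expressions. It is an illustration under stated assumptions, not a reference parser: `HEADER_RE`, `FOOTER_RE`, `parse_header` and `parse_footer_line` are hypothetical names, the type is assumed to be lowercase letters, the scope is written in parentheses as in the Conventional Commits examples (`feat(parser): ...`), and a footer is either `token: value` or `token #value`, with `BREAKING CHANGE` as the only token allowed to contain a space.

```python
import re
from typing import Dict, Optional, Tuple

# Header: <type>[(scope)][!]: <description>
# Assumption: the type is lowercase letters and the scope contains no parentheses.
HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()]+)\))?(?P<breaking>!)?: (?P<description>.+)$"
)

# Footer line: "<token>: <value>" or "<token> #<value>".
# Tokens use "-" instead of spaces; "BREAKING CHANGE" is the only exception.
FOOTER_RE = re.compile(
    r"^(?P<key>BREAKING CHANGE|[A-Za-z][A-Za-z0-9-]*)(?:: | #)(?P<value>.*)$"
)


def parse_header(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the header parts as a dict, or None if the line is not a valid header."""
    m = HEADER_RE.match(line)
    return m.groupdict() if m else None


def parse_footer_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (key, value) if the line looks like a footer, otherwise None."""
    m = FOOTER_RE.match(line)
    return (m.group("key"), m.group("value")) if m else None
```

Note that a body line such as `Note: see above` also matches `FOOTER_RE`, so matching individual lines is not enough on its own to tell the body from the footer.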
Now, regarding my dilemma: what would be the best way to differentiate the body from the footer? According to the spec, they should be separated by two newline characters (a blank line), so at first I thought this would be a good fit for ANTLR4 island grammars. I came up with something like what I posted here, but after some testing I discovered it is not flexible: it won't work if the body is absent (the body section is optional) but a footer is present.
I can think of a couple of ways to restrict the grammar to a certain language and implement this differentiation with semantic predicates, but ideally I would like to avoid that.
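One way to sidestep the optional-body problem outside the grammar is to split the message on blank lines first and then decide whether the last paragraph is the footer section. The Python sketch below does that; `split_message` is a hypothetical helper, and it assumes every footer fits on one line (the spec also allows multi-line footer values, which this ignores) and that a trailing paragraph counts as footers only if all of its lines look like `token: value` or `token #value`. Because the decision is made per paragraph rather than per line, it still works when the body is missing and only footers follow the header.

```python
import re
from typing import List, Optional, Tuple

# Simplified footer line: "<token>: <value>" or "<token> #<value>".
# Assumption: one footer per line (multi-line footer values are not handled).
FOOTER_RE = re.compile(r"^(BREAKING CHANGE|[A-Za-z][A-Za-z0-9-]*)(?:: | #)(.*)$")


def split_message(message: str) -> Tuple[str, Optional[str], List[Tuple[str, str]]]:
    """Split a commit message into (header, body or None, list of (key, value) footers)."""
    header, _, rest = message.partition("\n")
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", rest.strip("\n")) if p.strip()]
    footers: List[Tuple[str, str]] = []
    if paragraphs:
        matches = [FOOTER_RE.match(line) for line in paragraphs[-1].split("\n")]
        # Heuristic: the last paragraph is the footer section only if
        # every one of its lines looks like a footer.
        if all(matches):
            footers = [(m.group(1), m.group(2)) for m in matches]
            paragraphs = paragraphs[:-1]
    body = "\n\n".join(paragraphs) if paragraphs else None
    return header, body, footers
```

The heuristic can still misclassify a body whose last paragraph happens to consist only of `word: text` lines; the format itself is ambiguous there, so any approach has to pick a rule for that case.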
Now, I think the problem boils down to how to properly differentiate between the `KEY` and `SINGLE_LINE` tokens, which conflict in the next iteration of my implementation:
```antlr
mode Text;
KEY: [a-z][a-z_-]+;
SINGLE_LINE: ~[\n]+;
MULTI_LINE: SINGLE_LINE (NEWLINE SINGLE_LINE)*;
NEXT: NEWLINE NEWLINE;
```
What would be the best way to differentiate between `KEY` and `SINGLE_LINE`?
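The conflict between `KEY` and `SINGLE_LINE` follows from how ANTLR4 lexers pick a rule: the longest match wins, and when two rules match input of the same length, the rule defined first wins. The Python sketch below is a toy model of that behavior (not ANTLR itself) using regex equivalents of the lexer rules; `tokenize`, `QUESTION_RULES`, `WHOLE_LINE_RULES` and the `FOOTER` rule are hypothetical, and `NEWLINE` is assumed to be a plain `\n` since its definition is not shown. Because `SINGLE_LINE` can always match at least as much as `KEY`, `KEY` only wins when a line consists of nothing but the key. One possible workaround, modeled in `WHOLE_LINE_RULES`, is a rule that consumes the entire footer line (in ANTLR something like `FOOTER: [a-z][a-z_-]+ ': ' ~[\n]*;`) declared before `SINGLE_LINE`, so the length tie is broken in its favor.

```python
import re
from typing import List, Optional, Tuple

Rule = Tuple[str, str]  # (token name, regex)


def tokenize(rules: List[Rule], text: str) -> List[Tuple[str, str]]:
    """Toy model of ANTLR4 lexing: longest match wins, ties go to the earlier rule."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        best: Optional[Tuple[str, str]] = None
        for name, regex in compiled:
            m = regex.match(text, pos)
            # Strict ">" keeps the earlier rule when two matches have equal length.
            if m and m.end() > pos and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens


# Regex equivalents of the rules in the question (MULTI_LINE omitted for brevity).
QUESTION_RULES: List[Rule] = [
    ("KEY", r"[a-z][a-z_-]+"),
    ("SINGLE_LINE", r"[^\n]+"),
    ("NEXT", r"\n\n"),
    ("NEWLINE", r"\n"),  # assumed definition
]

# Hypothetical variant: FOOTER consumes the whole line and is declared before SINGLE_LINE.
WHOLE_LINE_RULES: List[Rule] = [
    ("FOOTER", r"[a-z][a-z_-]+: [^\n]*"),
    ("SINGLE_LINE", r"[^\n]+"),
    ("NEXT", r"\n\n"),
    ("NEWLINE", r"\n"),  # assumed definition
]
```

With this variant a body line such as `note: see above` would also lex as `FOOTER`, so the parser still has to accept that token inside the body and decide by position which paragraph is the footer section.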