I'm trying to write an Xtext grammar to parse a simple markup language. The markup uses doubled characters for styling text; for example, !! marks bold. I'm struggling to work out how to write the grammar, in particular how to handle the double-character symbols. As an example:
The following text !!is bold! !! but not this.
I want to parse this into the following AST:
- Lines
  - Line
    - Text "The following text "
    - BoldText "is bold! "
    - Text " but not this."
  - Line
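To make the intended result concrete, here is a rough reference sketch in Python (not Xtext) of the splitting I'm after for a single line. The names `Text`, `BoldText` and `parse_line` are just made up for illustration. It assumes that every `!!` toggles bold, that the leftmost `!!` wins when there are three or more `!` in a row, and that an unclosed `!!` makes the rest of the line bold; none of that is necessarily how the final grammar has to behave.

```python
from dataclasses import dataclass
from typing import List, Union

BOLD = "!!"  # the double-character bold marker


@dataclass
class Text:
    value: str


@dataclass
class BoldText:
    value: str


Node = Union[Text, BoldText]


def parse_line(line: str) -> List[Node]:
    """Split one line into Text / BoldText nodes at each '!!' marker."""
    nodes: List[Node] = []
    # Chunks at even positions are outside the markers,
    # chunks at odd positions are between an opening and a closing marker.
    for index, chunk in enumerate(line.split(BOLD)):
        if not chunk:
            continue  # skip empty chunks, e.g. a line that starts with '!!'
        nodes.append(BoldText(chunk) if index % 2 else Text(chunk))
    return nodes


if __name__ == "__main__":
    for node in parse_line("The following text !!is bold! !! but not this."):
        print(node)
```

Note that the single `!` in "is bold! " stays part of the bold text, which is exactly the case that makes me unsure whether `!!` should be a terminal or a parser rule.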
Does anyone have any good approaches?
Should I use:
terminal BOLD: '!!';
or
Bold: '!' '!';
I think I have to use the second rule. That is, to handle this I would define only single-character terminals and use parser rules for everything else.
My grammar at the moment is:
grammar org.xtext.example.mydsl.MyDsl
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate myDsl "http://www.xtext.org/example/mydsl/MyDsl"

Lines:
    lines+=Line*
;

Line:
    {Line} content+=(PlainText|BoldText)*
    NL
;

PlainText:
    text = Text
;

Text returns ecore::EString:
    (CHAR|WS)+
;

BoldText:
    BOLD
    {BoldText} text += PlainText*
    BOLD
;

terminal BOLD: '!!';
terminal WS: (' ' | '\t')+;
terminal NL: '\r'? '\n';
terminal CHAR: !(' '|'\t'|'\r'|'\n');
But this produces warnings, because the same input can be matched either by repeating PlainText or by the (CHAR|WS)+ loop inside Text. How can I get rid of that ambiguity?