I'm trying to write an Xtext grammar for the following example CSV:
a; test;test ;
;a; test;test ;
<ignore>; <ignore> ;test
a; <ignore> test;test
a; this is test ;test
The semicolon is used as the separator. Cells containing only the text <ignore> have a special meaning and should be represented by their own type in the EMF model. However, <ignore> test is not such a special value. Whitespace around semicolons must be ignored. Cells may contain any character except the semicolon.
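To make the intended result concrete, here is a minimal plain-Java sketch (not Xtext) of how I expect the cells of one line to be classified. The class and method names are made up for illustration, and String.trim() is an assumption on my side: it strips all leading and trailing characters up to U+0020, which is slightly more than just space and tab.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, only meant to illustrate the expected cell semantics.
final class CsvCellSketch {

    // Marker text taken from the example rows above.
    static final String IGNORE_MARKER = "<ignore>";

    // Splits one line on ';' and drops the whitespace around each cell.
    static List<String> splitCells(String line) {
        List<String> cells = new ArrayList<>();
        // A limit of -1 keeps trailing empty cells, e.g. after a final ';'.
        for (String raw : line.split(";", -1)) {
            cells.add(raw.trim());
        }
        return cells;
    }

    // A cell is special only if, after trimming, it is exactly the marker.
    static boolean isIgnore(String cell) {
        return IGNORE_MARKER.equals(cell.trim());
    }

    public static void main(String[] args) {
        String line = "abc; <ignore>; <ignore> \t; <ignore> a;def";
        for (String cell : splitCells(line)) {
            String kind = isIgnore(cell) ? "IGNORE" : "TEXT";
            System.out.println(kind + " [" + cell + "]");
        }
    }
}
```

For that input, the second and third cells should count as the special value, while "<ignore> a" stays ordinary text. This is the behaviour I would like the grammar to reproduce.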
So far I have come up with this grammar:
grammar com.example.Csv

import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate impEx "http://www.example.com/Csv"

Model:
    valueLine=ValueLine;

ValueLine:
    ';'? WHITE_SPACE values+=Value WHITE_SPACE (';' WHITE_SPACE values+=Value WHITE_SPACE)* ';'*;

Value:
    ( (=>'<ignore>') {IGNORE_VALUE} IGNORE_VALUE) | text=TEXT_VALUE;

terminal TEXT_VALUE:
    (!';')*;

IGNORE_VALUE:
    '<ignore>';

WHITE_SPACE:
    (' '|'\t')*;
But when I run this test case:
@InjectWith(CsvInjectorProvider.class)
@RunWith(XtextRunner.class)
public class ParserTest {

    @Inject
    private ParseHelper<Model> parser;

    @Test
    public void parseDomainmodel() throws Exception {
        Model parsed = parser.parse("abc; <ignore>; <ignore> \t; <ignore> a;def");
        System.out.println(parsed.getValueLine().getValues());
    }
}
I see that the IGNORE_VALUE rule doesn't match <ignore>. The parser seems to use the TEXT_VALUE rule for the starting whitespace.
What do I need to do in order to parse the <ignore> values correctly?