
I'm trying to find a grammar for the following example csv:

a; test;test ;
;a; test;test ;
<ignore>; <ignore> ;test
a; <ignore> test;test
a; this is test ;test

The semicolon is used as the separator. Cells containing only the text <ignore> have a special meaning and should be represented by their own type in the EMF model. However, a cell such as <ignore> test is not such a special value and is ordinary text. Whitespace around semicolons must be ignored. Cells may contain any character except the semicolon.

So far I have come up with this grammar:

grammar com.example.Csv

import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate impEx "http://www.example.com/Csv"

Model:
    valueLine=ValueLine;

ValueLine:
    ';'? WHITE_SPACE values+=Value WHITE_SPACE (';' WHITE_SPACE values+=Value WHITE_SPACE)* ';'*;

Value:
    ( (=>'<ignore>') {IGNORE_VALUE} IGNORE_VALUE) | text=TEXT_VALUE;

terminal TEXT_VALUE:
    (!';')*;

IGNORE_VALUE:
    '<ignore>';

WHITE_SPACE:
    (' '|'\t')*;

But with my test case

@InjectWith(CsvInjectorProvider.class)
@RunWith(XtextRunner.class)
public class ParserTest {

    @Inject
    private ParseHelper<Model> parser;

    @Test
    public void parseDomainmodel() throws Exception {
        Model parsed = parser.parse("abc;  <ignore>;  <ignore> \t;  <ignore> a;def");
        System.out.println(parsed.getValueLine().getValues());
    }
}

I see that the IGNORE_VALUE rule doesn't match <ignore>. The parser seems to use the TEXT_VALUE rule for the starting whitespace.

What do I need to do in order to parse the <ignore> values correctly?

SpaceTrucker

2 Answers


It looks like you are dealing with regular expressions in your grammar file. Try the following:

IGNORE_VALUE:
    '\<ignore\>';

If you also need to handle leading spaces, it should be something like:

IGNORE_VALUE:
    '\ *\<ignore\>';

Hopefully that helps.

Mehdi Karamosly

The problem here is that the lexer performs a longest match. Since your TEXT_VALUE terminal matches pretty much anything except a semicolon, it gets chosen over IGNORE_VALUE.
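To make the longest-match effect concrete, here is a minimal sketch in plain Java. It is not Xtext's generated lexer; the class `LongestMatchDemo` and its methods are hypothetical names for illustration, and the tie-breaking rule (the keyword wins only when it is at least as long as the text match) is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a longest-match lexer with two competing rules.
// Hypothetical illustration only, not the lexer Xtext generates.
final class LongestMatchDemo {

    // TEXT_VALUE is (!';')*: it consumes everything up to the next ';'.
    static int matchText(String input, int pos) {
        int end = pos;
        while (end < input.length() && input.charAt(end) != ';') {
            end++;
        }
        return end - pos;
    }

    // The '<ignore>' keyword matches only if it starts exactly at pos.
    static int matchIgnore(String input, int pos) {
        return input.startsWith("<ignore>", pos) ? "<ignore>".length() : 0;
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (input.charAt(pos) == ';') {
                tokens.add("';'");
                pos++;
                continue;
            }
            int textLen = matchText(input, pos);     // at least 1 here
            int ignoreLen = matchIgnore(input, pos); // 0 or 8
            // Assumption of this sketch: the keyword wins only on a tie.
            if (ignoreLen >= textLen) {
                tokens.add("IGNORE");
                pos += ignoreLen;
            } else {
                tokens.add("TEXT_VALUE(" + input.substring(pos, pos + textLen) + ")");
                pos += textLen;
            }
        }
        return tokens;
    }
}
```

In this model, a cell with leading whitespace such as `  <ignore>` is swallowed whole by the text rule, because the text match starts at the first space and runs to the next semicolon, while the keyword cannot match at that position at all.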

I would suggest having only text columns and deferring the "is this column ignored?" analysis to later stages such as validation and highlighting.
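As a rough sketch of what that later analysis could look like, the check reduces to trimming the cell text and comparing it with the marker. The class `CellClassifier` and its methods are hypothetical names, not part of Xtext or of the generated model; in a real project the same check would probably live in a validator or in a helper that receives the text of each Value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: decides after parsing whether a cell is the
// special <ignore> marker, instead of asking the lexer to do it.
final class CellClassifier {

    static final String IGNORE_MARKER = "<ignore>";

    // A cell is ignored only if, after trimming, it is exactly the marker.
    static boolean isIgnored(String rawCell) {
        return rawCell.trim().equals(IGNORE_MARKER);
    }

    // Splits one CSV line on ';' and labels each trimmed cell.
    static List<String> classifyLine(String line) {
        List<String> result = new ArrayList<>();
        // The limit -1 keeps trailing empty cells such as in "a;b;".
        for (String cell : line.split(";", -1)) {
            String text = cell.trim();
            result.add(isIgnored(text) ? "IGNORE" : "TEXT(" + text + ")");
        }
        return result;
    }
}
```

With the input from the question's test case, the cells `  <ignore>` and `  <ignore> \t` are classified as ignored, while `  <ignore> a` stays ordinary text, which matches the requirement that only cells consisting solely of the marker are special.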

approxiblue
Stefan Oehme