
I am a newbie to Antlrworks. I am writing a combined grammar file to parse an XML file. The XML file is pretty big and complex.

There are many lexer rules defined in the grammar. Antlrworks 1.4.3 generates the code without any problem, but when I try to debug the grammar in Antlrworks, compiling the generated lexer fails with the following error:

    [13:29:42] D:\Antlr\Grammer Files\output\OrigionalSampleCDFXMLLexer.java:6472: code too large
    [13:29:42]         public int specialStateTransition(int s, IntStream _input) throws NoViableAltException {
    [13:29:42]                    ^
    [13:29:42] 1 error.

Below are the lexer rules defined in my combined grammar file:

    DATEFORMATE : DIGIT DIGIT DIGIT DIGIT '-' DIGIT DIGIT '-' DIGIT DIGIT;

    TIMEFORMATE : 'T' ( DIGIT DIGIT ':'  DIGIT DIGIT ':'  DIGIT DIGIT );

    CATEGORY_SW_CS_COLLECTION  :     'FEATURE' | 'COLLECTION'; // These are fixed

    CATEGORY_SW_INSTANCE  :  'VALUE' | 'DEPENDENT_VALUE' | 'BOOLEAN' |'ASCII' | 'VAL_BLK' | 'CURVE' |
                         'MAP' | 'STRUCTURE' | 'UNION' |
                         'VALUE_ARRAY' | 'CURVE_ARRAY' |'MAP_ARRAY' | 'STRUCTURE_ARRAY';

    CATEGORY_SW_AXIS_CONT  :     'FIX_AXIS' | 'STD_AXIS' ;

    CATEGORY_COMMON_IN_AXIS_INSTANCE
        :   'CURVE_AXIS' |'RES_AXIS' | 'COM_AXIS' ;

    CATEGORY_SW_INSTANCE_TREE  : 'VCD' | 'NO_VCD' ;

    CATEGORY_MSRSW  : 'CDF20' ;

    FLAG_VALUES
        :   'TRUE' | 'FALSE';

    ATTR_EQ :  {tagMode}? => '=' ;

    PCDATA : {!tagMode}? =>  (~'<')* ;

    //NMTOKENS: {tagMode}? => ( '\"' (NMTOKEN ' ')* '\"' | '\''(NMTOKEN ' ')* '\'') ;

    NMTOKEN :   {tagMode}? => ( '\"' NMTOKEN_CHAR* '\"' | '\''NMTOKEN_CHAR* '\'');

    ID  : {tagMode}? => ( '\"' LETTER (LETTER | DIGIT | '_' )* '\"'
                    | '\''  LETTER (LETTER | DIGIT | '_' )* '\''
                    )
                ;

    CDATA :
            {tagMode}? =>  ( '\"' (~('\"' | '\'' | '&' | '<' | '>'))*  '\"'
            | '\'' (~('\"' | '\'' | '&' | '<' | '>'))* '\''
            )
        ;


    TAG_START_OPEN : '<' {tagMode = true;};

    TAG_END_OPEN :   '</' {tagMode = true;};

    TAG_CLOSE : {tagMode}? => '>' {tagMode = false;};

    TAG_EMPTY_CLOSE : {tagMode}? => '/>' {tagMode = false;};

    fragment NMTOKEN_CHAR: (LETTER | DIGIT | '_' | '-' | '.' | ':');

    fragment LETTER : 'A'..'Z' | 'a'..'z' | 'ü';

    //fragment Exponent : ('e'|'E') ('+'|'-')? (DIGIT)+ ;

    fragment DIGIT : '0'..'9';

    WS  :  {tagMode}? => (' ' | '\t'| '\r' | '\n')+ {$channel=99;} ;

And of course I have parser rules in the same file ;-).

Changing the lexer rules by replacing most of the '+' operators with '*' didn't help.

Is something wrong with my lexer rules?

Another question:

I tried moving some of the lexer rules from the combined grammar file into a separate lexer grammar file. In that case, importing the lexer grammar into the combined grammar causes a problem: Antlrworks reports that 'Lexer file name' is undefined and suggests the fix 'create the grammar file'.

    grammar SampleCDFXML;

    options {
        language = Java;
        output=AST;
        tokenVocab=XMLBaseLexer;
    }

    import XMLBaseLexer ; // Here it says undefined import "XMLBaseLexer"

'XMLBaseLexer' is a lexer grammar that contains some of the lexer rules from the original combined grammar.

I searched many websites for this import problem but didn't find an answer.

Could someone please suggest how to solve these problems?

Any help is very much appreciated.

Thank you!

Bart Kiers
Pan
  • Hi Bart, I don't know whom I should ask about this. Do you have any idea why your answer and our comments have disappeared? – Pan Feb 19 '13 at 05:56

1 Answer


An update: if I run the same combined grammar in Antlrworks2, it works. A few syntax changes are needed to make it run in Antlrworks2; for example, {$channel=99;} has to be replaced by ->channel(99).
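For reference, here is a rough sketch of how a few of the rules from the question might look after such a port. The `-> channel(99)` lexer command is ANTLR 4 syntax. This is not the poster's actual converted grammar, and the `@lexer::members` declaration of `tagMode` is an assumption, since the question does not show where the flag is declared. ANTLR 4 no longer has the `=>` gated-predicate syntax, so each `{tagMode}? =>` becomes a plain `{tagMode}?`, written at the end of the rule, which is where the ANTLR 4 documentation recommends placing lexer predicates.

```antlr
// ANTLR 4 sketch of a few rules from the question (not the poster's actual port).
// ANTLR 4 does not support output=AST (it builds parse trees), so that option goes away.

@lexer::members {
    boolean tagMode = false; // assumed declaration; not shown in the question
}

TAG_START_OPEN  : '<'  {tagMode = true;} ;
TAG_END_OPEN    : '</' {tagMode = true;} ;

// ANTLR 3 gated predicates ({tagMode}? =>) become plain predicates at the rule's end
TAG_CLOSE       : '>'  {tagMode}? {tagMode = false;} ;
TAG_EMPTY_CLOSE : '/>' {tagMode}? {tagMode = false;} ;
ATTR_EQ         : '='  {tagMode}? ;

// ANTLR 3 version: WS : {tagMode}? => (' ' | '\t' | '\r' | '\n')+ {$channel=99;} ;
WS : (' ' | '\t' | '\r' | '\n')+ {tagMode}? -> channel(99) ;
```

One detail worth knowing: in ANTLR 3, 99 is the value of the hidden channel, so `{$channel=99;}` meant "hidden". In ANTLR 4 the hidden channel is 1, so `-> channel(99)` puts whitespace on a separate custom channel instead. A parser reading the default channel 0 skips these tokens either way, but `-> channel(HIDDEN)` is the closer equivalent.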

Thanks!!

Sam Harwell
Pan