I have defined grammar rules like

TOKEN : { < SINGLE_QUOTE : " ' " > }

TOKEN : {  < STRING_LITERAL : " ' "  (~["\n","\r"])*  " ' "> }

But I am not able to parse sequences like 're'd'. I need the parser to parse re'd as a string literal, but with these rules the parser parses 're' and 'd' separately.

Billal Begueradj
S Shruthi

2 Answers

The following should work:

TOKEN : { < SINGLE_QUOTE : "'" > }
TOKEN : {  < STRING_LITERAL : "'"  (~["\n","\r"])*  "'"> }

This is pretty much what you had, except that I removed some spaces.

Now if there are two or more apostrophes on a line (i.e. without an intervening newline or return), then the first and the last of those apostrophes, together with all characters between them, should be lexed as one STRING_LITERAL token. That includes all intervening apostrophes. This assumes there are no other rules involving apostrophes. For example, if your file is 're'd', it should lex as one token; likewise, 'abc' + 'def' should lex as one token.
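To make the "first and last apostrophe" behaviour concrete, here is a rough sketch in plain Java. It uses java.util.regex as a stand-in for the JavaCC token manager, which is an assumption on my part: the class name, the method name, and the regex translation of the token definition are all made up for illustration. For this particular pattern, a greedy regex match anchored at the start of the line should give the same result as a longest-match lexer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyLiteralSketch {
    // Hypothetical regex analogue of: "'" (~["\n","\r"])* "'"
    private static final Pattern STRING_LITERAL = Pattern.compile("'[^\\n\\r]*'");

    // Length of the match anchored at the start of the line, or -1 if there is none.
    static int literalLength(String line) {
        Matcher m = STRING_LITERAL.matcher(line);
        return m.lookingAt() ? m.end() : -1;
    }

    public static void main(String[] args) {
        System.out.println(literalLength("'re'd'"));        // 6: the whole input
        System.out.println(literalLength("'abc' + 'def'")); // 13: the whole input
        System.out.println(literalLength("re'd"));          // -1: no leading apostrophe
    }
}
```

The greedy `*` runs to the end of the line and then backs off to the last apostrophe, which is why the intervening apostrophes end up inside the match.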

Theodore Norvell
  • Thanks for the answer. I got it working. But the above expressions do not allow a backslash (\\) as the last character in the string. How can I modify the expression to allow a backslash as the last character in the string? – S Shruthi Aug 24 '16 at 04:38
  • It's true that the last character of a STRING_LITERAL can not be a backslash. The last character must be an apostrophe. Same with the first character. However a backslash can occur at any position other than first and last. For example `'\'` is matched by STRING_LITERAL. Do you have reason to think otherwise? – Theodore Norvell Sep 02 '16 at 18:18

If you need to lex re'd as a STRING_LITERAL token, then use the following rule:

TOKEN : { < SINGLE_QUOTE : "'" > }
TOKEN : {  < STRING_LITERAL : "'"?  (~["\n","\r"])*  "'"?> }

I didn't see the rule for matching "re" separately.

In JavaCC, your lexical specification for STRING_LITERAL requires the token to start with a single quote ("'"). But your input doesn't have the "'" at the start.

The "?" added to STRING_LITERAL makes each single quote optional and, if present, allows only one. So this will match your input and lex it as STRING_LITERAL.

JavaCC decision making rules:

1.) JavaCC looks for the longest match. In this case, if the input starts with "'", the possible matches are SINGLE_QUOTE and STRING_LITERAL; the second input character tells JavaCC to choose STRING_LITERAL.

2.) When two matches have the same length, JavaCC takes the rule declared first in the grammar. Here, if the input is only "'", it will be lexed as SINGLE_QUOTE even though there are two possible matches, SINGLE_QUOTE and STRING_LITERAL.
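The two rules above can be sketched in plain Java. This is only an illustration, not JavaCC itself: the class name, the method name, and the regex versions of the two tokens are my own assumptions, and java.util.regex stands in for the generated token manager. For these simple patterns, a greedy regex anchored at the start of the input should give the longest match.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaximalMunchSketch {
    // LinkedHashMap keeps insertion order, which plays the role of declaration order.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("SINGLE_QUOTE", Pattern.compile("'"));
        // Hypothetical regex analogue of: "'"? (~["\n","\r"])* "'"?
        RULES.put("STRING_LITERAL", Pattern.compile("'?[^\\n\\r]*'?"));
    }

    // Name of the rule that wins for the first token of the input, or null if none matches.
    static String kindOfFirstToken(String input) {
        String best = null;
        int bestLen = 0; // empty matches are ignored
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // Strict '>' means that on a tie the earlier-declared rule is kept.
            if (m.lookingAt() && m.end() > bestLen) {
                best = rule.getKey();
                bestLen = m.end();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(kindOfFirstToken("'"));      // SINGLE_QUOTE (tie, declared first)
        System.out.println(kindOfFirstToken("'re'd'")); // STRING_LITERAL (longest match)
        System.out.println(kindOfFirstToken("re'd"));   // STRING_LITERAL (only match)
    }
}
```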

Hope this will help you...

sarath kumar