I have the following JISON file (lite version of my actual file, but reproduces my problem):

%lex

%%

"do"                        return 'DO';
[a-zA-Z_][a-zA-Z0-9_]*      return 'ID';
"::"                        return 'DOUBLECOLON'
<<EOF>>                     return 'ENDOFFILE';

/lex

%%

start
    : ID DOUBLECOLON ID ENDOFFILE
    {$$ = {type: "enumval", enum: $1, val: $3}}
    ;

It is for parsing something like "AnimalTypes::cat", and that input works fine. But when it sees dog instead of cat, it assumes it's a DO instead of an ID. I can see why it does that, but how do I get around it? I've been looking at other JISON documents, but can't seem to spot the difference that (I assume) makes those work.

This is the error I get:

JisonParserError: Parse error on line 1:
PetTypes::dog
----------^
Expecting "ID", "enumstr", "id", got unexpected "DO"

Repro steps:

  1. Install jison-gho globally from npm (or modify the code to use a local version). I use Node v14.6.0.
  2. Save the JISON above as minimal-repro.jison
  3. Run `jison -m es -o ./minimal.mjs ./minimal-repro.jison` to create the parser
  4. Create a file named test.mjs with code like:

         import Parser from "./minimal.mjs";
         Parser.parser.parse("PetTypes::dog")

  5. Run `node test.mjs`

Edit: Updated with a reproducible example. Edit2: Simpler JISON

palantus
  • Please create a [mre] and edit it into your question. Also, specify which jison you are using; sadly, it makes a difference. We need a complete reproducible example because otherwise there's a lot of speculation necessary. (For example, I speculate that you supplied a particular option, but maybe you just typed `do` differently in your actual jison file.) The lexer options are important in this case – rici Jan 05 '21 at 21:44
  • @rici I've updated it with a complete jison file to reproduce my problem and info regarding my environment (see repro steps). – palantus Jan 06 '21 at 10:03
  • If you remove the "DO" rule, does it match properly? How about not using a definition and writing the `id` regex explicitly? Dog is longer than "Do", so it should match. unrelated, note that if you actually had an `enum::do`, it would break on your pattern matching (you would need to swap the rule order). – kabanus Jan 06 '21 at 10:46
  • @kabanus If I remove `"do" return 'DO';`, then it works fine. I do however need it to match do-while loops. I'm guessing that the same issue would arise with while-loops, variable names etc. I just tried replacing `{id}` with the regex, but it didn't change anything. If I swap the order of `id` and `do`, then everything would be an id :( What I really don't understand is that this is pretty basic stuff. In e.g. Java, I should be able to create a variable named `doesThisWork`, so there must be something I'm missing... – palantus Jan 06 '21 at 11:02
  • The Flex pattern matching rules (which Jison is based on) explicitly say the pattern with the longest match is used, and only in case of a tie is the first one used. This seems like a bug as is. Try to get rid of everything except these two patterns, maybe even user code. That would be super minimal, and if that breaks, you found a bug I think. – kabanus Jan 06 '21 at 11:37
  • @kabanus I've just made it as small as I can (see updated JISON) and it still fails with the same error :( – palantus Jan 06 '21 at 11:48
  • I would [open a bug](https://github.com/zaach/jison/issues) with this minimal example. Flex rules say Dog should match before do. – kabanus Jan 06 '21 at 12:00
  • @kabanus Thanks. I've opened the following bug at the jison-gho repo, as it doesn't seem to be an issue with the jison package that it is based on. https://github.com/GerHobbelt/jison/issues/62 – palantus Jan 06 '21 at 12:12
  • @kabanus: "it's not a bug, it's a feature" :-) See https://github.com/zaach/jison/issues/63 – rici Jan 06 '21 at 12:26

1 Answer

Unlike (f)lex, the jison lexer accepts the first matching pattern, even if it is not the longest matching pattern. You can get the (f)lex behaviour by using

 %option flex

However, that significantly slows down the scanner.
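To make the difference concrete, here is a minimal sketch in plain JavaScript of the two rule-selection strategies. It is not jison's actual implementation; the `rules` table and the `firstMatch` and `longestMatch` functions are hypothetical names used only for illustration, and the two patterns simply mirror the `DO` and `ID` rules from the question.

```javascript
// Hypothetical two-rule lexer table mirroring the question's "do" and ID rules.
const rules = [
  { token: "DO", pattern: /^do/ },
  { token: "ID", pattern: /^[a-zA-Z_][a-zA-Z0-9_]*/ },
];

// First-match strategy: stop at the first rule whose pattern matches at all.
function firstMatch(input) {
  for (const { token, pattern } of rules) {
    const m = pattern.exec(input);
    if (m) return { token, text: m[0] };
  }
  return null;
}

// Longest-match strategy: try every rule and keep the longest match.
// On a tie, the earlier rule wins because only a strictly longer match replaces it.
function longestMatch(input) {
  let best = null;
  for (const { token, pattern } of rules) {
    const m = pattern.exec(input);
    if (m && (best === null || m[0].length > best.text.length)) {
      best = { token, text: m[0] };
    }
  }
  return best;
}

console.log(firstMatch("dog"));   // { token: 'DO', text: 'do' }
console.log(longestMatch("dog")); // { token: 'ID', text: 'dog' }
console.log(longestMatch("do"));  // { token: 'DO', text: 'do' }
```

In this sketch the longest-match version has to test every rule at each position instead of stopping at the first hit, which gives a rough intuition for why that mode costs more.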

The original jison automatically added \b to the end of patterns which ended with a literal string matching an alphabetic character, to make it easier to match keywords without incurring this overhead. In jison-gho, this feature was turned off unless you specify

 %option easy_keyword_rules

See https://github.com/zaach/jison/wiki/Deviations-From-Flex-Bison#user-content-literal-tokens.

So either of those options will achieve the behaviour you expect.
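The effect of the trailing `\b` can be sketched with ordinary JavaScript regular expressions. This only illustrates the word-boundary idea under first-match semantics and does not show how either jison variant rewrites patterns internally; `boundaryRules` and `nextToken` are hypothetical names.

```javascript
// Hypothetical rule table: the keyword pattern carries a trailing word boundary.
const boundaryRules = [
  { token: "DO", pattern: /^do\b/ },
  { token: "ID", pattern: /^[a-zA-Z_][a-zA-Z0-9_]*/ },
];

// First-match selection, as in the default behaviour described above.
function nextToken(input) {
  for (const { token, pattern } of boundaryRules) {
    const m = pattern.exec(input);
    if (m) return { token, text: m[0] };
  }
  return null;
}

console.log(nextToken("dog"));      // { token: 'ID', text: 'dog' }
console.log(nextToken("do"));       // { token: 'DO', text: 'do' }
console.log(nextToken("do while")); // { token: 'DO', text: 'do' }
console.log(nextToken("do_it"));    // { token: 'ID', text: 'do_it' }
```

Because `\b` fails when the keyword is followed by another word character, the `DO` rule no longer matches the prefix of `dog`, and the scanner falls through to the `ID` rule without having to compare match lengths.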

rici