
I am continuing to learn PEG.js, but I am stuck on the following issue.

The PEG.js-generated parser is unable to match a string containing underscores:

CONFIG += stl_off

but it successfully parses a string without them:

CONFIG += static

(This is an "appending assignment" statement for the built-in variable CONFIG; the rvalue is a list drawn from a limited set of strings.)

What am I doing wrong?


The grammar:

Start =
    Statement* {return env; }

Statement
    = Comment
    / GenericAssignmentStatementT

GenericAssignmentStatementT = Whitespace* GenericAssignmentStatement Whitespace*
GenericAssignmentStatement
    // TEMPLATE
    = TemplateAssignmentStatement
    // CONFIG
    / ConfigAssignmentStatement
    / ConfigAppendingAssignmentStatement

// -------------------------------------------------------------------------------------------------

// # Single-line comment
Comment "Comment string" = Whitespace* "#" rvalue:$(!LineBreak .)* LineBreak+ {
    return "#" + rvalue;
}

// -------------------------------------------------------------------------------------------------

// TEMPLATE = app|lib|subdirs|aux|vcapp|vclib
SystemTemplateVariable = "TEMPLATE"
SystemTemplateVariableValue = "app" / "lib" / "subdirs" / "aux" / "vcapp" / "vclib"
TemplateAssignmentStatement = lvalue:SystemTemplateVariable AssignmentOperator rvalue:SystemTemplateVariableValue Whitespace* LineBreak* {
    if (!env.qmakeVars)
        env.qmakeVars = {};
    env.qmakeVars[lvalue] = rvalue;
    return {name:"TEMPLATE", op:"=", value:rvalue};
}

// -------------------------------------------------------------------------------------------------

// CONFIG = release|debug|debug_and_release|debug_and_release_target
SystemConfigVariable = "CONFIG"
SystemConfigVariableValue = "release" / "debug" / "debug_and_release" / "debug_and_release_target"

ConfigAssignmentStatement = lvalue:SystemConfigVariable AssignmentOperator rvalue:SystemConfigVariableValue? Whitespace* LineBreak* {
    if (!env.qmakeVars)
        env.qmakeVars = {};
    env.qmakeVars[lvalue] = [rvalue];
    return {name:"CONFIG", op:"=", value:rvalue};
 }

 ConfigAppendingAssignmentStatement = lvalue:SystemConfigVariable 
 AppendingAssignmentOperator rvalue:SystemConfigVariableValue Whitespace* LineBreak* {
    if (!env.qmakeVars)
        env.qmakeVars = {};
    if (!env.qmakeVars[lvalue])
        env.qmakeVars[lvalue] = [];
    env.qmakeVars[lvalue].push(rvalue);
    return {name:"CONFIG", op:"+=", value:rvalue};
}

// Assignment operators
AssignmentOperator = Whitespace* "=" Whitespace*
AppendingAssignmentOperator = Whitespace* "+=" Whitespace*

// Delimiters
LineBreak = [\r\n] {
    return "LB";
}

Whitespace = [ \t] {
    return "WS";
}

Test input:

TEMPLATE = app
CONFIG += debug_and_release

Test PEG.js output:

Line 2, column 16: Expected "CONFIG", "TEMPLATE", Comment string, [ \t], [\r\n], or end of input but "_" found.
eraxillan

1 Answer


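PEG evaluates the alternatives of SystemConfigVariableValue from left to right, so for the input debug_and_release the debug literal succeeds first and consumes only debug. The choice is then committed, and the parser fails on the remaining _and_release (which is why the error points at the "_" in column 16).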

If you declare the more specific literal first, it'll work:

SystemConfigVariableValue = "debug_and_release_target" / "debug_and_release" / "release" / "debug"
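To see why the order matters, here is a minimal sketch in plain JavaScript that imitates ordered choice over string literals. It is not PEG.js internals: `orderedChoice` is a hypothetical helper written for this illustration, and it only assumes that a literal matches when it is a prefix of the remaining input and that the first success wins.

```javascript
// Hypothetical helper (not part of PEG.js): imitates ordered choice over
// string literals. Returns the first literal that is a prefix of `input`
// at `pos`, or null if none matches.
function orderedChoice(literals, input, pos = 0) {
  for (const literal of literals) {
    if (input.startsWith(literal, pos)) {
      return literal; // first success wins; later literals are never tried
    }
  }
  return null;
}

// Order used in the question's grammar.
const originalOrder = ["release", "debug", "debug_and_release", "debug_and_release_target"];
// Order suggested above: longer, more specific literals first.
const reorderedOrder = ["debug_and_release_target", "debug_and_release", "release", "debug"];

const input = "debug_and_release";
const withOriginal = orderedChoice(originalOrder, input);
const withReordered = orderedChoice(reorderedOrder, input);

// The leftover text is what the rest of the grammar would still have to parse.
console.log(withOriginal, "| left over:", JSON.stringify(input.slice(withOriginal.length)));
console.log(withReordered, "| left over:", JSON.stringify(input.slice(withReordered.length)));
```

With the original order the choice stops at `debug` and leaves `_and_release` unconsumed; with the reordered list the whole value is consumed.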
robertklep
  • Thanks, it works! However, this is weird to me - can you point me to any article/tutorial/paper where PEG.js keyword matching is described? – eraxillan May 29 '17 at 05:07
  • @eraxillan the documentation states, for `expression_1 / expression_2 / ... / expression_n`: _"Try to match the first expression, if it does not succeed, try the second one, etc. Return the match result of the first successfully matched expression"_. It doesn't check which literal matches _best_, it checks which literal matches _first_. I assume it doesn't backtrack when it subsequently can't parse the rest. – robertklep May 29 '17 at 06:37
  • Thanks again, Robert. So, I need to search for a "best practices" document or just use another parser generation tool - my grammar seems too complicated – eraxillan May 30 '17 at 05:31
  • I'm very confused by this. I understand that PEGs evaluate conditions left to right, but matching a subset of a string literal is not matching a string literal. I shall raise an issue on the github – Arman Jun 12 '18 at 21:37
  • @Arman `"debug"` in the grammar matches the leading "debug" of `debug_and_release` in the input. AFAIK, there's nothing that states that PEG will provide the _best_ match, or that matches are done in a tokenized fashion (where the input is somehow separated by non-word boundaries before matching). – robertklep Jun 13 '18 at 07:28
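
Since matching is not tokenized, a common way to get keyword-like behaviour is to require that a literal is not followed by another word character. In a PEG.js grammar that would mean appending a negative predicate such as `![A-Za-z0-9_]` to each literal alternative. The sketch below imitates that idea in plain JavaScript; `guardedChoice` is a hypothetical helper for illustration only, and the character class is an assumption about what counts as part of a value.

```javascript
// Hypothetical helper (not part of PEG.js): ordered choice where a literal
// only succeeds if the next character is not a word character, similar in
// spirit to writing `"debug" ![A-Za-z0-9_]` for each alternative.
function guardedChoice(literals, input, pos = 0) {
  for (const literal of literals) {
    if (!input.startsWith(literal, pos)) continue;
    const next = input[pos + literal.length]; // undefined at end of input
    const followedByWordChar = next !== undefined && /[A-Za-z0-9_]/.test(next);
    if (!followedByWordChar) return literal; // whole word matched
    // Otherwise this alternative fails and the next literal is tried.
  }
  return null;
}

// Same order as in the question's grammar, shortest literals first.
const configValues = ["release", "debug", "debug_and_release", "debug_and_release_target"];

const plainMatch = guardedChoice(configValues, "debug_and_release");
const targetMatch = guardedChoice(configValues, "debug_and_release_target");
const noMatch = guardedChoice(configValues, "debugger");

console.log(plainMatch, targetMatch, noMatch);
```

With the guard in place, the order of the literals no longer decides the outcome, because `debug` is rejected whenever more word characters follow it.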