
From the Regexp::Grammars documentation:

The difference between a token and a rule is that a token treats any whitespace within it exactly as a normal Perl regular expression would. That is, a sequence of whitespace in a token is ignored if the /x modifier is in effect, or else matches the same literal sequence of whitespace characters (if /x is not in effect).

In a rule, most sequences of whitespace are treated as matching the implicit subrule <.ws>, which is automatically predefined to match optional whitespace (i.e. \s*).

...

In other words, a rule such as:

<rule: sentence>   <noun> <verb>
               |   <verb> <noun>

is equivalent to a token with added non-capturing whitespace matching:

<token: sentence>  <.ws> <noun> <.ws> <verb>
                |  <.ws> <verb> <.ws> <noun>

Is there a way to get the rule to ignore the leading implicit <.ws>? In the example above, it would be equivalent to:

<token: sentence>  <noun> <.ws> <verb>
                |  <verb> <.ws> <noun>
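
To make the difference concrete, here is a minimal sketch using plain Perl regexes rather than Regexp::Grammars itself. It assumes the default `<.ws>` of `\s*` from the documentation quoted above. The `$noun` and `$verb` patterns, the `\A ... \z` anchoring, and the helper sub names are hypothetical stand-ins for illustration only.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the <noun> and <verb> subrules.
my $noun = qr/(?:cat|dog)/;
my $verb = qr/(?:runs|sleeps)/;

# The default <.ws> per the quoted documentation: optional whitespace.
my $ws = qr/\s*/;

# <rule: sentence> as documented: <.ws> precedes every subrule,
# including the first one in each alternative.
my $rule_like = qr/\A (?: $ws $noun $ws $verb | $ws $verb $ws $noun ) \z/x;

# The behaviour asked about: no <.ws> before the first subrule.
my $no_leading_ws = qr/\A (?: $noun $ws $verb | $verb $ws $noun ) \z/x;

sub matches_rule_like     { return $_[0] =~ $rule_like     ? 1 : 0 }
sub matches_no_leading_ws { return $_[0] =~ $no_leading_ws ? 1 : 0 }

# Compare both patterns on inputs with and without leading whitespace.
for my $input ('cat runs', 'runs cat', ' cat runs') {
    printf "%-12s rule-like=%d no-leading-ws=%d\n",
        "'$input'", matches_rule_like($input), matches_no_leading_ws($input);
}
```

Only the input with a leading space distinguishes the two patterns: the rule-style expansion accepts it, while the version without the leading `<.ws>` rejects it.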
Zaid
  • *"Is there a way to get the rule to ignore the leading implicit <.ws>?"* Why would you do that? – Håkon Hægland Apr 22 '21 at 19:57
  • Because it's more DWIM? – Zaid Apr 22 '21 at 20:04
  • Can't you just insert a specific token (e.g. ``) before the `` rule? The `` token's purpose is to gobble up space, hence the `` rule will never match leading whitespace. Or am I missing something? – Håkon Hægland Apr 22 '21 at 20:41
  • I do have a custom `<.ws>`. Perhaps a more concrete example may help explain my use case: ` <.ws> `. I would have liked to write it as just ` ` were it not for the leading `<.ws>`. It feels like I'm missing a trick with the rule definition. – Zaid Apr 22 '21 at 20:43
  • Re "*` <.ws> `*": that's not a token, that's a grammar rule. A token is like a `<`, a `<<`, or a string literal. If it were exactly one space between the value and the unit, OK, fine, it's more of a grey area. But if you allow the same whitespace as elsewhere in your grammar, there's no way it's a token. – ikegami Apr 22 '21 at 21:05
  • Secondly, whitespace is not a concern of a token, by definition. If a function call is `print(...)`, and you don't want to allow whitespace before the `(`, the grammar rule would define function calls as `IDENT '(' arg_list ')'` (in pseudo-code, where anything uppercase or in single quotes is a token, lowercase is a grammar rule, and `<>` is a directive). Does R::G support this? Don't know. – ikegami Apr 22 '21 at 21:06
  • I would love for it to be a rule and not a token – Zaid Apr 22 '21 at 22:16
  • Does the engine actually care? – ikegami Apr 22 '21 at 23:00
  • Not really, I already have a workaround using tokens and the ungainly `<.ws>`. I was just hoping for some out-of-the-box functionality to cater for this. – Zaid Apr 22 '21 at 23:47
  • Interesting that Raku [implements a different philosophy](https://docs.raku.org/language/grammars#ws) (trailing `<.ws>` instead of leading `<.ws>`) – Zaid Apr 24 '21 at 19:16

0 Answers