Matching the hash character in Tatsu

Question

I am getting an exception attempting to parse the # character using Tatsu:

    import tatsu

    grammar = r'''
    @@comments :: //
    @@eol_comments :: //

    start = '#' ;
    '''

    print(tatsu.__version__)

    parser = tatsu.compile(grammar)
    ast = parser.parse('#', trace=True)

    5.8.3
    ↙start ~1:1
    #
    ≢'#' 
    ≢start ~1:1
    #

    ...

    tatsu.exceptions.FailedToken: (1:1) expecting '#' :
    #
    ^
    start

If I change the # to the letter a in both the grammar and the input text, the parse succeeds. I think the issue might be that # starts a comment in TatSu grammars, but I'm not sure how to escape it.

This is interesting. Let me take a look. – Apalala, Aug 26 '23 at 19:02

score 0 · Answer 1 · answered Aug 26 '23 at 19:31

The problem here is that config.eol_comments_re is not being overridden with the @@eol_comments definition.

Could you post an issue for this problem at https://github.com/neogeny/TatSu/issues?

The other problem is that comments should not be checked while parsing a '' token.

score 0 · Answer 2 · answered Aug 26 '23 at 22:09

I researched this problem today for a long while, and found much room for improvement in TatSu.

Yet the conclusion is that // is a valid regex: it is the empty pattern, which matches zero characters of input.
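
The claim about the empty pattern can be checked with Python's re module alone, without TatSu. This is only a sketch: it assumes the text between the slashes is handed to a Python regex, and it says nothing about how TatSu uses the result internally. The variable names are mine.

```python
import re

# The empty pattern: what sits between the slashes in //.
empty = re.compile(r'')
# A pattern not expected to occur in the input, such as /@@@@@@@/.
unlikely = re.compile(r'@@@@@@@')

text = '#'

# The empty pattern succeeds at position 0 without consuming anything.
empty_span = empty.match(text).span()
# The unlikely pattern does not match at all on this input.
unlikely_match = unlikely.match(text)

print(empty_span)      # (0, 0)
print(unlikely_match)  # None
```

So an empty comment pattern always "succeeds" with a zero-width match, whereas a pattern that cannot occur in the input simply never matches.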

The solution in your case is to set the comments to strings you don't expect to find in the input:

    grammar = r'''
        @@comments :: /@@@@@@@/
        @@eol_comments :: /@@@@@@@/

        start = '#' ;
    '''

https://github.com/neogeny/TatSu/issues/303
