1

How would you implement a grammar that can import a file and still parse it using Lark?

For example:

@import file.txt
.....
sten
  • Are you saying you want to create a Lark parser that opens a file while it's running and parses that file? Would a solution be to parse the text for file names and then run the Lark parser on them? I think you could parse the text for a file name, check whether it's available to open, insert its text into the string, and then start parsing. – Michael Hearn Nov 09 '19 at 23:10
  • That was my idea too; I was wondering if there was a better way. – sten Nov 10 '19 at 01:14
  • I would not do it that way. Do you know why? A Lark setup that does that will most likely be recursive, so if you open file.txt and it imports file.txt itself, it will break. – Michael Hearn Nov 10 '19 at 01:25
  • You are right, but maybe I can check for that. – sten Nov 10 '19 at 01:28
  • Unless the imported file affects the *syntax* of your code, the best way is to handle the import after the parsing is done, at which point you can call Lark again. – Erez Feb 26 '22 at 09:14
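
The idea discussed in the comments above (find the import lines, splice in the file text, and guard against a file that imports itself) could be sketched roughly like this. It is only a sketch under assumptions: the directive is taken to be a line of the form `@import <filename>`, as in the question's example, and the names `expand_imports`, `read_file` and `IMPORT_RE` are made up for illustration. It uses no Lark API; it only prepares the text before parsing.

```python
import re
from pathlib import Path

# Assumed directive syntax, taken from the question's example: "@import file.txt"
IMPORT_RE = re.compile(r"^\s*@import\s+(\S+)\s*$")


def read_file(name):
    """Default loader: read the named file from disk."""
    return Path(name).read_text()


def expand_imports(name, read_text=read_file, _active=()):
    """Return the text of `name` with each @import line replaced by the
    recursively expanded text of the file it names.

    `_active` is the chain of files currently being expanded, so a file that
    imports itself (directly or indirectly) raises instead of looping forever.
    """
    if name in _active:
        chain = " -> ".join(_active + (name,))
        raise ValueError(f"circular import: {chain}")
    out = []
    for line in read_text(name).splitlines():
        match = IMPORT_RE.match(line)
        if match:
            out.append(expand_imports(match.group(1), read_text, _active + (name,)))
        else:
            out.append(line)
    return "\n".join(out)


# In-memory stand-ins for real files, so the sketch runs without touching disk.
files = {
    "main.txt": "Hello\n@import part.txt\nWorld",
    "part.txt": "from part",
    "loop.txt": "@import loop.txt",
}
expanded = expand_imports("main.txt", files.__getitem__)
print(expanded)
```

The expanded string can then be handed to the parser's `parse` call in place of the raw file text. One caveat with this approach: any line numbers the parser reports refer to the expanded text, not to the original files.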

3 Answers

1

I found a GitHub repository that seems relevant. Is this what you are looking for? https://github.com/lark-parser/lark

from lark import Lark
with open('file_to_read.txt', 'r') as file:
    data = file.read().replace('\n', '') #assumes you want to remove \n
l = Lark('''start: WORD "," WORD "!"
            %import common.WORD   // imports from terminal library
            %ignore " "           // Disregard spaces in text
         ''')

print( l.parse("Hello, World!") )
print( l.parse(data) )

If you want to open the file and use its contents as the Lark grammar:

from lark import Lark
with open('file_to_read.txt', 'r') as file:
    data = file.read()  # keep the newlines: a Lark grammar needs its rules on separate lines
l = Lark(data)

print( l.parse("Hello, World!") )
print( l.parse("your string to parse") )
Michael Hearn
1

The [code at this link][1] handles includes/imports in Lark. I didn't write it; I'm just passing it on.

It still needs some tweaking for error handling, but it's a good place to start.

Below is my slightly modified version, which actually reads from the files.

import sys

from lark import Lark

from lark.lexer import Lexer, LexerState, LexerThread

class RecursiveLexerThread(LexerThread):

    def __init__(self, lexer: Lexer, lexer_state):
        self.lexer = lexer
        self.state_stack = [lexer_state]

    def lex(self, parser_state):
        while self.state_stack:
            lexer_state = self.state_stack[-1]
            lex = self.lexer.lex(lexer_state, parser_state)
            try:
                token = next(lex)
            except StopIteration:
                self.state_stack.pop()  # We are done with this file
            else:
                if token.type == "_INCLUDE":
                    name = token.value[8:].strip()  # get just the filename
                    self.state_stack.append(LexerState(open(name).read()))
                yield token  # The parser still expects this token either way

grammar = r"""
start: ((_INCLUDE|line)* _EOL)*

line: STRING+
STRING : /\S+/

_INCLUDE.1 : /include\s+\S+/i

_EOL : /(\n+)/

%ignore /[ \t]+/
"""

parser = Lark(grammar, _plugins={
    "LexerThread": RecursiveLexerThread
}, parser="lalr")

tree = parser.parse(open(sys.argv[1]).read())

print(tree.pretty())

[1]: https://gist.github.com/MegaIng/c6abba4d9be87473d8d586734f2b39c9

kdubs
0

I just figured out that I can use the C/C++ preprocessor to generate a file, which I can then parse :)

It is not integrated, but it can be made to work:

cpp -P included.inc > output.file
sten