Given a string and a formal grammar, I would like to determine whether the string is a valid prefix of some string that the grammar accepts.

For example, if I were looking for valid JSON prefixes, I would like to determine that [1, 2, 3 is a valid prefix of a JSON document, whereas [1, 2, ] is not.

I've looked for a parser generator that supports this, but I can't find any good resources.

I've looked at parser generators like Tree-sitter, which supports incremental parsing; however, it doesn't seem to distinguish between errors that show up for valid prefixes and errors that show up for invalid ones.

Betlander
  • re your last sentence: If you have an incremental parser, and you feed it a valid prefix, why would there be errors? – Michael Dyck Mar 15 '23 at 01:06
  • This was my misunderstanding; it seems that "incremental parser" refers to the ability to update the parse tree without re-parsing the entire document. What I would like to do is "drive" the parser forward one byte at a time over the input and check for errors. – Betlander Mar 16 '23 at 02:22

1 Answer

Assuming "grammar" means "context-free grammar", an LR parser will detect the first (leftmost) point, if any, at which no possible continuation would form a valid sentence. So I think you just need an LR parser that calls back to your code to get the next token:

  • If the parser asks your callback for the next token, and you've reached the end of the string in question, then it's a valid prefix.
  • If the parser raises an error before then, it's not a valid prefix.
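
The two rules above can be sketched in a few lines of Python. Note that this is not a generated LR parser: it is a hand-written recursive-descent recognizer for a toy subset of JSON (nested arrays of non-negative integers only), chosen because for this small grammar it also rejects at the first token that no continuation could repair. The names EndOfInput, make_token_source, parse_value and is_valid_prefix are made up for illustration. It also assumes the input ends on a token boundary; a real lexer would additionally have to decide what to do with a token that is cut off halfway, such as tru.

```python
import re


class EndOfInput(Exception):
    """Raised by the token callback when the input string is exhausted."""


# Toy lexer: integers, brackets and commas, with optional leading whitespace.
TOKEN = re.compile(r"\s*(\d+|[\[\],])")


def make_token_source(text):
    """Return a callback that yields one token per call."""
    pos = 0

    def next_token():
        nonlocal pos
        if text[pos:].strip() == "":
            raise EndOfInput  # the parser wanted more input: valid prefix
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"bad character at offset {pos}")
        pos = match.end()
        return match.group(1)

    return next_token


def parse_value(next_token, tok=None):
    """Recognize: value := NUMBER | '[' ']' | '[' value (',' value)* ']'."""
    tok = next_token() if tok is None else tok
    if tok.isdigit():
        return
    if tok != "[":
        raise SyntaxError(f"unexpected token {tok!r}")
    tok = next_token()
    if tok == "]":
        return  # empty array
    while True:
        parse_value(next_token, tok)
        tok = next_token()
        if tok == "]":
            return
        if tok != ",":
            raise SyntaxError(f"unexpected token {tok!r}")
        tok = next_token()  # a value must follow the comma


def is_valid_prefix(text):
    next_token = make_token_source(text)
    try:
        parse_value(next_token)
        tok = next_token()  # anything after a complete value is an error
        raise SyntaxError(f"trailing token {tok!r}")
    except EndOfInput:
        return True   # ran out of input before any error
    except SyntaxError:
        return False  # an error was raised before the end of the input


print(is_valid_prefix("[1, 2, 3"))   # True
print(is_valid_prefix("[1, 2, ]"))   # False
```

With a real LR parser generator the shape would be the same: the token callback signals end of input, and the only thing the caller inspects is whether the parser raised an error before the callback did.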
Michael Dyck