Dynamic parser - read tokens from a separate file

Question

Let's say I want to parse my new language that looks like this:

main.mylang

import "tags.mylang"
cat dog bacon

And there's another file tags.mylang that looks like this:

cat "meow"
dog "woof"
bacon "sizzle"

Running main.mylang would output

meow woof sizzle

The problem I'm having is that "cat", "dog", and "bacon" are defined in a separate file, as implemented by the mylang developer; i.e., I can't make them part of the grammar beforehand.

Is it possible to dynamically add these tags into the grammar as it's parsing? I don't want to add a wildcard \w+ or something because I want it to error on unrecognized tags.

Edit: I'm writing this using jison, which is based on bison.

How do you know a tag is unrecognized, if it might be a variable? Is your grammar such that tags and variables never occur in the same context? Or are tags considered reserved words, even though they might change from time to time? — rici, Mar 13 '13 at 15:36
@rici: Tags would trump variables. If there is a tag defined for a token, then that token would be treated like a tag, even if there is also a variable under the same name. Similar to how you can define both functions and variables with the same name in many other languages, I guess. I'm starting to think that this is a bad idea, however. I might just want to preface variables with a `$` like PHP or something... — mpen, Mar 13 '13 at 15:40
I tend to agree that letting tags trump variable names is not a great idea. The problem is that "tag files" are not composable; since you must know what variable names cannot be used in order to write a "program", you cannot just add a new tag file to an existing program. Also, it makes it hard to syntax colour. On the other hand, `$` sigils are ugly. Good luck, anyway. — rici, Mar 13 '13 at 15:59

score 2 · Answer 1 · answered Mar 13 '13 at 15:54

I'll assume that tags all match the pattern for variables, whatever that pattern might be (`[A-Za-z_]\w*`, maybe). Define a dictionary whose keys are tags; the value can be whatever you want to associate with the tag. As I understand it, you can make this dictionary available to both the parser and the lexer by putting it inside the object `parser.yy`.

The lexer rule for variables would be something like this (I don't know much about jison, so this is based on bison+flex):

{variable}    if (yytext in yy.tags) { return TAG; } else { return VARIABLE; }

If you wanted to have different token types for different tags (perhaps because tags are aliases for grammatical concepts, or something like that), you could store the token type in the tag dictionary and return it from the lexer.
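
To make the lookup concrete, here is a minimal sketch in plain JavaScript, written outside of any parser generator. The function name `classifyWord`, the entry shape `{ token, value }`, and the token names `ANIMAL_TAG` and `FOOD_TAG` are assumptions made up for illustration; they are not part of jison. The sketch uses a `Map` instead of the `in` operator on a plain object, because `"constructor" in {}` is true in JavaScript and such a name would be misclassified as a tag.

```javascript
// Shared state, standing in for the object that lexer and parser both see.
const yy = { tags: new Map() };

// Each entry stores the token type to return plus the associated value.
yy.tags.set("cat", { token: "ANIMAL_TAG", value: "meow" });
yy.tags.set("bacon", { token: "FOOD_TAG", value: "sizzle" });

// What the {variable} lexer action would decide for one matched word.
function classifyWord(yytext, yy) {
  const entry = yy.tags.get(yytext);
  return entry ? entry.token : "VARIABLE";
}

console.log(classifyWord("cat", yy));         // ANIMAL_TAG
console.log(classifyWord("bacon", yy));       // FOOD_TAG
console.log(classifyWord("x", yy));           // VARIABLE
console.log(classifyWord("constructor", yy)); // VARIABLE
```

In a real lexer action, the returned string would be the token name handed to the parser, so the grammar can have separate rules for each kind of tag.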

In the grammar for tag definition files, you could add a tag definition simply by adding the key and appropriate value to yy.tags.
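
The whole flow from the question can be sketched end to end as below. This is a hand-written stand-in, not jison output: `loadTags`, `run`, and the in-memory `files` object are hypothetical helpers, and regular expressions replace a real grammar. The sketch assumes the language has no variables yet, so any word that is not a registered tag is reported as an error.

```javascript
// Hypothetical in-memory "file system" so the sketch runs without disk access.
const files = {
  "tags.mylang": 'cat "meow"\ndog "woof"\nbacon "sizzle"\n',
  "main.mylang": 'import "tags.mylang"\ncat dog bacon\n',
};

// Parse a tag definition file: each non-empty line is  name "value".
function loadTags(source, yy) {
  for (const line of source.split("\n")) {
    if (line.trim() === "") continue;
    const m = /^\s*(\w+)\s+"([^"]*)"\s*$/.exec(line);
    if (!m) throw new SyntaxError(`bad tag definition: ${line}`);
    yy.tags.set(m[1], m[2]); // the "semantic action": register the tag
  }
}

// Run a program: imports extend yy.tags, every other word must be a known tag.
function run(name, files) {
  const yy = { tags: new Map() };
  const out = [];
  for (const line of files[name].split("\n")) {
    const imp = /^\s*import\s+"([^"]+)"\s*$/.exec(line);
    if (imp) {
      if (!(imp[1] in files)) throw new Error(`cannot find file: ${imp[1]}`);
      loadTags(files[imp[1]], yy);
      continue;
    }
    for (const word of line.split(/\s+/).filter(Boolean)) {
      if (!yy.tags.has(word)) throw new SyntaxError(`unrecognized tag: ${word}`);
      out.push(yy.tags.get(word));
    }
  }
  return out.join(" ");
}

console.log(run("main.mylang", files)); // meow woof sizzle
```

The point is the ordering: the import is processed before the following line is tokenized, so the dictionary is already populated by the time `cat dog bacon` is classified.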

That looks a lot easier than I thought it would be. Makes sense, thank you! — mpen, Mar 13 '13 at 16:09

score 1 · Answer 2 · answered Mar 12 '13 at 14:32

You can go with the wildcard match \w+ that you suggest, then use the YYERROR macro to raise your own syntax error when your parser's semantic logic detects an unrecognized/undefined tag.
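
A rough sketch of this two-phase idea in plain JavaScript follows. The names `tokenize` and `resolveTags` are hypothetical, and throwing a `SyntaxError` merely stands in for what `YYERROR` does inside a bison action; how a given parser generator exposes error raising in its actions is not shown here.

```javascript
// Phase 1: the lexer accepts any word; it knows nothing about tags.
function tokenize(source) {
  const tokens = [];
  const re = /\w+/g;
  let m;
  while ((m = re.exec(source)) !== null) {
    tokens.push({ type: "WORD", text: m[0], offset: m.index });
  }
  return tokens;
}

// Phase 2: the semantic check. Throwing here plays the role that YYERROR
// plays inside a bison action.
function resolveTags(tokens, tags) {
  return tokens.map((tok) => {
    if (!tags.has(tok.text)) {
      throw new SyntaxError(`unrecognized tag "${tok.text}" at offset ${tok.offset}`);
    }
    return tags.get(tok.text);
  });
}

const tags = new Map([["cat", "meow"], ["dog", "woof"], ["bacon", "sizzle"]]);
console.log(resolveTags(tokenize("cat dog bacon"), tags).join(" ")); // meow woof sizzle

try {
  resolveTags(tokenize("cat horse"), tags);
} catch (e) {
  console.log(e.message); // unrecognized tag "horse" at offset 4
}
```

Because every word arrives as the same `WORD` token, the grammar itself cannot branch on whether a word is a tag; only the semantic check can.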

David Gorsline


This could work, but it means that I can't have 2 different kinds of tokens that "look the same". i.e., if both "tags" and "variables" are defined as `\w+` then the parser won't be able to tell them apart because I don't define them until parse time. Furthermore, it wouldn't generate the right tree. – mpen Mar 12 '13 at 15:39
