Creating AST with OMetaJS that includes token value and position

Question

I'm trying to parse a DSL with OMetaJS and produce an AST that includes a token value as well as its index in the original stream.

I know I can use the Index Capture Rule syntax ( @<rule> ) to give me an object containing the indices framing the token, but is it possible to capture that as well as the token value?

E.g., for the grammar:

export ometa Test {
  start = @<identifier>,
  identifier = (letter | digit)+
}

Parsing "Bob" gives:

{ fromIdx : 0, toIdx : 3 }

If I remove the '@' from 'identifier' then parsing gives "Bob" as the result. What I'd ideally like to get is a combination of the two:

{ fromIdx : 0, toIdx : 3, value: 'Bob' }

I could of course hack the source, but is there a better way to do this?

I want to have both value and position because I'm trying to create a visual representation of the DSL which allows editing of identifier names for example. In this case I need to know where in the original source the identifier appeared so I can modify it.

score 1 · Answer 1 · answered Sep 03 '13 at 21:09

I think what you're asking for is pretty useful, and probably deserves to have its own syntactic sugar. I'll definitely think about it. In the meantime, you could do something like this:

ometa Test {
  parse :r = @<apply(r):value>:node !(node.value = value) -> node,

  identifier = (letter | digit)+,
  start = parse("identifier")
}

Hope that helps!

score 0 · Accepted Answer · answered Sep 03 '13 at 12:14

Given that you want the thing, and the span, what about using the peek operator &? That will return the token, but not consume the input. So perhaps something like

spannedThing = (&identifier:token @<identifier>:span) -> combineThemSomehow(token, span)

might do what you want? (Warning: my OMeta's rusty; the above might not use correct grammar.) You could turn that into a parameterised rule.
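For illustration, the `combineThemSomehow` placeholder could be a small JavaScript helper along these lines. This is only a sketch: it assumes `span` is the `{ fromIdx, toIdx }` object that `@<rule>` produces, and that `token` arrives either as a string or as an array of characters (which of the two you get may depend on how the rule is written).

```javascript
// Hypothetical helper standing in for combineThemSomehow in the rule above.
// Assumption: span is the { fromIdx, toIdx } object produced by @<rule>.
// Assumption: token is either a string or an array of single characters.
function combineThemSomehow(token, span) {
  var value = Array.isArray(token) ? token.join('') : String(token);
  return { fromIdx: span.fromIdx, toIdx: span.toIdx, value: value };
}
```

With the inputs from the question, this should yield an object of the shape `{ fromIdx, toIdx, value }` that was asked for.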

Frank Shearar

Never thought of using the lookahead operator. Nice idea. Of course it means you parse everything twice, right? Anyway marking as answer as I didn't find a better way. But for the record I ended up just using the `@<...>` operator and indexing into original source text. – emertechie Sep 03 '13 at 21:13
Well, the parsing should be memoised, so it shouldn't be as bad as a 2x performance hit. I don't know about OMeta/JS, but that's how it'd work in OMeta/Squeak. – Frank Shearar Sep 04 '13 at 06:14
@Frank: I've been looking at OMeta recently, and ran across some old stuff with your name on it and a lot of broken links. Does Pythia still exist anywhere? (Sorry for the off-topic comment, but it seemed like the simplest way to contact you.) – Mason Wheeler Oct 14 '14 at 21:58
@MasonWheeler Oh gosh, I may have it somewhere. If you drop me a mail - join my first & last names with a . and send it to the gmail.com domain - I'll take a look when I can (soonest will be in a week or so). – Frank Shearar Oct 17 '14 at 22:46
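For anyone landing here later, the approach emertechie describes in the first comment (keep `@<...>` and index into the original source text) might look roughly like this in plain JavaScript. The function names `withValue` and `replaceSpan` are made up for this sketch, and it assumes `toIdx` is exclusive, which matches "Bob" giving `{ fromIdx : 0, toIdx : 3 }` in the question.

```javascript
// Hypothetical helpers; neither is part of OMeta/JS.
// source: the original text that was handed to the parser.
// span:   the { fromIdx, toIdx } object produced by @<rule>,
//         with toIdx assumed to be exclusive.
function withValue(source, span) {
  return {
    fromIdx: span.fromIdx,
    toIdx: span.toIdx,
    value: source.slice(span.fromIdx, span.toIdx)
  };
}

// Returns a new source string with the spanned text replaced,
// e.g. when an identifier is renamed in a visual editor.
function replaceSpan(source, span, replacement) {
  return source.slice(0, span.fromIdx) + replacement + source.slice(span.toIdx);
}
```

This avoids parsing anything twice, at the cost of keeping the source string around next to the AST. Note that after a replacement of a different length, the spans of any later nodes no longer line up, so the text would need to be re-parsed or the offsets adjusted.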