parsing arguments without separating whitespace

Question

In smali the signature of a method taking two integers and returning one integer is written like this:

add(II)I

To parse this with Xtext, I tried the following rule:

ID '(' Type* ')' Type

Unfortunately, this only works if there is whitespace between the two I characters.

How can I change the rule so that it does not require whitespace here?

As far as I can see, the problem already arises in the lexer, which processes the terminal rules: whenever it sees a sequence of characters like III, it immediately marks it as an ID, regardless of its position. :(
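The longest-match behaviour described above can be illustrated with a minimal plain-Java sketch (this is not the lexer Xtext actually generates). The ID rule is a simplified stand-in for Xtext's default ID terminal, and the keyword set {I, D, V} is an assumption standing in for the grammar's type keywords. A keyword only wins when it matches the whole run of identifier characters, which is roughly how ANTLR resolves a tie between a keyword and ID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class MaximalMunchDemo {

    // Hypothetical keywords standing in for the grammar's type keywords.
    static final Set<String> KEYWORDS = Set.of("I", "D", "V");

    // Simplified ID terminal: a letter or '_' followed by letters, digits or '_'.
    static boolean isIdStart(char c) {
        return Character.isLetter(c) || c == '_';
    }

    static boolean isIdPart(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    // Longest-match tokenizer: an identifier-like run is consumed as far as possible;
    // it only becomes a keyword token if the whole run equals a keyword.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace only separates tokens
            } else if (isIdStart(c)) {
                int start = i;
                while (i < input.length() && isIdPart(input.charAt(i))) {
                    i++;
                }
                String text = input.substring(start, i);
                tokens.add(KEYWORDS.contains(text) ? "'" + text + "'" : "ID:" + text);
            } else {
                tokens.add("'" + c + "'"); // single-character token such as '(' or ')'
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("add(II)I"));
        System.out.println(tokenize("add(I I)I"));
        System.out.println(tokenize("III(III)I"));
    }
}
```

Running main prints [ID:add, '(', ID:II, ')', 'I'] and then [ID:add, '(', 'I', 'I', ')', 'I']: without whitespace, II reaches the parser as a single ID, so a rule expecting Type* never sees two I keywords.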

To parse something like:

III(III)I

i.e. a function named III that takes three integers and returns another integer, it seems I have to force the lexer to emit only single characters and reassemble them with a parser rule.

But then I no longer manage to create an ID rule...

It seems like I missed something important.
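The single-character workaround considered above can be sketched in plain Java as a tiny hand-written parser. This is an illustration of the idea, not Xtext code: the class and method names are hypothetical, and only single-character primitive types (I, D, V) are handled. Because the parser rather than the lexer decides where the name ends (at '('), III(III)I is read as a method named III with three I parameters.

```java
import java.util.ArrayList;
import java.util.List;

public class SingleCharSignatureParser {

    // Result of parsing a signature such as "III(III)I".
    static final class Signature {
        final String name;
        final List<Character> parameterTypes;
        final char returnType;

        Signature(String name, List<Character> parameterTypes, char returnType) {
            this.name = name;
            this.parameterTypes = parameterTypes;
            this.returnType = returnType;
        }

        @Override
        public String toString() {
            return name + " " + parameterTypes + " -> " + returnType;
        }
    }

    // "Lexer": every non-whitespace character becomes its own token.
    static List<Character> lex(String input) {
        List<Character> tokens = new ArrayList<>();
        for (char c : input.toCharArray()) {
            if (!Character.isWhitespace(c)) {
                tokens.add(c);
            }
        }
        return tokens;
    }

    static boolean isPrimitive(char c) {
        return c == 'I' || c == 'D' || c == 'V';
    }

    // "Parser rule": the name is reassembled from single-character tokens up to '(',
    // then each primitive type is read as exactly one token.
    static Signature parse(String input) {
        List<Character> tokens = lex(input);
        int pos = 0;
        StringBuilder name = new StringBuilder();
        while (pos < tokens.size() && tokens.get(pos) != '(') {
            name.append(tokens.get(pos++));
        }
        if (name.length() == 0 || pos == tokens.size()) {
            throw new IllegalArgumentException("expected <name>( in: " + input);
        }
        pos++; // skip '('
        List<Character> params = new ArrayList<>();
        while (pos < tokens.size() && tokens.get(pos) != ')') {
            char t = tokens.get(pos++);
            if (!isPrimitive(t)) {
                throw new IllegalArgumentException("unexpected type '" + t + "' in: " + input);
            }
            params.add(t);
        }
        // Exactly two tokens must remain: ')' and one return type.
        if (pos + 2 != tokens.size() || !isPrimitive(tokens.get(pos + 1))) {
            throw new IllegalArgumentException("expected ')' and one return type in: " + input);
        }
        return new Signature(name.toString(), params, tokens.get(pos + 1));
    }

    public static void main(String[] args) {
        System.out.println(parse("III(III)I"));
        System.out.println(parse("add(II)I"));
    }
}
```

main prints III [I, I, I] -> I and add [I, I] -> I. The price is that identifiers no longer exist as tokens of their own, which is exactly the difficulty described in the question.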

NB: Besides primitive data types such as I (integer), D (double), and V (void), there are also class types, written like Ljava/lang/String;, and array types, which start with [.

A typical main method looks like this:

.method public static main([Ljava/lang/String;)V
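To make this type syntax concrete, here is a minimal Java sketch that splits a concatenated parameter list into individual descriptors. Besides the I, D, V, class and array forms mentioned above, it assumes the remaining JVM primitive descriptors (Z, B, S, C, J, F); the class name DescriptorSplitter is hypothetical. It reads one descriptor at a time, so no separator between types is needed: a class type ends at ';' and an array prefix [ attaches to the following element type.

```java
import java.util.ArrayList;
import java.util.List;

public class DescriptorSplitter {

    // Primitive descriptor characters of the JVM/Dalvik descriptor format
    // (V, void, is only meaningful as a return type; this sketch does not check that).
    private static final String PRIMITIVES = "ZBSCIJFDV";

    // Splits a concatenated list of type descriptors, e.g. "I[Ljava/lang/String;D",
    // into individual descriptors: [I, [Ljava/lang/String;, D].
    static List<String> splitTypes(String types) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < types.length()) {
            int start = i;
            while (i < types.length() && types.charAt(i) == '[') {
                i++; // any number of array dimensions
            }
            if (i == types.length()) {
                throw new IllegalArgumentException("array without element type at " + start);
            }
            char c = types.charAt(i);
            if (c == 'L') {
                int end = types.indexOf(';', i);
                if (end < 0) {
                    throw new IllegalArgumentException("class type without ';' at " + i);
                }
                i = end + 1; // include the terminating ';'
            } else if (PRIMITIVES.indexOf(c) >= 0) {
                i++;
            } else {
                throw new IllegalArgumentException("unknown type '" + c + "' at " + i);
            }
            result.add(types.substring(start, i));
        }
        return result;
    }

    public static void main(String[] args) {
        String signature = "main([Ljava/lang/String;)V";
        int open = signature.indexOf('(');
        int close = signature.indexOf(')', open);
        System.out.println(signature.substring(0, open));                      // method name
        System.out.println(splitTypes(signature.substring(open + 1, close))); // parameters
        System.out.println(splitTypes(signature.substring(close + 1)));       // return type
    }
}
```

For the main method above, this prints main, [[Ljava/lang/String;] and [V].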

I also asked in the corresponding Eclipse forum. Unfortunately, no one was able to help. http://www.eclipse.org/forums/index.php/m/1092303/ – michas, Aug 22 '13 at 15:45
If you know a better tag for this kind of question, feel free to add it. – michas, Aug 22 '13 at 15:49

Sebastian Zarnekow · Answer 1 · 2013-08-27T07:39:18.133

You may want to configure the MWE2 workflow that generates your language to use the extended AntlrGeneratorFragment, where you can set the option to use backtracking in the lexer. That should do the trick. You have to do the same for the content assist parser fragment, where you'll have to use the ContentAssistParserGeneratorFragment.

Some background: the lexer usually consumes the longest matching sequence. For example, III looks like an ID, so it is consumed as a single ID rather than as three individual I tokens. If backtracking is enabled, the lexer will split it up instead of consuming the complete ID. This may cause difficulties if III is not always a list of types but sometimes a real ID; you could work around those by using a data type rule for valid identifiers.

Thanks, this looks very promising! Could you maybe give a short code example? I still don't understand how backtracking will allow the lexer to figure out the right semantics in the `III` example. — michas, Aug 25 '13 at 23:10

score 1 · Answer 2 · edited Aug 27 '13 at 10:56

You could try this with backtracking, but I usually avoid this technique: it can lead to very confusing error messages and to a very slow generated parser.

Try the following approach:

1. Parse the parameter string ("III") as an ID.
2. Add a validator that restricts it to only "I"s, together with a good error message (see AbstractInjectableValidator; Xtext will have generated a validator for your language, probably called "SmaliJavaValidator").
3. Extend the EObject representing the type string so that it breaks the string up into the individual type descriptions (e.g. single "I"s).

With this approach, you don't interpret the type string until Xtext has finished parsing with its grammar. You still get a usable result, with a fast grammar and good error messages.
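The validate-then-split approach can be sketched in plain Java. In an Xtext project this logic would sit in a @Check method of the generated validator and work on the parsed model; here it is a standalone class with hypothetical names that receives the permissively parsed ID as a String. The accepted set (the JVM primitive descriptors other than V) is an assumption; the answer's own example only allows "I".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class ParameterStringCheck {

    // Assumed set of valid single-character parameter types (JVM primitives except V).
    private static final String ALLOWED = "ZBSCIJFD";

    // Validation step: returns a tailored error message for the first character
    // that is not a known type, or Optional.empty() if the whole string is valid.
    static Optional<String> validate(String parameterString) {
        for (int i = 0; i < parameterString.length(); i++) {
            char c = parameterString.charAt(i);
            if (ALLOWED.indexOf(c) < 0) {
                return Optional.of("Unknown parameter type '" + c + "' at position " + (i + 1)
                        + " of '" + parameterString + "'; expected one of Z, B, S, C, I, J, F, D");
            }
        }
        return Optional.empty();
    }

    // "Break up" step: derive the individual type descriptions from the parsed ID.
    static List<String> types(String parameterString) {
        List<String> result = new ArrayList<>();
        for (char c : parameterString.toCharArray()) {
            result.add(String.valueOf(c));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String parsedId : new String[] {"III", "IXI"}) {
            Optional<String> error = validate(parsedId);
            if (error.isPresent()) {
                System.out.println(error.get());
            } else {
                System.out.println(parsedId + " -> " + types(parsedId));
            }
        }
    }
}
```

main prints III -> [I, I, I] followed by the error message for IXI. Since the grammar only ever sees an ID, the user gets this specific message instead of a generic syntax error.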

A general piece of advice: I tend to make my grammar quite permissive and restrict the result later with validators. That way, the grammar stays fast and the user gets good, custom-tailored error messages.

answered Aug 27 '13 at 07:55 by stefan.schwetschke, edited Aug 27 '13 at 10:56 by michas

This seems to be a good solution, but besides primitive data types like `I` and `D`, there are also classes written as `Ljava/lang/String;`. If the signature contains a class, `Type*` can no longer be parsed as an ID. For this approach, I probably have to define a meta type covering both IDs and types and sort them out later, too. – michas Aug 27 '13 at 11:39
For this, you could add your own terminal (see org.eclipse.xtext.common.Terminals.xtext for examples) and use an alternative for your type expression. But from my point of view, sorting them out later seems to be the better solution. The real fun begins when you add scoping and code completion... – stefan.schwetschke Aug 27 '13 at 12:00
