I am working on some syntax tree synchronisation tools and am trying to write a parser for a small subset of Java. I am confused by the Java 10 grammar specification and suspect that the definition of FieldAccess is wrong.

As I understand it, a field access should look like obj.x, where obj is an identifier (or something similar).

But it seems that the grammar for FieldAccess cannot produce obj.x:

FieldAccess:
Primary . Identifier 
super . Identifier 
TypeName . super . Identifier

because the definition of Primary is

Primary:
PrimaryNoNewArray 
ArrayCreationExpression

and neither of these nonterminals can derive a plain Identifier.

I believe the grammar for FieldAccess should be
PostfixExpression . Identifier,
where PostfixExpression is the nonterminal ‘one layer higher than Primary’:

PostfixExpression:
Primary 
ExpressionName 
PostIncrementExpression 
PostDecrementExpression

so that ExpressionName can eventually produce an identifier, as desired:

ExpressionName:
Identifier
AmbiguousName . Identifier
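
To make the forms concrete, here is a small sketch of the expressions in question. The class and member names (`Base`, `Derived`, `Inner`, `viaThis`, and so on) are made up for illustration, and the comments only reflect my reading of the productions quoted above, not an authoritative classification.

```java
// Hypothetical classes, used only to show the expression forms under discussion.
class Base {
    int x = 1;
}

class Derived extends Base {
    int x = 2; // hides Base.x

    // Primary . Identifier: `this` is a PrimaryNoNewArray
    int viaThis() { return this.x; }

    // super . Identifier
    int viaSuper() { return super.x; }

    // Primary . Identifier: a parenthesized expression is a PrimaryNoNewArray
    int viaParen(Derived obj) { return (obj).x; }

    // Not derivable from FieldAccess as quoted; derivable as
    // ExpressionName (AmbiguousName . Identifier)
    int viaName(Derived obj) { return obj.x; }

    class Inner {
        // TypeName . super . Identifier: field of the enclosing instance,
        // viewed as an instance of Derived's superclass
        int viaOuterSuper() { return Derived.super.x; }
    }
}
```

All five forms compile; only `obj.x` in `viaName` has no derivation through the FieldAccess production as written.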

Can anyone comment on this, or tell me the proper place to report the issue?
I can only find a place for reporting bugs in implementations of the Java Platform, not one for reporting errors in the language specification.

James Z
Zirun
  • Interesting question, but the part that asks for the correct bug tracker to use is unfortunately off topic here. – GhostCat Jul 13 '18 at 07:11
  • It's not a bug. `FieldAccess` intentionally does not match `id.id` - `ExpressionName` does. The note on [Primary](https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-Primary) explains why that is. – sepp2k Jul 13 '18 at 07:17
  • @sepp2k Yes, but the problem is that `Primary` cannot be `ExpressionName`. For me, the explanation is about `ArrayAccess`, whose grammar is correct: `ExpressionName [ Expression ]` – Zirun Jul 13 '18 at 07:31
  • You're right that `Primary` cannot be `ExpressionName`, but again that's by design. `id.id` (or, for that matter, just `id`) is not a `Primary`, but rather an `ExpressionName` (which can be reached from `Postfix`, but not from `Primary`). The reason for that is an ambiguity involving array accesses, which is what the note is talking about. The note only mentions plain identifiers, but the same logic applies to `id.id` (note that `id.id` could also be a type name, just like `id` could be a type name). – sepp2k Jul 13 '18 at 07:37
  • @sepp2k I see your points but your argument does not solve the problem that the grammar of `FieldAccess`, i.e. `Primary . Identifier` cannot yield something like `obj.x`. – Zirun Jul 13 '18 at 08:45
  • `FieldAccess` can't derive `obj.x` - that's true. And you're right that my arguments do not change that - it's a fact and can't be changed. But why is that a problem? I mean I get why it's counter-intuitive (as the note in the spec admits as well), but as long as `Expression` can derive `obj.x` (via `PostfixExpression` and then `ExpressionName`), why is it a problem that `FieldAccess` can't? – sepp2k Jul 13 '18 at 08:52
  • @sepp2k I see. Thanks! But if the grammar is correct, it is really unnatural that `this.x` is a `FieldAccess` while `obj.x` is an `Expression`. In addition, from the explanation of the grammar “... but not for FieldAccess because this uses an identifier directly”, I tend to think that `FieldAccess` indeed uses an identifier directly; but its production rule uses `Primary`, which cannot be an identifier. – Zirun Jul 13 '18 at 09:53
  • If `Primary` can't be ultimately an identifier there are much bigger problems than this. – user207421 Jul 13 '18 at 11:15
  • The `FieldAccess` grammar lists constructs which ultimately lead to a field access production. In contrast, expressions of the form `identifier` or `identifier . anotherIdentifier` *may* be a field access, but could also be a type name. Depending on the context, a parser cannot even tell whether such an `ExpressionName` is a field access or a type, as that requires resolving the names against the current class path (including traversal of type hierarchies and so on). – Holger Aug 22 '18 at 14:44
  • @Holger Thank you. Now I understand that `FieldAccess` only lists the unambiguous cases. So I assume that, after parsing, the ambiguous parts will finally be converted to constructs such as a field access or a type. Is that true? I also looked at the ‘already answered’ question you pointed to, but I still do not understand why this is a problem in parsing. I assume that, if the ‘context’ requires the remaining text (tokens), we can look ahead or use synthesised attributes; if the ‘context’ requires already parsed parts, we can use inherited attributes. Would you kindly comment on this? – Zirun Aug 24 '18 at 00:20
  • See for example `foo.bar();`. `foo` can be a field, but it could also be a local variable or a type (when `bar()` is a `static` method). Even better for `foo.bar.baz();`, where `foo.bar` could be a qualified type name, but if not, `foo` can be a field, a local variable or a type, whereas `bar` can be a field or a nested class. To find out what it actually is, you need to resolve the names, considering the nested lexical scopes, the imports and the inheritance, which requires looking into other classes in the current class path. That’s outside the scope of a *parser*. – Holger Aug 24 '18 at 06:42
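
The `foo.bar()` case from the last comment can be sketched as follows. This is only an illustration under made-up names (`foo`, `Helper`, `Resolution`); the type `foo` is deliberately spelled in lower case so that the two call sites are token-for-token identical.

```java
// Hypothetical declarations: a type named `foo` with a static method bar().
class foo {
    static String bar() { return "static method of type foo"; }
}

// A second hypothetical type with an instance method of the same name.
class Helper {
    String bar() { return "instance method via local variable"; }
}

class Resolution {
    static String typeCase() {
        // No variable named foo is in scope, so foo resolves to the type.
        return foo.bar();
    }

    static String localCase() {
        Helper foo = new Helper();
        // Same tokens as above, but the local variable now obscures the type,
        // so this is an instance method call on a Helper.
        return foo.bar();
    }
}
```

A parser sees the same token sequence `foo . bar ( )` in both methods; which declaration `foo` refers to only becomes known once names are resolved against the enclosing scopes.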

0 Answers