I am trying to create a grammar to match content like below:

(For a simple grammar to repro this issue please see ADD 1)

[Defines]
  INF_VERSION                    = 0x00010005
  BASE_NAME                      = WebServer
  FILE_GUID                      = 99E87DCF-6162-40c5-9FA1-32111F5197F7
  MODULE_TYPE                    = SEC
  UEFI_SPECIFICATION_VERSION     = 0x00010005

The UEFI_SPECIFICATION_VERSION = 0x00010005 part is optional.

(For brevity, I omitted some of the grammar.)

My grammar 1 looks like this:

defines : '[Defines]'
         define_statement+
         ;

define_statement  : 'INF_VERSION' EQ SpecVersion_VersionVal 
                  | 'BASE_NAME' EQ BaseName
                  | 'FILE_GUID' EQ RegistryFormatGUID
                  | 'MODULE_TYPE' EQ Edk2ModuleType
                  | ('UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal)?
                  ;

ANTLR 4.7 reports this error:

message: 'rule defines contains a closure with at least one alternative that can match an empty string'

But if I change the grammar like this:

defines : '[Defines]'
         define_statement+
         | ('UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal)? // <<< HERE
         ;

define_statement  : 'INF_VERSION' EQ SpecVersion_VersionVal
                  | 'BASE_NAME' EQ BaseName
                  | 'FILE_GUID' EQ RegistryFormatGUID
                  | 'MODULE_TYPE' EQ Edk2ModuleType
                  ;

The error is gone.

My question is, what does the closure mean? Which part is the closure? The define_statement?

After I move the potentially empty alternative, the defines rule can choose between '[Defines]' define_statement+ and ('UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal)?, which means defines can still match the empty string. Why is the error gone?

ADD 1

To make things clearer, I reproduced this error with a simplified grammar:

grammar test;

rule : alternate+; // <<<<< HERE
alternate : '1'?;

If I use + or * at the position marked HERE, ANTLR reports an error:

'rule rule contains a closure with at least one alternative that can match an empty string'

If I use ? at the position marked HERE, ANTLR reports a warning:

'rule rule contains an optional block with at least one alternative that can match an empty string'

I am still not sure why.

ADD 2

Each alternate WILL be a child node of rule, so if alternate can be the empty string, it is logically possible to end up with endless child nodes for rule. I guess this explains why ANTLR forbids alternate+ and alternate*. With alternate?, there will be at most one child node, so it is only a performance issue and ANTLR just generates a warning.
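The guess above can be illustrated with a minimal Python sketch. It is a hand-written loop, not ANTLR's algorithm, and the function names and the `max_children` cap are assumptions made only for this demonstration. It models `alternate+` naively: keep adding a child while the element matches. When the element can succeed without consuming input, nothing ever stops the loop except the artificial cap.

```python
def match_optional_one(text, pos):
    """Model of `alternate : '1'?` : always succeeds, consuming '1' if present."""
    if text.startswith("1", pos):
        return pos + 1
    return pos  # empty match: success without consuming any input


def match_mandatory_one(text, pos):
    """Model of `alternate : '1'` : returns None (failure) when '1' is absent."""
    return pos + 1 if text.startswith("1", pos) else None


def match_plus(match_element, text, pos, max_children=10):
    """Naive model of `element+`: add one child per successful element match.

    max_children is an artificial cap (an assumption of this sketch) so the
    demonstration terminates. Returns (child_count, cap_reached).
    A child_count of 0 would mean that `element+` itself failed.
    """
    children = 0
    while children < max_children:
        new_pos = match_element(text, pos)
        if new_pos is None:
            break  # the element failed, so the loop ends normally
        children += 1
        pos = new_pos
    return children, children == max_children


print(match_plus(match_mandatory_one, "11", 0))  # (2, False)
print(match_plus(match_optional_one, "11", 0))   # (10, True)
print(match_plus(match_optional_one, "", 0))     # (10, True)
```

With the mandatory element the loop stops as soon as the input runs out. With the optional element it keeps producing empty children until the cap is hit, which is the "endless child nodes" situation described above.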

smwikipedia
  • Can't answer the question itself, but `define_statement` *can* match an empty string in your example, which looks pointless to me. I'd guess "closure" is `define_statement+`. –  Jul 31 '17 at 09:03
  • Come to think of it, in your grammar `define_statement+` can match an arbitrary number of empty strings, so I guess that's why ANTLR4 complains. –  Jul 31 '17 at 09:05
  • Yes, I agree with you. But I did that because the `UEFI_SPECIFICATION_VERSION = 0x00010005` part is optional. Maybe I need to redesign the grammar. – smwikipedia Jul 31 '17 at 09:07
  • About `UEFI...`: then your grammar seems completely wrong. What `defines` would match as you wrote is e.g. `[Defines]` plus 10 `INF_VERSION` lines. Sounds like you want something different. –  Jul 31 '17 at 09:07

2 Answers

Let's start with the warning. ANTLR is merely alerting you that something can match the empty string. This is a warning because most of the time, you don't want a rule to match the empty string.

defines : '[Defines]'
         define_statement+
         ;

define_statement  : 'INF_VERSION' EQ SpecVersion_VersionVal 
                  | 'BASE_NAME' EQ BaseName
                  | 'FILE_GUID' EQ RegistryFormatGUID
                  | 'MODULE_TYPE' EQ Edk2ModuleType
                  | ('UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal)?
                  ;

Since ('UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal) is optional (it is followed by ?), it could be replaced by nothing, like this:

define_statement  : 'INF_VERSION' EQ SpecVersion_VersionVal 
                  | 'BASE_NAME' EQ BaseName
                  | 'FILE_GUID' EQ RegistryFormatGUID
                  | 'MODULE_TYPE' EQ Edk2ModuleType
                  | 
                  ;

That last | by itself means the rule can match nothing, that is, the empty string. So the mystery about the warning is solved. The "closure" in the message is the repetition: the + (or *) applied to define_statement inside the defines rule, named after the Kleene closure. The message says that the repeated element has at least one alternative that can be empty.

The error goes away if you remove the alternative, because then, again rewriting for clarity, we have:

define_statement  : 'INF_VERSION' EQ SpecVersion_VersionVal 
                  | 'BASE_NAME' EQ BaseName
                  | 'FILE_GUID' EQ RegistryFormatGUID
                  | 'MODULE_TYPE' EQ Edk2ModuleType
                  ;

And there's nothing optional there. One of those has to match.

You've already mentioned in your comments that you understand why repeating a rule that can be empty -- one that can potentially match an infinite number of empty strings -- is a bad idea, so I won't belabor that here.

But why did the error go away when you did that? Because

defines : '[Defines]'
         define_statement+
         | ('UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal)? // <<< HERE
         ;

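is guaranteed to match something on every pass through the define_statement+ loop: each alternative of define_statement now starts with a real token, and the first alternative of defines begins with [Defines], which is an implicit lexer token. So even if the UEFI thing is the empty string, it is empty only once, as a separate alternative outside the loop. That wasn't true in the first version we examined; there the whole define_statement rule could have been an empty string, and it sat inside the + repetition. That's quite a difference from a parsing standpoint.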
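One way to compare the two versions is to compute which grammar elements can match the empty string. The Python sketch below is a hypothetical mini-model, not ANTLR's implementation. The tuple encoding and the names `nullable` and `epsilon_issues` are invented for illustration, and it assumes non-recursive rules. Under this model, a problem is reported only when the body of a `+`/`*` loop (or of a `?` block) can itself be empty, regardless of whether the enclosing rule as a whole can be empty.

```python
# Hypothetical mini-model: grammar elements as tagged tuples.
def tok(name): return ("tok", name)
def ref(name): return ("ref", name)
def seq(*items): return ("seq", items)
def alt(*options): return ("alt", options)
def opt(body): return ("opt", body)
def plus(body): return ("plus", body)
def star(body): return ("star", body)


def nullable(node, rules):
    """True if the element can match the empty string (non-recursive rules only)."""
    kind, arg = node
    if kind == "tok":
        return False
    if kind == "ref":
        return nullable(rules[arg], rules)
    if kind == "seq":
        return all(nullable(item, rules) for item in arg)
    if kind == "alt":
        return any(nullable(option, rules) for option in arg)
    if kind in ("opt", "star"):
        return True
    if kind == "plus":
        return nullable(arg, rules)
    raise ValueError(f"unknown element kind: {kind}")


def epsilon_issues(node, rules):
    """List loops and optional blocks in one rule body whose content can be empty."""
    kind, arg = node
    issues = []
    if kind in ("plus", "star"):
        if nullable(arg, rules):
            issues.append("closure")
        issues += epsilon_issues(arg, rules)
    elif kind == "opt":
        if nullable(arg, rules):
            issues.append("optional block")
        issues += epsilon_issues(arg, rules)
    elif kind in ("seq", "alt"):
        for child in arg:
            issues += epsilon_issues(child, rules)
    return issues  # "tok" and "ref" contain nothing to inspect


def kv(key):
    """One `KEY EQ value` line; the value token is simplified to VALUE."""
    return seq(tok(key), tok("EQ"), tok("VALUE"))


FOUR_LINES = (kv("INF_VERSION"), kv("BASE_NAME"), kv("FILE_GUID"), kv("MODULE_TYPE"))
UEFI_OPT = opt(kv("UEFI_SPECIFICATION_VERSION"))

grammar1 = {  # optional UEFI line inside define_statement
    "defines": seq(tok("[Defines]"), plus(ref("define_statement"))),
    "define_statement": alt(*FOUR_LINES, UEFI_OPT),
}
grammar2 = {  # optional UEFI line moved to an alternative of defines
    "defines": alt(seq(tok("[Defines]"), plus(ref("define_statement"))), UEFI_OPT),
    "define_statement": alt(*FOUR_LINES),
}

for name, grammar in (("grammar 1", grammar1), ("grammar 2", grammar2)):
    print(name, epsilon_issues(grammar["defines"], grammar),
          "defines nullable:", nullable(grammar["defines"], grammar))
# grammar 1 ['closure'] defines nullable: False
# grammar 2 [] defines nullable: True
```

In this model the first version is flagged even though defines itself cannot be empty, and the second version passes even though defines can be empty. That matches the behaviour reported in the question.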

Now the big question: Is the [Defines] section truly optional, or not? Only you can answer that. But if it is, perhaps you should just recode it as:

defines : ('[Defines]' define_statement+)? ;

define_statement  : 'INF_VERSION' EQ SpecVersion_VersionVal 
                  | 'BASE_NAME' EQ BaseName
                  | 'FILE_GUID' EQ RegistryFormatGUID
                  | 'MODULE_TYPE' EQ Edk2ModuleType
                  | 'UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal
                  ;

This makes it completely optional. Again, only you can decide if this is valid for your grammar and expected input.

Make sense? I hope I helped you!

EDIT 1

To get rid of the error, try this grammar (I made explicit tokens for the test values to get it to run):

grammar Uefi;
defines : '[Defines]' statement+ ;
statement : define_statement | uefi_statement ;      
uefi_statement : 'UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal ;
define_statement  : 'INF_VERSION' EQ SpecVersion_VersionVal 
                  | 'BASE_NAME' EQ BaseName
                  | 'FILE_GUID' EQ RegistryFormatGUID
                  | 'MODULE_TYPE' EQ Edk2ModuleType
                  ;
// DUMMY VALUES               
SpecVersion_VersionVal : '0x00010005';
BaseName : 'WebServer';
RegistryFormatGUID : '99E87DCF-6162-40c5-9FA1-32111F5197F7';
Edk2ModuleType : 'SEC';
EQ : '=';
WS : [ \t\r\n]+ -> skip;

TomServo
  • Thanks. But I don't understand why the `defines` rule in the second grammar *is guaranteed to match something*? `defines` rule still has 2 alternatives, one is non-empty and the other `UEFI...` string can **still** be empty. Please see my updated question. – smwikipedia Jul 31 '17 at 12:41
  • `[Defines]`, as it is listed in single quotes, is what is referred to as an implied lexer token. If present, it will be lexed. As part of a parser rule, the rule will be parsed. That may be the only thing there, depending on the scenarios I laid out, but if present it'll be parsed, hence no error message. I've been through this before with grammars where almost everything is optional. It's just a warning -- only we as language designers can say whether it's okay or not. – TomServo Jul 31 '17 at 12:43
  • In my scenario, the `[Defines]` is mandatory, but **only** the `UEFI...` string line is optional. I am still trying to figure out a grammar for this. – smwikipedia Jul 31 '17 at 14:00
  • @smwikipedia If that's the case, then your original grammar was spot on. Ignore the warning and just realize when you make your Listener or Visitor there may be nothing there to parse when you do your overrides, so check for content in the context before trying to parse something that might not be there. Yours is a very common pattern. Well done. – TomServo Jul 31 '17 at 14:04
  • But that's an error not a warning. I am using ANTLR 4.7. Please see my ADD 1 and 2. – smwikipedia Jul 31 '17 at 14:08
  • @smwikipedia You're right I didn't read that closely enough. The "?" makes it optional and that's all. When I read your original question about "warning" I had deja vu, been there many times. Let me take a bit of a closer look, do you still need help or are you good to go now? So the DEFINES section is mandatory (the header) and only optional one is the UEFI thing, and if present, there can be only one? – TomServo Jul 31 '17 at 14:15
  • Yes, the [Defines] is mandatory. Only the UEFI thing is optional. And if it is present, there can be only one. – smwikipedia Jul 31 '17 at 14:25
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/150631/discussion-between-jlh-and-smwikipedia). – TomServo Jul 31 '17 at 14:29
  • By the way, have you tried the case where there is only the header `[DEFINES]`? Just curious. I have turned off my computer so I can only try it out tomorrow. – smwikipedia Jul 31 '17 at 15:06
  • I just tried it: when only the header `[DEFINES]` exists, simply changing `statement+` to `statement*` does the trick. – smwikipedia Jul 31 '17 at 23:21

Just adding my solution. Credit goes to @JLH.

The important thing is to have two separate rules for lines of different natures:

  • linesGroup1_Defines
  • linesGroup2_Defines

This way, the optional nature of a line can be expressed through | (choice) instead of ? (optional).

grammar inf;

start : configSections;

configSections: configSection+
                EOF;

configSection: section_Defines
             | bSection
             ;

section_Defines : '[Defines]'
                 sectionLine_Defines*;

sectionLine_Defines  : linesGroup1_Defines | linesGroup2_Defines;

linesGroup1_Defines : 'INF_VERSION' EQ SpecVersion_VersionVal 
           | 'BASE_NAME' EQ BaseName
           | 'FILE_GUID' EQ RegistryFormatGUID
           | 'MODULE_TYPE' EQ Edk2ModuleType
           ;
linesGroup2_Defines : 'UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal
              | 'PI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal
              ;


bSection : '[b]'
           SectionLine_b+;

(Some necessary token definitions are omitted for brevity)

ADD 1

On second thought, the above solution does not capture the semantics that linesGroup1_Defines is mandatory and linesGroup2_Defines is optional. Both are actually optional now, so the grammar accepts input with only optional lines, like below:

[Defines]
  UEFI_SPECIFICATION_VERSION     = 0x00010005

I am not sure whether this constraint can or should be expressed in the grammar. Maybe I need to refine it further.
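One common alternative is to keep the grammar permissive and enforce the constraint in a separate pass after parsing. Below is a minimal Python sketch of such a check, under stated assumptions. The function name `check_defines` and the key sets are hypothetical. A regular expression stands in for the parse tree that a listener or visitor would normally walk. The "at most once" rule for every key is my assumption, not something stated above.

```python
import re

# Assumed constraint sets, taken from the two rule groups in the grammar above.
MANDATORY_KEYS = {"INF_VERSION", "BASE_NAME", "FILE_GUID", "MODULE_TYPE"}
OPTIONAL_KEYS = {"UEFI_SPECIFICATION_VERSION", "PI_SPECIFICATION_VERSION"}

# Stand-in for the parser: matches lines of the form `KEY = value`.
LINE_RE = re.compile(r"^\s*([A-Z_]+)\s*=\s*(\S+)\s*$")


def check_defines(section_text):
    """Return a list of problems in one [Defines] section (empty list means OK)."""
    counts = {}
    for line in section_text.splitlines():
        match = LINE_RE.match(line)
        if match:  # the `[Defines]` header and blank lines do not match
            key = match.group(1)
            counts[key] = counts.get(key, 0) + 1

    problems = []
    for key in sorted(MANDATORY_KEYS):
        if counts.get(key, 0) == 0:
            problems.append(f"missing mandatory line: {key}")
    for key in sorted(MANDATORY_KEYS | OPTIONAL_KEYS):
        if counts.get(key, 0) > 1:
            problems.append(f"duplicate line: {key}")
    return problems


only_optional = """[Defines]
  UEFI_SPECIFICATION_VERSION     = 0x00010005
"""
for problem in check_defines(only_optional):
    print(problem)
```

For the input with only the optional line, this reports the four missing mandatory lines, so the grammar can stay simple while the mandatory/optional distinction is enforced separately.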

smwikipedia