I am trying to create a grammar to match content like the following.
(For a simple grammar that reproduces this issue, please see ADD 1.)
[Defines]
INF_VERSION = 0x00010005
BASE_NAME = WebServer
FILE_GUID = 99E87DCF-6162-40c5-9FA1-32111F5197F7
MODULE_TYPE = SEC
UEFI_SPECIFICATION_VERSION = 0x00010005
The UEFI_SPECIFICATION_VERSION = 0x00010005 part is optional.
(For brevity, I omitted some of the grammar.)
My grammar 1 looks like this:
defines : '[Defines]'
          define_statement+
        ;
define_statement : 'INF_VERSION' EQ SpecVersion_VersionVal
                 | 'BASE_NAME' EQ BaseName
                 | 'FILE_GUID' EQ RegistryFormatGUID
                 | 'MODULE_TYPE' EQ Edk2ModuleType
                 | ('UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal)?
                 ;
ANTLR 4.7 reports this error:
message: 'rule defines contains a closure with at least one alternative that can match an empty string'
But if I change the grammar like this:
defines : '[Defines]'
          define_statement+
        | ('UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal)? // <<< HERE
        ;
define_statement : 'INF_VERSION' EQ SpecVersion_VersionVal
                 | 'BASE_NAME' EQ BaseName
                 | 'FILE_GUID' EQ RegistryFormatGUID
                 | 'MODULE_TYPE' EQ Edk2ModuleType
                 ;
The error is gone.
My question is, what does the closure mean? Which part is the closure? The define_statement?
After I move the potentially empty alternative, the defines rule can alternate between '[Defines]' define_statement+ and ('UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal)?, which means defines can still match the empty string. So why is the error gone?
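As a tentative reading of the message, "closure" most likely refers to the + and * repetition operators (positive and Kleene closure), so the closure in grammar 1 would be define_statement+ inside defines, not define_statement itself. The sketch below annotates grammar 1 on that assumption and shows one possible way to keep the loop body non-empty. The token rules (EQ, SpecVersion_VersionVal, and so on) are assumed to be defined as in the omitted part of the grammar.

```antlr
defines
    : '[Defines]' define_statement+   // the '+' loop is the closure being checked
    ;

define_statement
    : 'INF_VERSION' EQ SpecVersion_VersionVal
    | 'BASE_NAME' EQ BaseName
    | 'FILE_GUID' EQ RegistryFormatGUID
    | 'MODULE_TYPE' EQ Edk2ModuleType
    // No trailing '?' here: every alternative now consumes at least one token,
    // so the body of the loop can no longer match the empty string.
    | 'UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal
    ;
```

With this shape, the UEFI_SPECIFICATION_VERSION line stays optional simply because define_statement+ does not require every alternative to appear. By the same reading, grammar 2 passes the check because its optional block is an alternative of defines and not the body of a + or * loop, even though defines can then match the empty string. Neither version enforces that the mandatory entries are present; that would need a separate check outside the grammar.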
ADD 1
To make things clearer, I reproduced this error with a simplified grammar:
grammar test;
rule : alternate+; // <<<<< HERE
alternate : '1'?;
If I use + or * at HERE, ANTLR will report an error:
'rule rule contains a closure with at least one alternative that can match an empty string'
If I use ? at HERE, ANTLR will report a warning:
'rule rule contains an optional block with at least one alternative that can match an empty string'
I am still not sure why.
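One possible way to get the simplified grammar accepted, assuming the intent is "zero or more '1' tokens", is to move the optionality from the loop body to the loop operator. This is only a sketch of that idea, not the only valid rewrite:

```antlr
grammar test;

// The loop operator supplies the "may be absent" case (zero iterations).
rule      : alternate* ;

// The loop body always consumes a token, so it cannot match the empty string.
alternate : '1' ;
```

Here alternate always consumes a '1', so each iteration of the loop makes progress, and empty input is still accepted because * allows zero iterations.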
ADD 2
Each alternate WILL be a child node of rule, so if alternate can match the empty string, it is logically possible to end up with endlessly many child nodes for rule. So I guess this may explain why ANTLR forbids me to do that with alternate+ or alternate*. But with alternate?, there will be at most one child node, so it is only a performance issue, and ANTLR just generates a warning.