
I am trying to document the syntax of menuentries.conf, a configuration file for menu entries, by describing its grammar in extended Backus-Naur form (EBNF). The file uses indentation levels as a syntactic component, as this example shows:

menu_entry_1
menu_entry_2
    menu_entry_2_submenu_entry_1
    menu_entry_2_submenu_entry_2
        menu_entry_2_submenu_entry_2_subsubmenu_1
        menu_entry_2_submenu_entry_2_subsubmenu_2
    menu_entry_2_submenu_entry_3
menu_entry_3
    menu_entry_3_submenu_entry_1

In the above example each entry is a string that, for the sake of the example, indicates its position. In addition, the example follows these rules:

  • each menu item is represented by a single line (hence the menu entries are delimited by NEWLINE)
  • menu entries without any indentation are "top level" menu entries
  • menu entries with indentation are not "top level" but children of the nearest preceding entry one level up.

My attempt at providing a BNF is the following:

NEWLINE := '\n'
INDENTING := '    '
menu_entry_string := ('a'|'b'|....|'z'|'_'|'0'|'1'|...|'9')+
menu_entries := menu_entry (NEWLINE menu_entry)*
menu_entry := menu_entry_string (NEWLINE INDENTING menu_entry)*
submenu_entry := INDENTING menu_entry_string
subsubmenu_entry := INDENTING INDENTING menu_entry_string

My question arises from my dissatisfaction with the recursively declared menu_entry and its redundancy with submenu_entry and subsubmenu_entry.
Knowing that Python also uses indentation to delimit blocks, I looked up the definition of Python's grammar (as found here: https://docs.python.org/3/reference/grammar.html), but it leaves the relevant notions of INDENT and DEDENT out of its grammar.

My question is hence: How to correctly use EBNF to describe a grammar/syntax in which indenting is employed as a grouping block? Ideally a small example (or if possible correction of my attempt) would be appreciated.

In the best case scenario the EBNF would define the notion of nesting-level of the block which would be: 1 for submenu_entry and 2 for subsubmenu_entry ....

humanityANDpeace

1 Answer


You might be thinking with the mind of a programmer when you need the mind of a language creator. There are traditionally two parts to creating a language:

  1. Lexeme specification: defines groups of characters that represent a single syntactic construct (i.e. a token or terminal value)
  2. Grammar specification: defines the valid combinations of syntactic constructs/tokens/terminal values that make up non-terminal values that express how the language can be used

Some tools let you combine the lexical and syntactic parts of a language in one specification, but doing so in your case is not a good idea: a context-free grammar (which is what BNF and EBNF describe) cannot by itself express that each nested line must be indented exactly one level deeper than its parent, for arbitrary depth. That's something you'd leave for a lexer to handle.

Below is the BNF grammar, where STRING, NEWLINE, INDENT, and DEDENT are all terminal values generated by your lexer:

start ::= list
        | list NEWLINE
        .

list  ::= entry
        | list entry
        .

entry ::= STRING NEWLINE
        | STRING NEWLINE INDENT list DEDENT
        .

Simple enough, right? I included the start rule to ensure that any file ending with a NEWLINE or DEDENT token is valid. Without it, a file ending in a NEWLINE token that wasn't preceded by a STRING token would be invalid.
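
To make the grammar concrete, here is a minimal recursive-descent sketch of it in Python. It is only an illustration, not a definitive implementation: it assumes the lexer hands over tokens as `(kind, value)` tuples, and the names `parse`, `parse_list`, `parse_entry` and `expect` are hypothetical. Each parsed entry also records its nesting level (0 for top level, 1 for a submenu, and so on), which is the information the question asks about.

```python
def expect(tokens, pos, kind):
    """Check that tokens[pos] has the given kind and return the next position."""
    if pos >= len(tokens) or tokens[pos][0] != kind:
        raise SyntaxError(f"expected {kind} at token {pos}")
    return pos + 1


def parse_entry(tokens, pos, level):
    """entry ::= STRING NEWLINE | STRING NEWLINE INDENT list DEDENT"""
    pos = expect(tokens, pos, "STRING")
    name = tokens[pos - 1][1]
    pos = expect(tokens, pos, "NEWLINE")
    children = []
    if pos < len(tokens) and tokens[pos][0] == "INDENT":
        # The nested list sits one level deeper than this entry.
        children, pos = parse_list(tokens, pos + 1, level + 1)
        pos = expect(tokens, pos, "DEDENT")
    return {"name": name, "level": level, "children": children}, pos


def parse_list(tokens, pos=0, level=0):
    """list ::= entry | list entry   (one or more entries)"""
    entries = []
    while pos < len(tokens) and tokens[pos][0] == "STRING":
        entry, pos = parse_entry(tokens, pos, level)
        entries.append(entry)
    if not entries:
        raise SyntaxError(f"expected STRING at token {pos}")
    return entries, pos


def parse(tokens):
    """start ::= list | list NEWLINE"""
    tree, pos = parse_list(tokens)
    if pos < len(tokens) and tokens[pos][0] == "NEWLINE":
        pos += 1  # optional trailing NEWLINE allowed by the start rule
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos][0]} at token {pos}")
    return tree


# Hand-written token stream for:  a / b / (indented) c
tokens = [
    ("STRING", "a"), ("NEWLINE", None),
    ("STRING", "b"), ("NEWLINE", None),
    ("INDENT", None),
    ("STRING", "c"), ("NEWLINE", None),
    ("DEDENT", None),
]
tree = parse(tokens)
```

The nesting level never appears in the grammar itself. It falls out of how deeply the `list` rule is nested inside `entry`, so the parser can count it while it recurses.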

I used BNF, but you can just as easily use EBNF if you wish. The point is that a lexer can understand how many spaces of indentation are used to generate an INDENT or DEDENT token (or an error if necessary), and your grammar should simply specify how to work with the tokens generated.
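
As a rough illustration of the lexer side, the following Python sketch turns indented lines into STRING, NEWLINE, INDENT and DEDENT tokens. Several things here are my assumptions rather than part of the question: the function name `tokenize`, the `(kind, value)` tuple shape, skipping blank lines, treating only spaces as indentation, and fixing the indent unit at four spaces (taken from the example file).

```python
def tokenize(text, indent_unit="    "):
    """Convert indented lines into a flat list of (kind, value) tokens."""
    tokens = []
    depth = 0  # nesting level of the previous non-blank line
    unit = len(indent_unit)
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            continue  # assumption: blank lines carry no meaning
        body = raw.lstrip(" ")
        level, rest = divmod(len(raw) - len(body), unit)
        if rest:
            raise SyntaxError(
                f"line {lineno}: indentation is not a multiple of {unit} spaces")
        if level > depth + 1:
            raise SyntaxError(
                f"line {lineno}: indented more than one level at once")
        if level == depth + 1:
            tokens.append(("INDENT", None))
        # One DEDENT per level closed; the range is empty when level >= depth.
        for _ in range(depth - level):
            tokens.append(("DEDENT", None))
        depth = level
        tokens.append(("STRING", body.rstrip()))
        tokens.append(("NEWLINE", None))
    # Close every block that is still open at end of input.
    tokens.extend([("DEDENT", None)] * depth)
    return tokens


sample = "menu_entry_1\nmenu_entry_2\n    menu_entry_2_submenu_entry_1\n"
kinds = [kind for kind, _ in tokenize(sample)]
```

For `sample`, `kinds` is `STRING NEWLINE STRING NEWLINE INDENT STRING NEWLINE DEDENT`. The lexer is the only place that knows an indent is four spaces. The grammar only ever sees the resulting INDENT and DEDENT tokens.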

  • Firstly, many thanks for your answer. Secondly, could you give some insight into whether it is possible at all for INDENT and DEDENT not to be produced by the lexer? – humanityANDpeace Aug 28 '18 at 07:50
  • @humanityANDpeace With a parser that handles these things, sure, you don't need the lexer to handle it, but the grammar itself cannot describe what you want as `INDENT` and `DEDENT` because the amount of space they represent will vary. –  Aug 28 '18 at 08:34