
While trying to get my feet wet with lexical analysers and parser generators, I realized that most resources on the Internet (tutorials, forums, StackOverflow) only talk about languages. Is it because tools like Flex and Bison are only suitable for languages, or is it because anything that can be parsed is considered to be a language?

To be more specific, I have a file of the following form:

File    : Bananarama.xyz
Date    : 22.12.2017

TableStart
BlockStart
Param1       : 12
Param2       : 1.5
Param3[lbs]  : 1539
Param4[cm]   : 55
BlockEnd

BlockStart
[...]
BlockEnd
TableEnd

Is this file suitable to be parsed by an LALR parser?

exilit

1 Answer


(Written) languages are nothing more than structured sequences of symbols which contain information. That is no different from what you have. Data files and configuration files are all sequences of symbols that contain information. The ordering and sequencing of the symbols needs to be recognised in order to discover (or match) the information contained therein.

However, there are different ways of structuring the symbols to represent information. Some ways of organising the symbols are easier to recognise than others. By easier I mean with less code, less time, simpler algorithms. Some are more difficult.

What you are asking really translates to, "does this example arrangement of symbols require an algorithm of this complexity to be recognised?"

The answer is straightforward Computer Science. I'd just use the Chomsky Hierarchy to evaluate the type of algorithm needed to parse (match) the symbol sequences in the file.

Without going into further detail, it is sufficient to say that the language is either type 2 (context-free) or type 3 (regular), and can certainly be parsed by an LALR parser. The only remaining question is whether an LALR parser is more machinery than this language needs.

Can a regular grammar (and hence regular expressions) be utilised for this task? Your example of the file structure is actually insufficient to answer this question. You need to know whether the structures can be nested or not. Can a BLOCK contain a BLOCK or not?

If there is no nesting, then regular expressions are sufficiently powerful, and there are plenty of tools that do the job (like egrep, perl, awk, sed, findstr).
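As a rough illustration of the no-nesting case, here is a minimal Python sketch that reads the sample file line by line with two regular expressions and a fixed, finite amount of state (no stack). The line shapes are assumptions inferred from the example in the question, and the names `parse`, `HEADER_RE` and `PARAM_RE` are made up for this sketch; the real format may allow more than this.

```python
import re

# Assumed line shapes, inferred from the sample file in the question.
HEADER_RE = re.compile(r"^(File|Date)\s*:\s*(.+)$")
PARAM_RE = re.compile(r"^(\w+)(?:\[(\w+)\])?\s*:\s*(\S+)$")


def parse(text):
    """Return (header, blocks); each block is a list of (name, unit, value)."""
    header, blocks = {}, []
    in_table = False   # are we between TableStart and TableEnd?
    current = None     # parameters of the open block, or None outside a block
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "TableStart" and not in_table and current is None:
            in_table = True
        elif line == "TableEnd" and in_table and current is None:
            in_table = False
        elif line == "BlockStart" and in_table and current is None:
            current = []
        elif line == "BlockEnd" and current is not None:
            blocks.append(current)
            current = None
        elif current is not None and (m := PARAM_RE.match(line)):
            current.append(m.groups())
        elif current is None and not in_table and (m := HEADER_RE.match(line)):
            header[m.group(1)] = m.group(2)
        else:
            # Anything else, including a nested BlockStart, is rejected.
            raise ValueError(f"unexpected line: {line!r}")
    if in_table or current is not None:
        raise ValueError("unterminated table or block")
    return header, blocks
```

The parser only ever needs to remember "inside a table or not" and "inside a block or not", which is what makes a regular-expression approach sufficient here. If blocks could nest to arbitrary depth, that fixed state would no longer be enough and a context-free parser would be the more natural fit.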

user207421
Brian Tompsett - 汤莱恩
  • Well, a block cannot contain another block, but I simplified the example a bit. All Blocks are surrounded by a TableStart/TableEnd pair (I edited the question). But this doesn't change anything, does it? – exilit Jan 13 '17 at 11:04
  • Another question: Don't the Blocks introduce some kind of context sensitivity, so that it becomes a language of type 1? – exilit Jan 13 '17 at 11:06
  • @exilit It is only context sensitive if the symbols in one block change the syntax of those blocks that follow, that is the grammar changes as the symbols are matched. From your example it appears not to be so. Perhaps you are confusing syntax with the semantics of the data. – Brian Tompsett - 汤莱恩 Jan 13 '17 at 11:12
  • Yes, I think you are right, I am most probably confusing the two, as I am quite inexperienced with parsing. – exilit Jan 13 '17 at 11:19
  • Thank you for the link. I'll have a look at it. – exilit Jan 13 '17 at 14:54