
Since YAML has a rather complicated syntax, is it possible to write a YAML parser mainly using ANTLR4? I am looking for examples that implement YAML-like indentation parsing and the detection of data types.

dreftymac
JE42
  • Indentation handling can be found in the Python grammar (https://github.com/antlr/grammars-v4/tree/master/python3). – Onur Aug 30 '14 at 09:07
  • Yeah, but I think YAML's indentation handling is similar to, but still quite a bit different from, Python's. http://yaml.org/spec/1.2/spec.html#id2777534 vs https://docs.python.org/3/reference/lexical_analysis.html#indentation – JE42 Aug 30 '14 at 09:45
  • YAML indentation seems to be more complex than Python's. At first glance it looks like it can be handled with several lexer modes (to cope with flow style) and lexer actions that convert whitespace to `Indent`|`Dedent` tokens, so you do not have to deal with whitespace in the parser. – Onur Aug 31 '14 at 10:33
  • Worth mentioning the GitHub repo [enyaml](https://github.com/tkellogg/enyaml), an ANTLR + .NET YAML grammar. I have not used it, but I have been debating forking it, porting it to Java, and then changing the grammar to embed some of the domain rules about our YAML documents inside the parser. I'll update this question when that's done. – Groostav Mar 04 '16 at 21:10
  • As far as I can tell, you can handle the block syntax for YAML collections (indentation rules) inside a handwritten lexer. I used this approach myself to create a **very basic** YAML parser based on ANTLR [here](https://github.com/sanssecours/Yan-LR). Apart from the custom lexer (`YAMLLexer.cpp`), all other parts of the parser use the standard facilities provided by ANTLR (input handling, parser grammar, listener interface). – René Schwaiger Jul 19 '18 at 15:24
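
To make the `Indent`|`Dedent` idea from the comments above concrete, here is a minimal sketch in plain Python of the indentation-stack technique that a handwritten lexer or ANTLR lexer actions would typically use. It is not taken from any of the linked projects. The function name `tokenize_indentation` and the token kinds `INDENT`, `DEDENT` and `LINE` are made up for illustration, and the sketch assumes space-only indentation and block style only. Real YAML needs more than this, for example `- ` sequence entries, flow style, and block scalars.

```python
from typing import Iterator, List, Tuple

# Hypothetical token shape for this sketch: (kind, text).
Token = Tuple[str, str]


def tokenize_indentation(text: str) -> Iterator[Token]:
    """Yield INDENT/DEDENT/LINE tokens for space-indented block text."""
    stack: List[int] = [0]  # indentation columns of the currently open blocks
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank and comment-only lines do not affect nesting
        indent = len(raw) - len(raw.lstrip(" "))  # count leading spaces only
        if indent > stack[-1]:
            # Deeper than the current block: open a new one.
            stack.append(indent)
            yield ("INDENT", "")
        else:
            # Shallower: close blocks until we are back at a known column.
            while indent < stack[-1]:
                stack.pop()
                yield ("DEDENT", "")
            if indent != stack[-1]:
                raise ValueError(f"inconsistent indentation: {raw!r}")
        yield ("LINE", stripped)
    # Close every block that is still open at end of input.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", "")


if __name__ == "__main__":
    sample = "a:\n  b: 1\n  c:\n    d: 2\ne: 3\n"
    for kind, value in tokenize_indentation(sample):
        print(kind, value)
```

With a token stream like this, the parser grammar can treat `INDENT` and `DEDENT` like opening and closing brackets and never has to look at whitespace itself. The stack is the part a pure context-free grammar cannot express, which is why it lives in the lexer.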

1 Answer


The YAML specification contains a BNF grammar. Bear in mind that, according to this document, fully correct YAML is context-sensitive and cannot be parsed by parser generators as-is, so your grammar will have to describe a context-free superset of the language.

Ari Fordsham