
I have a YAML file with inconsistent indentation, like this:

name:  testing
date:       2020-07-13
version:    1.0
targets:
  - sequence: 1
   name:     Book1
    author: abc
  - sequence:   2
    name:   Book2    
   author: xyz

If I try to load it with PyYAML, I get a parser exception like this:

yaml.parser.ParserError: while parsing a block collection
  in "E:/test.yaml", line 5, column 3
expected <block end>, but found '<block mapping start>'
  in "E:/test.yaml", line 6, column 4

How can I convert this YAML with indentation problems to a dict without fixing the indentation manually? In other words, is there a way to convert it to a dict no matter how the YAML is indented?

Girish
  • Just correct the file content. Otherwise, you'd end up using regular expressions, split functions or a self-written parser altogether. – Jan Jul 23 '20 at 07:44
  • Wherever you get that data from, make them fix the problem. Otherwise dozens of consumers of that data will have the same problem. There's a clean code principle called "Root cause analysis", and it should not only be a root cause analysis but also a root cause fix. – Thomas Weller Jul 23 '20 at 07:49
  • I had a similar case with XML ~14 years ago. Everyone worked around the XML problems of one company. And due to those workarounds, other companies had to provide invalid XML as well. They had to make their valid XML invalid! It was horrible. Don't let such a thing slip in. – Thomas Weller Jul 23 '20 at 07:51
  • @Jan Yes, it's better to correct the file content before parsing it. I just wanted to know whether there is any way to bypass this step that I am not aware of. – Girish Jul 23 '20 at 09:45

1 Answer


Loading any kind of structured data always requires some kind of grammar specification, be it explicit via a specification document, or implicit by just writing the code loading it.

YAML has an explicit specification. The file you show does not match the YAML grammar and thus is not YAML. It is not "almost YAML"; it is not YAML at all.

If you want to load a file regardless of indentation, this has nothing to do with YAML anymore. You need to define a grammar, possibly derived from YAML, that does understand your file, and then you need to implement it.

You can write some sed or awk command to fix this particular file, but it can't easily be generalized because you need a proper YAML parser just to detect wrong indentation.
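To illustrate how narrow such a one-off fix is, here is a minimal sketch in Python rather than sed or awk. The function name `realign_list_items` and the heuristic itself are assumptions made up for this example: it only realigns keys inside a single-level list of flat mappings, as in the file shown in the question, and will silently produce wrong results for nested structures, multi-line scalars, or tabs.

```python
# Sample input copied from the question (inconsistent indentation in list items).
BROKEN = """\
name:  testing
date:       2020-07-13
version:    1.0
targets:
  - sequence: 1
   name:     Book1
    author: abc
  - sequence:   2
    name:   Book2
   author: xyz
"""


def realign_list_items(text: str) -> str:
    """Hypothetical heuristic: align keys in each '- ' item with its first key.

    Assumes a single-level list whose items are flat mappings. This is NOT a
    general repair for broken YAML.
    """
    fixed = []
    dash_col = None  # column of the '-' of the list item we are inside, if any
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            fixed.append("")
            continue
        indent = len(line) - len(line.lstrip(" "))
        if stripped.startswith("- "):
            # Start of a new list item: remember where its dash sits.
            dash_col = indent
            fixed.append(" " * indent + "- " + stripped[2:].lstrip())
        elif dash_col is not None and indent > dash_col:
            # Indented deeper than the dash: treat as a key of the current
            # item and move it to the column of the item's first key.
            fixed.append(" " * (dash_col + 2) + stripped)
        else:
            # Back at or above the dash column: the list item has ended.
            dash_col = None
            fixed.append(" " * indent + stripped)
    return "\n".join(fixed) + "\n"


fixed = realign_list_items(BROKEN)
print(fixed)
```

The realigned text could then be handed to PyYAML's `yaml.safe_load(fixed)` to obtain a dict. Note that the heuristic has to guess which item a misindented key belongs to, which is exactly the information the broken indentation no longer carries reliably.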

So the realistic solution is to require whoever is supplying your input to give you a valid YAML file. Anything else is far too much effort.

flyx
  • The YAML file I provided is only an example to show the bad indentation and the exception raised when parsing such a file. My intention is to find out whether there is any possible way to parse YAML in such a case. – Girish Jul 23 '20 at 09:48