We have a project that stores its settings in YAML (the settings file is generated by Ansible scripts). We currently use PyYAML to parse the YAML and marshmallow to validate the settings. I'm pretty happy with storing settings in YAML, but I don't think marshmallow is the tool I need: the schemas are hard to read, I don't need serialization for settings, and I want something like XSD. What are the best practices for validating settings in a project? Is there perhaps a language-independent way? (We are using Python 2.7.)

YAML settings:

successive:
  worker:
    cds_process_number: 0 # positive integer or zero
    spider_interval: 10 # positive integer
    run_worker_sh: /home/lmakeev/CDS/releases/master/scripts/run_worker.sh # OS path
    allow:
      - "*" # regular expression
    deny:
      - "^[A-Z]{3}_.+$" # regular expression
Helvdan
  • Maybe elaborate a bit more. Why do you think marshmallow isn't the right tool? Right now this is very open-ended. – Grimmy Jul 17 '17 at 15:23
  • Because the structure is very complex, the schemas are hard to read. I don't need serialization for settings, and I want something like XSD. – Helvdan Jul 17 '17 at 15:28
  • Something like this?: https://github.com/Julian/jsonschema – Grimmy Jul 17 '17 at 15:34
  • I should think over using JSON then. Isn't there a solution for YAML though? – Helvdan Jul 17 '17 at 15:48
  • He meant do something similar for YAML, not just use that code and switch to JSON. – Matthew Schuchard Jul 17 '17 at 15:56
  • You parse YAML into a dict anyway, which could be validated using that library or even dumped as JSON. – Grimmy Jul 17 '17 at 16:42
  • @Grimmy You **never** parse a YAML document into a `dict`, except for the special case where your top-level structure is a non-tagged mapping. – Anthon Jul 17 '17 at 16:44
  • There is seldom a need for a schema if you use YAML's built-in tags to indicate what types you want to load. Please share a minimal generated YAML file (with at least a few of your types) and describe the validation you want (e.g. in comments following each value). A schema description is not language-dependent either: it has its own schema language (which, as you found out, can be complex to learn and maintain in addition to your programming language of choice). – Anthon Jul 17 '17 at 16:48
  • @Anthon, I updated the question with an example of the YAML settings. – Helvdan Jul 17 '17 at 17:10

1 Answer

A schema description is a language of its own, with its own syntax and idiosyncrasies that you have to learn. And whenever your requirements change, you have to maintain the schema "programs" against which your YAML is verified.

If you are already working with YAML and are familiar with Python, you can use YAML's tag facility to check objects at parse time.

Assuming you have a file input.yaml:

successive:
  worker:
    cds_process_number: !nonneg 0
    spider_interval: !pos 10
    run_worker_sh: !path /home/lmakeev/CDS/releases/master/scripts/run_worker.sh
    allow:
      - !regex "*"
    deny:
      - !regex "^[A-Z]{3}_.+$"

(your example file with the comments removed and tags inserted), you can create and register four classes that check the values using the following program¹:

import sys
import os
import re
import ruamel.yaml
import pathlib

class NonNeg:
    yaml_tag = u"!nonneg"

    @classmethod
    def from_yaml(cls, constructor, node):
        val = int(node.value)   # this creates/returns an int
        assert val >= 0
        return val

class Pos(int):
    yaml_tag = u"!pos"

    @classmethod
    def from_yaml(cls, constructor, node):
        val = cls(node.value)  # this creates/returns a Pos()
        assert val > 0
        return val

class Path:
    yaml_tag = u"!path"

    @classmethod
    def from_yaml(cls, constructor, node):
        val = pathlib.Path(node.value)
        # assert val.exists()  # disabled here: the file does not exist on this machine
        return val


class Regex:
    yaml_tag = u"!regex"
    def __init__(self, val, comp):
        # store original string and compile() of that string
        self._val = val
        self._compiled = comp

    @classmethod
    def from_yaml(cls, constructor, node):
        val = str(node.value)
        try:
            comp = re.compile(val)
        except re.error:  # raised by re.compile() for an invalid pattern
            comp = None
            print("Incorrect regex", node.start_mark)
            print("  ", node.tag, node.value)
        return cls(val, comp)


yaml = ruamel.yaml.YAML(typ="safe")
yaml.register_class(NonNeg)
yaml.register_class(Pos)
yaml.register_class(Path)
yaml.register_class(Regex)

data = yaml.load(pathlib.Path('input.yaml'))

The actual checks in the individual from_yaml classmethods should be adapted to your needs (I had to remove the assert for the Path, as I don't have that file).
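As one possible adaptation: `assert` statements are skipped when Python runs with `-O`, so for settings validation you may prefer explicit exceptions that report where the bad value is. The sketch below is only an illustration. `SettingsError`, `require`, `to_pos` and `FakeNode` are hypothetical names that are not part of ruamel.yaml, and `FakeNode` merely stands in for the `node` passed to `from_yaml` so that the snippet runs without a YAML file.

```python
from collections import namedtuple


class SettingsError(ValueError):
    """Hypothetical error type for a setting that fails validation."""


def require(condition, node, message):
    """Raise SettingsError with the node's tag, value and position if the check fails."""
    if not condition:
        raise SettingsError(
            "%s: %s %r (%s)" % (message, node.tag, node.value, node.start_mark)
        )


# Stand-in for a YAML scalar node; a real node has the same three attributes.
FakeNode = namedtuple("FakeNode", "tag value start_mark")


def to_pos(node):
    """Same check as Pos.from_yaml, but without relying on assert."""
    val = int(node.value)
    require(val > 0, node, "expected a positive integer")
    return val


good = FakeNode("!pos", "10", "line 4, column 22")
bad = FakeNode("!pos", "0", "line 4, column 22")

print(to_pos(good))
try:
    to_pos(bad)
except SettingsError as exc:
    print("rejected:", exc)
```

Inside a real `from_yaml` classmethod you would call `require(...)` with the `node` argument in place of the `assert` line.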

If you run the program above against input.yaml, you'll note that it prints:

Incorrect regex   in "input.yaml", line 7, column 9
   !regex *

because "*" is not a valid regular expression. Did you mean: ".*"?
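You can confirm this with the standard `re` module alone. The snippet below is a minimal check; `is_valid_regex` is a hypothetical helper name, not something from ruamel.yaml.

```python
import re


def is_valid_regex(pattern):
    """Return True if re.compile() accepts the pattern, False otherwise."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


for pattern in ["*", ".*", "^[A-Z]{3}_.+$"]:
    print(repr(pattern), is_valid_regex(pattern))
```

A bare `*` is a quantifier with nothing in front of it to repeat, which is why the first pattern is rejected while the other two compile.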


¹ This was done using ruamel.yaml, a YAML 1.2 parser, of which I am the author. You can achieve the same results with PyYAML, e.g. by subclassing YAMLObject (which is unsafe by default, so make sure you correct that in your code).
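For PyYAML, a rough sketch of the same idea could look like the following. Note that it does not subclass YAMLObject. Instead it registers plain constructor functions on a subclass of `yaml.SafeLoader`, which is a different (and safe by construction) route to the same result. `SettingsLoader`, `fail` and the `construct_*` functions are names made up for this example, and the inline document is a shortened version of the settings file.

```python
import re
import yaml  # PyYAML


class SettingsLoader(yaml.SafeLoader):
    """SafeLoader subclass, so the custom tags stay local to this loader."""


def fail(node, problem):
    # ConstructorError carries the position of the offending node.
    raise yaml.constructor.ConstructorError(None, None, problem, node.start_mark)


def construct_nonneg(loader, node):
    value = int(loader.construct_scalar(node))
    if value < 0:
        fail(node, "expected an integer >= 0, got %d" % value)
    return value


def construct_pos(loader, node):
    value = int(loader.construct_scalar(node))
    if value <= 0:
        fail(node, "expected an integer > 0, got %d" % value)
    return value


def construct_regex(loader, node):
    pattern = loader.construct_scalar(node)
    try:
        return re.compile(pattern)
    except re.error as exc:
        fail(node, "invalid regular expression %r: %s" % (pattern, exc))


SettingsLoader.add_constructor(u"!nonneg", construct_nonneg)
SettingsLoader.add_constructor(u"!pos", construct_pos)
SettingsLoader.add_constructor(u"!regex", construct_regex)

document = u"""
worker:
  cds_process_number: !nonneg 0
  spider_interval: !pos 10
  deny:
    - !regex "^[A-Z]{3}_.+$"
"""

settings = yaml.load(document, Loader=SettingsLoader)
print(settings["worker"]["spider_interval"])
```

Because the checks raise a `yaml.YAMLError` subclass, a single `except yaml.YAMLError` around the load catches both syntax errors and failed validations.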

Anthon