A schema description is a language of its own, with its own syntax and idiosyncrasies you have to learn. And you have to maintain its "programs" against which your YAML is verified if your requirements change.
If you are already working with YAML and are familiar with Python you can use YAML's tag facility to check objects at parse time.
Assuming you have a file input.yaml
:
successive:
worker:
cds_process_number: !nonneg 0
spider_interval: !pos 10
run_worker_sh: !path /home/lmakeev/CDS/releases/master/scripts/run_worker.sh
allow:
- !regex "*"
deny:
- !regex "^[A-Z]{3}_.+$"
(your example file with the comments removed and tags inserted), you can create and register four classes that check the values using the following program¹:
import sys
import os
import re
import ruamel.yaml
import pathlib
class NonNeg:
yaml_tag = u"!nonneg"
@classmethod
def from_yaml(cls, constructor, node):
val = int(node.value) # this creates/returns an int
assert val >= 0
return val
class Pos(int):
yaml_tag = u"!pos"
@classmethod
def from_yaml(cls, constructor, node):
val = cls(node.value) # this creates/return a Pos()
assert val > 0
return val
class Path:
yaml_tag = u"!path"
@classmethod
def from_yaml(cls, constructor, node):
val = pathlib.Path(node.value)
assert os.path.exists(val)
return val
class Regex:
yaml_tag = u"!regex"
def __init__(self, val, comp):
# store original string and compile() of that string
self._val = val
self._compiled = comp
@classmethod
def from_yaml(cls, constructor, node):
val = str(node.value)
try:
comp = re.compile(val)
except Exception as e:
comp = None
print("Incorrect regex", node.start_mark)
print(" ", node.tag, node.value)
return cls(val, comp)
yaml = ruamel.yaml.YAML(typ="safe")
yaml.register_class(NonNeg)
yaml.register_class(Pos)
yaml.register_class(Path)
yaml.register_class(Regex)
data = yaml.load(pathlib.Path('input.yaml'))
The actual checks in the individual from_yaml
classmethods should be adapted to your needs (I had to remove the assert for the Path, as I don't have that file).
If you run the above you'll note that it prints:
Incorrect regex in "input.yaml", line 7, column 9
!regex *
because "*"
is not a valid regular expression. Did you mean: ".*"
?
¹ This was done using ruamel.yaml, a YAML 1.2 parser, of which I am the author. You can achieve the same results with PyYAML, e.g by subclassing ObjectDict
(which is unsafe by default, so make sure you correct that in your code)