27

I need to use the environment variable "PATH" in a YAML file that is parsed by a script.

This is the environment variable I have set on my terminal:

$ echo $PATH
/Users/abc/Downloads/tbwork

This is my sample.yml:

---
Top: ${PATH}/my.txt
Vars:
- a
- b

When I parse this YAML file with my script, I don't see the PATH variable's actual value.

This is my script:

import yaml
import os
import sys

stream = open("sample.yml", "r")
docs = yaml.load_all(stream)
for doc in docs:
    for k,v in doc.items():
        print k, "->", v
    print "\n",

Output:

Top -> ${PATH}/my.txt
Vars -> ['a', 'b']

Expected output is:

Top -> /Users/abc/Downloads/tbwork/my.txt
Vars -> ['a', 'b']

Can someone help me figure out the correct way to do this, if I am doing it the wrong way?

npatel
  • 2
    I think you are mixing up YAML with bash script. No? If you want to do so, you will have to evaluate the string in a terminal environment or to use [re](https://docs.python.org/2/library/re.html) module to figure out whether there are env vars and replace them using ``os.environ``. – Léopold Houdin Sep 19 '18 at 18:45
  • YAML doesn't support string interpolation like this. maybe take a look at Jinja2 templating engine, or similar? – meatspace Sep 19 '18 at 18:46
  • No, I am not mixing anything up; the only thing I want to do is use an environment variable in YAML. Not sure if that is doable. – npatel Sep 19 '18 at 18:46
  • I found a workaround. After I parse yaml in my script, I am replacing PATH with actual value. Thanks for the suggestions! – npatel Sep 19 '18 at 18:54
  • 1
    You should not be using `load_all()`. It is documented to be potentially unsafe, and it is not necessary. Read the documentation and then use `safe_load_all()` – Anthon Sep 19 '18 at 20:58
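
The workaround mentioned in the comments above (substituting after parsing) could look roughly like the sketch below. It assumes the YAML has already been loaded into plain dicts, lists and strings. `expand_env` and `TBWORK_DIR` are hypothetical names used only for illustration, and `os.path.expandvars` leaves references to unset variables unchanged.

```python
import os


def expand_env(obj):
    """Recursively expand $VAR / ${VAR} references in already-parsed YAML data."""
    if isinstance(obj, str):
        return os.path.expandvars(obj)
    if isinstance(obj, dict):
        return {key: expand_env(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [expand_env(item) for item in obj]
    # Numbers, booleans, None, etc. are returned as-is
    return obj


# Stand-in for the data a YAML loader would return for sample.yml;
# TBWORK_DIR is a made-up variable name used only for this demo.
os.environ["TBWORK_DIR"] = "/Users/abc/Downloads/tbwork"
doc = {"Top": "${TBWORK_DIR}/my.txt", "Vars": ["a", "b"]}
expanded = expand_env(doc)
for k, v in expanded.items():
    print(k, "->", v)
```

Post-processing the parsed data keeps the YAML file itself standard, at the cost of expanding every string value, whether or not it was meant to contain a variable.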

6 Answers

30

The PyYAML library doesn't resolve environment variables by default. You need to define an implicit resolver that matches a regex for environment variable references, plus a constructor function that resolves each match.

You can do this with yaml.add_implicit_resolver and yaml.add_constructor. The code below defines a resolver that matches values beginning with ${VAR_NAME}, and a function, path_constructor, that looks up the environment variable.

import yaml
import re
import os

path_matcher = re.compile(r'\$\{([^}^{]+)\}')
def path_constructor(loader, node):
  ''' Extract the matched value, expand env variable, and replace the match '''
  value = node.value
  match = path_matcher.match(value)
  env_var = match.group(1)  # the name captured between ${ and }
  # Fall back to '' so an unset variable does not raise a TypeError
  return os.environ.get(env_var, '') + value[match.end():]

yaml.add_implicit_resolver('!path', path_matcher)
yaml.add_constructor('!path', path_constructor)

data = """
env: ${VAR}/file.txt
other: file.txt
"""

if __name__ == '__main__':
  p = yaml.load(data, Loader=yaml.FullLoader)
  print(os.environ.get('VAR')) ## /home/abc
  print(p['env']) ## /home/abc/file.txt

Warning: Do not use this with untrusted input (for example, when someone else controls the YAML or the environment variables), as FullLoader has had remote code execution vulnerabilities as of July 2020.

dmchk
  • 2
    Never ever propose to use `load()` (read the documentation if you don't know why) when it is not necessary (almost always). Always use `safe_load()` in examples. – Anthon Sep 19 '18 at 21:00
  • 2
    To use `safe_load()` I had to modify the above like so... `yaml.add_implicit_resolver('!path', path_matcher, None, SafeLoader)` `yaml.add_constructor('!path', path_constructor, SafeLoader)` – kberg Nov 29 '18 at 23:51
  • @kberg that indeed seems to work for `string` input, however when `yaml.load()` with a file as input it does not work. Any suggestions to make this work for file input too? – Tom Hemmes May 13 '19 at 16:20
  • @TomHemmes Try using `yaml.safe_load()` instead of just `load()` – kberg Jul 12 '19 at 21:51
  • 2
    I have the same problem as @dmchk, whether I use: `yaml.add_implicit_resolver('!path', path_matcher)` `yaml.add_constructor('!path', path_constructor)` or: `yaml.add_implicit_resolver('!path', path_matcher, None, SafeLoader)` `yaml.add_constructor('!path', path_constructor, SafeLoader)`, and I cannot resolve it – Guillermo Diaz Apr 27 '20 at 15:41
  • 1
    @GuillermoDiaz I modified the answer to specify the loader FullLoader. It's safe to use. Read here for more: https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation – dmchk Apr 27 '20 at 21:56
  • 1
    @dmchk FYI: "NOTE The FullLoader loader class and full_load function (the current default for load) should be avoided for now. New exploits in 5.3.1 were found in July 2020. These exploits will be addressed in the next release, but if further exploits are found then FullLoader may go away." – totalhack Sep 30 '20 at 14:46
  • Thank you, @totalhack! Added a warning to the answer. – dmchk Sep 30 '20 at 22:26
12

Here is an alternative version that uses a new Loader class, in case you do not want to modify the global/default yaml Loader.

More importantly, it correctly expands variables embedded anywhere in a string, not just at the start of the value, e.g. path/to/${SOME_VAR}/and/${NEXT_VAR}/foo/bar

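import os
import re
import yaml

# Match any plain scalar that contains ${...} anywhere in it
path_matcher = re.compile(r'.*\$\{([^}^{]+)\}.*')

def path_constructor(loader, node):
    # os.path.expandvars replaces every $VAR / ${VAR} it finds in the string
    return os.path.expandvars(node.value)

# Subclass so the resolver is registered on our own loader, not the default one
class EnvVarLoader(yaml.SafeLoader):
    pass

EnvVarLoader.add_implicit_resolver('!path', path_matcher, None)
EnvVarLoader.add_constructor('!path', path_constructor)

configPath = 'sample.yml'  # path to your YAML file, e.g. the one from the question
with open(configPath) as f:
    c = yaml.load(f, Loader=EnvVarLoader)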
kolis
  • 1
    This doesn't seem to work when the value is quoted. It seems like PyYAML never calls the resolver if the value is quoted based on a quick test. – totalhack Sep 30 '20 at 16:17
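
A likely explanation for the behaviour described in the comment above: PyYAML applies implicit resolvers only to plain (unquoted) scalars, so a quoted value never reaches the `!path` constructor. The sketch below shows one possible way around this, assuming you are willing to tag such values explicitly. The `!env` tag and the `DEMO_HOME` variable are names invented for this example.

```python
import os
import yaml


class EnvVarLoader(yaml.SafeLoader):
    """SafeLoader subclass, so the default loader is not modified."""


def env_constructor(loader, node):
    # construct_scalar returns the string content for plain and quoted scalars alike
    return os.path.expandvars(loader.construct_scalar(node))


# An explicit tag is honoured regardless of the quoting style of the value
EnvVarLoader.add_constructor('!env', env_constructor)

os.environ['DEMO_HOME'] = '/home/abc'  # made-up variable for the demo
text = """
quoted: !env "${DEMO_HOME}/file.txt"
plain: !env ${DEMO_HOME}/file.txt
untagged: "${DEMO_HOME}/file.txt"
"""
config = yaml.load(text, Loader=EnvVarLoader)
print(config)
```

Only the tagged values are expanded here; the untagged, quoted value is returned verbatim, which also makes it explicit in the YAML file which entries depend on the environment.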
6

There is a nice library, envyaml, for this. It makes the task very simple:

from envyaml import EnvYAML

# read file env.yaml and parse config
env = EnvYAML('env.yaml')
D. Maley
  • 4
    There is probably more to it, but it appears the main approach of this library is this neat little trick: `yaml = safe_load(os.path.expandvars(f.read()))` – totalhack Sep 30 '20 at 15:03
  • `envyaml` allows for conveniences like (for yaml dictionaries) `my_dict[database.username]`, while `pyyaml` requires (regular Python) `my_dict['database']['username']`, or (for yaml lists) `envyaml` allows `my_dict[database.users.7]` (to access individual list elements using their index, like a dictionary key) compared to (`pyyaml`, which again needs regular python) `my_dict['database']['users'][7]`. Per [their `requirements.txt`](https://raw.githubusercontent.com/thesimj/envyaml/master/requirements.txt), `yaml` is a dependency of `envyaml`. – edesz Feb 25 '21 at 02:32
1

You can see a how-to here, which led to the very small library pyaml-env, so that we don't have to repeat the same code in every project.

So, using the library, your sample yaml becomes:

---
Top: !ENV ${PATH}/my.txt
Vars:
- a
- b

and you load it with parse_config:

from pyaml_env import parse_config
config = parse_config('path/to/config.yaml')

print(config)
# outputs the following, with the environment variables resolved
{
    'Top': '/Users/abc/Downloads/tbwork/my.txt',
    'Vars': ['a', 'b']
}

There are also options to use default values if you wish, like this:

---
Top: !ENV ${PATH:'~/data/'}/my.txt
Vars:
- a
- b
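
To illustrate the idea of default values, here is a minimal standard-library sketch of substituting `${NAME}` and `${NAME:default}` references in a string. It is not pyaml-env's actual implementation; the pattern, the `substitute_env` helper and the `DATA_DIR` variable are assumptions made up for this example.

```python
import os
import re

# ${NAME} or ${NAME:default}; the default may be empty but cannot contain "}"
ENV_PATTERN = re.compile(r'\$\{(\w+)(?::([^}]*))?\}')


def substitute_env(text):
    """Replace ${NAME} / ${NAME:default} references in text."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        value = os.environ.get(name)
        if value is not None:
            return value
        if default is not None:
            return default
        # No value and no default: leave the reference untouched
        return match.group(0)
    return ENV_PATTERN.sub(replace, text)


os.environ.pop('DATA_DIR', None)  # DATA_DIR is a made-up variable name
unset_result = substitute_env('${DATA_DIR:~/data}/my.txt')
os.environ['DATA_DIR'] = '/Users/abc/Downloads/tbwork'
set_result = substitute_env('${DATA_DIR:~/data}/my.txt')
print(unset_result)
print(set_result)
```

Leaving unresolved references untouched is one design choice among several; raising an error for a missing variable without a default would be an equally reasonable alternative.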

About the implementation, in short: For PyYAML to be able to resolve environment variables, we need three main things:

  1. A regex pattern that identifies environment variables, e.g. pattern = re.compile(r'.*?\$\{(\w+)\}.*?')

  2. A tag that will signify that there’s an environment variable (or more) to be parsed, e.g. !ENV.

  3. And a function that the loader will use to resolve the environment variables

A full example:

import os
import re
import yaml


def parse_config(path=None, data=None, tag='!ENV'):
    """
    Load a yaml configuration file and resolve any environment variables
    The environment variables must have !ENV before them and be in this format
    to be parsed: ${VAR_NAME}.
    E.g.:
    database:
        host: !ENV ${HOST}
        port: !ENV ${PORT}
    app:
        log_path: !ENV '/var/${LOG_PATH}'
        something_else: !ENV '${AWESOME_ENV_VAR}/var/${A_SECOND_AWESOME_VAR}'
    :param str path: the path to the yaml file
    :param str data: the yaml data itself as a stream
    :param str tag: the tag to look for
    :return: the dict configuration
    :rtype: dict[str, T]
    """
    # pattern for global vars: look for ${word}
    pattern = re.compile(r'.*?\$\{(\w+)\}.*?')  # raw string avoids invalid-escape warnings
    loader = yaml.SafeLoader

    # the tag will be used to mark where to start searching for the pattern
    # e.g. somekey: !ENV somestring${MYENVVAR}blah blah blah
    loader.add_implicit_resolver(tag, pattern, None)

    def constructor_env_variables(loader, node):
        """
        Extracts the environment variable from the node's value
        :param yaml.Loader loader: the yaml loader
        :param node: the current node in the yaml
        :return: the parsed string that contains the value of the environment
        variable
        """
        value = loader.construct_scalar(node)
        match = pattern.findall(value)  # to find all env variables in line
        if match:
            full_value = value
            for g in match:
                full_value = full_value.replace(
                    f'${{{g}}}', os.environ.get(g, g)
                )
            return full_value
        return value

    loader.add_constructor(tag, constructor_env_variables)

    if path:
        with open(path) as conf_data:
            return yaml.load(conf_data, Loader=loader)
    elif data:
        return yaml.load(data, Loader=loader)
    else:
        raise ValueError('Either a path or data should be defined as input')
mkaran
  • Is there any reason why this wouldn't work when setting a new environment variable with ```os.environ['NEW_VAR'] = 'xxxx'```? In your full example listed, it parses the ```${}``` correctly in say ```dir/${NEW_VAR}/dir2/```, but it returns ```dir/NEW_VAR/dir2/``` instead of ```dir/xxxx/dir2/```. In the actual import of ```from pyaml_env import parse_config```, it returns N/A for the values. As a check and balance, ```print(os.environ.get("NEW_VAR"))``` works as expected. – DataMinion Jul 04 '21 at 19:46
  • Nevermind the last reply of mine. Rookie mistake. Let's just say spell check doesn't correct case sensitive things. – DataMinion Jul 04 '21 at 21:37
  • @BrandonStivers ah I know the pain :) – mkaran Jul 05 '21 at 08:27
1

You can also let the shell do the substitution: variables inside an unquoted heredoc are expanded before the file is written.

ENV_NAME=test
cat << EOF > new.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${ENV_NAME}
EOF

Then cat new.yaml shows:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
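
The same render-before-parsing idea can be sketched in Python with the standard library's `string.Template`, which understands the `${NAME}` syntax. This is only an illustration of the approach, not part of the answer above; the `render` helper is a hypothetical name.

```python
import os
from string import Template

TEMPLATE = """\
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${ENV_NAME}
"""


def render(template_text, variables):
    # safe_substitute leaves unknown placeholders in place instead of raising KeyError
    return Template(template_text).safe_substitute(variables)


os.environ['ENV_NAME'] = 'test'
rendered = render(TEMPLATE, os.environ)
print(rendered)
```

The rendered text can then be written to a file or handed to a YAML parser. Note that `string.Template` treats `$$` as an escaped dollar sign, so literal `$` characters in the template need doubling.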
0

Using PyYAML's add_implicit_resolver and add_constructor works for me, but only when they are registered on SafeLoader, like this (adapting the example above):

import yaml
import re
import os
os.environ['VAR']="you better work"
path_matcher = re.compile(r'\$\{([^}^{]+)\}')
def path_constructor(loader, node):

  ''' Extract the matched value, expand env variable, and replace the match '''
  print("i'm here")
  value = node.value
  match = path_matcher.match(value)
  env_var = match.group()[2:-1]
  return os.environ.get(env_var) + value[match.end():]

yaml.add_implicit_resolver('!path', path_matcher, None, yaml.SafeLoader)
yaml.add_constructor('!path', path_constructor, yaml.SafeLoader)

data = """
env: ${VAR}/file.txt
other: file.txt
"""

if __name__ == '__main__':
  p = yaml.safe_load(data)
  print(os.environ.get('VAR')) ## you better work
  print(p['env']) ## you better work/file.txt
meshaun9