Are there any best practices for config-file documentation, especially for Python?


Particularly in scientific computing, it is common to use a config file as the input to control a batch processing job (such as a simulation), and to expect the user to customise a substantial portion of the config for their scenario. (The config also likely selects among different processing modules, each with its own suite of config fields.) Thus, the user ought to know: what each setting means or affects; which settings are unused (and in which circumstances); what the default values are (and the permissible values or ranges); etc.

I've found incomplete config file docs to be common. The fundamental problem seems to be that if the docs are maintained separately from the code, they grow out of sync. (This seems less of a problem with API docs, thanks to standard practices involving colocated docstrings and autogeneration from function signatures/argspec.) For example, if the standard Python configparser is used once to parse the config file, then the code for accessing individual attributes (and implicitly determining the config schema) may still be spread out across the entire code base (and perhaps only available at runtime rather than when building docs).


Further thoughts:
  • Is it bad practice to replace a config file (YAML or similar) with a user-customised Python script (so as to only need API docs)?
  • Distribution of a well commented example config file (that is also used in automatic tests): how to maintain if different scenarios duplicate large sections but need some completely different fields?
  • Can a single schema be maintained, both for use in code (to help parse, validate, and set defaults) and to generate docs somehow?
  • Is there a human readable/writeable way of (de)serialising the state of some (sub)class instance that represents a new batch process (so that config is covered by existing docs)?
benjimin

1 Answer

Personally, I like to use the argparse module for configuration, and read the default value for each setting from an environment variable. That centralizes the settings and documentation in one place, and allows the user to either tweak settings on the command line or set and forget them in environment variables. Be careful about putting passwords on the command line, though, because other users can probably see your command line arguments in the process list.

Here's an example that uses argparse and environment variables:

import os
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter, SUPPRESS


def parse_args(argv=None):
    parser = ArgumentParser(description='Watch the raw data folder for new runs.',
                            formatter_class=ArgumentDefaultsHelpFormatter)
    # Each default comes from an environment variable, with a fallback
    # that suits a developer workstation.
    parser.add_argument(
        '--kive_server',
        default=os.environ.get('MICALL_KIVE_SERVER', 'http://localhost:8000'),
        help='server to send runs to')
    parser.add_argument(
        '--kive_user',
        default=os.environ.get('MICALL_KIVE_USER', 'kive'),
        help='user name for Kive server')
    # SUPPRESS keeps the password default out of the help text.
    parser.add_argument(
        '--kive_password',
        default=SUPPRESS,
        help='password for Kive server (default not shown)')

    args = parser.parse_args(argv)
    # With SUPPRESS, the attribute is missing unless the option was given.
    if not hasattr(args, 'kive_password'):
        args.kive_password = os.environ.get('MICALL_KIVE_PASSWORD', 'kive')
    return args
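
To make the precedence concrete, here is a cut-down sketch of the same idea with a single option. The build_parser helper and its environ parameter are my own additions for illustration (they let the example run without touching the real environment), and the example.com URLs are placeholders.

```python
import os
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter


def build_parser(environ=os.environ):
    """Build a parser whose default is read from the given environment.

    The environ parameter is a hypothetical convenience so the
    precedence can be demonstrated with plain dictionaries.
    """
    parser = ArgumentParser(formatter_class=ArgumentDefaultsHelpFormatter)
    parser.add_argument(
        '--kive_server',
        default=environ.get('MICALL_KIVE_SERVER', 'http://localhost:8000'),
        help='server to send runs to')
    return parser


env = {'MICALL_KIVE_SERVER': 'http://kive.example.com'}

# Nothing set: the hard-coded developer default applies.
fallback = build_parser({}).parse_args([]).kive_server

# Environment variable set: it replaces the hard-coded default.
from_env = build_parser(env).parse_args([]).kive_server

# Command-line option given: it wins over the environment variable.
from_cli = build_parser(env).parse_args(
    ['--kive_server', 'http://other.example.com']).kive_server

print(fallback, from_env, from_cli, sep='\n')
```

Running the script with --help shows whichever default is currently in effect, so the help text doubles as documentation of the active configuration.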

Setting those environment variables can be a bit confusing, particularly for system services. If you're using systemd, set them in the service unit, and be careful to use EnvironmentFile instead of Environment for any secrets, because Environment values can be viewed by any user with systemctl show.

I usually make the default values useful for a developer running on their workstation, so they can start development without changing any configuration.

Another option is to put the configuration settings in a settings.py file, and just be careful not to commit that file to source control. I have often committed a settings_template.py file that users can copy.
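
As a rough sketch, a settings_template.py might look like the following, with each setting documented right next to its default. The setting names are assumptions, borrowed from the argparse example above rather than taken from any real project.

```python
# settings_template.py (hypothetical): copy to settings.py and edit.
# Keep settings.py itself out of source control, for example by
# listing it in .gitignore.

# URL of the Kive server that runs are sent to.
KIVE_SERVER = 'http://localhost:8000'

# User name for the Kive server.
KIVE_USER = 'kive'

# Password for the Kive server. Replace it in your private settings.py.
KIVE_PASSWORD = 'kive'
```

The rest of the code would then do import settings and read settings.KIVE_SERVER and so on, so the comments in the template are the only place the settings need documenting.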

If your settings are so complicated/flexible that environment variables or a settings file get messy, then I would convert the project to a library with an API. Instead of settings, users then write a script that calls your API. You don't have to go through the effort of hosting your library on PyPI, either. pip can install from a GitHub repository, for example.
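
Here is a minimal sketch of what that could look like, assuming a dataclass holds the settings. Using a dataclass is my choice for illustration, and SimulationConfig, run_simulation, describe, and all the field names are hypothetical. The help text lives in the field metadata, so the same definition supplies defaults, validation, and a generated reference.

```python
from dataclasses import dataclass, field, fields


@dataclass
class SimulationConfig:
    """Settings for one batch run (hypothetical example)."""
    time_step: float = field(
        default=0.01, metadata={'help': 'integration time step in seconds'})
    n_steps: int = field(
        default=1000, metadata={'help': 'number of time steps to simulate'})
    solver: str = field(
        default='euler', metadata={'help': "integration scheme, 'euler' or 'rk4'"})

    def __post_init__(self):
        # Validate once, in the same place the fields are declared.
        if self.time_step <= 0:
            raise ValueError('time_step must be positive')
        if self.solver not in ('euler', 'rk4'):
            raise ValueError(f'unknown solver: {self.solver!r}')


def describe(config_class):
    """Render one line of documentation per field from its metadata."""
    return '\n'.join(
        f"{f.name} (default {f.default!r}): {f.metadata['help']}"
        for f in fields(config_class))


def run_simulation(config):
    """Stand-in for the real batch job: returns the simulated duration."""
    return config.time_step * config.n_steps


# What a user's script might look like: override only what differs.
config = SimulationConfig(time_step=0.5, n_steps=10)
duration = run_simulation(config)

print(describe(SimulationConfig))
print(duration)
```

Because the user's "config" is now ordinary Python, the usual docstring and API documentation tools cover it, and a typo in a field name fails immediately instead of being silently ignored.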

Don Kirkby