16

I am writing a small DB test suite, which reads configuration files with queries and expected results, e.g.:

query         = "SELECT * from cities WHERE name='Unknown';"
count         = 0
level         = 1
name          = "Check for cities whose name should be null"
suggested_fix = "UPDATE cities SET name=NULL WHERE name='Unknown';"

This works well; I divide each line using Python's string.partition('=').

My problem is very long SQL queries. Currently, I just paste these queries as a one-liner, which is ugly and unmaintainable.

I want to find an elegant, Pythonic way to read the right of an expression, even if spans over many lines.

Notes:

  • my SQL queries might contain the =
  • I don't fancy the idea of forcing "s around the right hand side, because there are many existing files without it.

EDIT:

ConfigParser is great, but it forces me to add a space or tab at the beginning of every line in a multiline entry. This might be a great pain.

Thanks in advance,

Adam

Adam Matan
  • 128,757
  • 147
  • 397
  • 562

4 Answers4

25

The Python standard library module ConfigParser supports this by default. The configuration file has to be in a standard format:

[Long Section]
short: this is a normal line
long: this value continues
    in the next line

The configuration file above could be read with the following code:

import ConfigParser
config = ConfigParser.ConfigParser()
config.read('longsections.cfg')
long = config.get('Long Section', 'long')
Anentropic
  • 32,188
  • 12
  • 99
  • 147
Tendayi Mawushe
  • 25,562
  • 6
  • 51
  • 57
  • 3
    I have a problem with this solution as errors are raised on parsing the file. According the documentation, for multiline values indentation is needed. I added 4 spaces at the start of "in the next line" and it works. – Michael Currin Sep 02 '17 at 08:22
11

This is almost exactly the use-case that made us switch to YAML (Wikipedia, python implementation, documentation; you might want to look at JSON as an alternative). YAML has some advantages over configparser or json:

  • human readability (better than JSON for larger files);
  • can serialize arbitrary python objects (which makes it as un-safe as pickle, but there is a safe_load function in the python implementation to alleviate this issue). This is already useful for something as simple as a datetime object.

For completeness sake, the main disadvantages (IMO):

  • Python implementation by an order of magnitude slower than JSON implementation;
  • less portable across platforms than JSON.

For example

import yaml

sql = """
query         : "SELECT * from cities
WHERE name='Unknown';"
count         : 0
level         : 1
name          : "Check for cities whose name should be null"
suggested_fix : "UPDATE cities SET name=NULL WHERE name='Unknown';"
"""

sql_dict = yaml.safe_load(sql)

print(sql_dict['query'])

prints

SELECT * from cities WHERE name='Unknown';
stephan
  • 10,104
  • 1
  • 51
  • 64
1

I would you suggest to use a regular expression... The code might look like this to give you are start:

import re

test="""query = "select * from cities;"
count = 0
multine_query = "select *
from cities
     where name='unknown';"
"""

re_config = re.compile(r'^(\w+)\s*=\s*((?:".[^"]*")|(?:\d+))$', re.M)
for key, value in re_config.findall(test):
    if value.startswith('"'):
        value = value[1:-1]
    else:
        value = int(value)
    print key, '=', repr(value)

The output of this example is:

~> python test.py 
query = 'select * from cities;'
count = 0
multine_query = "select *\nfrom cities\n     where name='unknown';"

Hope that helps!

Regards, Christoph

tux21b
  • 90,183
  • 16
  • 117
  • 101
  • 2
    +1 That should work, but I really prefer a ready-made package that supports all kinds of edge conditions. – Adam Matan Feb 12 '10 at 09:44
1

If you can use Python 3.11+, TOML is the new hotness which is included in the standard library. It's like a simpler YAML which is much more focused on being an editable config format (many TOML libraries usually don't provide a way to write TOML).

# yourconfig.toml  (the filename assumed below, tho this is also a comment.)
query         = "SELECT * from cities WHERE name='Unknown';"
count         = 0
level         = 1   # 1 = foo, 2 = bar, 3 = baz  (other example comment)
name          = "Check for cities whose name should be null"
suggested_fix = """
UPDATE cities 
SET name=NULL 
WHERE name='Unknown';"""

The provided library is tomllib, and you can use it like so:

import tomllib

with open("yourconfig.toml") as f:
    conf = tomllib.load(f)

import pprint
pprint.pprint(conf)
# {'count': 0,
#  'level': 1,
#  'name': 'Check for cities whose name should be null',
#  'query': "SELECT * from cities WHERE name='Unknown';",
#  'suggested_fix': "UPDATE cities \nSET name=NULL \nWHERE name='Unknown';"}

Note that the suggested_fix key includes the newlines (\n) implied by the multiline string. You can use a trailing backslash to remove them, though if the value is SQL, they don't matter, so I wouldn't clutter it up.

Other benefits are that the count and level values are integers. TOML's built-in types include strings, arrays/lists, maps/dicts, ints, floats, datetimes, and booleans.

TOML has rigid types that aren't "cleverly" inferred.

  • In YAML, on might be a boolean or a string, depending on if you're on version 1.1 or 1.2. In TOML it's a syntax error unless you quote it. If you wanted a boolean, use exactly one of true or false.

  • In YAML, O.1 is a string (it starts with an uppercase O), in TOML it's a syntax error.

  • In YAML, you basically need https://yaml-multiline.info/ to figure out how to get your multiline string to be exactly what you expect, and even then it still takes me a while. In TOML it can kinda be the same as if you wrote it in Python (escaping rules are a bit different, using single/double quotes impacts it, and trailing backslashes trims whitespace)

Nick T
  • 25,754
  • 12
  • 83
  • 121