
I'm writing a utility which I intend to be usable with at least three different database server backends (SQLite3, PostgreSQL, and MySQL). I'm not using an ORM (though I may try a fork using SQLAlchemy after I've finished my prototype and become sufficiently comfortable with how to maintain the schema for possible access by other tools, potentially from other languages, in the future).

I'm posting this in the spirit of StackOverflow's policy regarding answering one's own questions and opening the question and answer up for discussion.

The code posted in the answer below has been tested against all three supported targets, each pre-configured with a very simple subset of my intended schema (hard-coded near the top of the script). It currently only checks that each table has all of the required columns, and only warns if extra columns exist. I'm NOT checking types nor other constraints on the columns' contents.

(My utility should allow end-users to add additional columns, so long as those columns allow NULL values or have suitable default values --- my code will only insert or update the subset of columns that I'm specifying.)
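To illustrate why that constraint suffices: inserting with an explicit column list leaves any user-added (NULL-able or defaulted) column untouched. A minimal sketch using sqlite3 and Python 3 syntax; the `user_note` column is a made-up example of an end-user addition:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
# Simulate an end-user adding an extra column with a default value:
cur.execute("CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT, "
            "creation TEXT, user_note TEXT DEFAULT 'n/a')")
# The utility names only the columns it knows about:
cur.execute("INSERT INTO tags (id, tag, creation) VALUES (?, ?, ?)",
            (1, 'example', '2014-07-02'))
row = cur.execute("SELECT user_note FROM tags WHERE id = 1").fetchone()
print(row[0])  # the column's default fills in
```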

Please let me know if there are better ways to accomplish this or if I've made some egregious error that isn't revealed in my tests.

(I'd like to know how to properly create SQLFiddle entries/links that could be shared to make this question easier to work with. Any pointers on that would be welcome; I've tried creating this one here for my MySQL schema ... let's see if that works.)

Jim Dennis

1 Answer


As specified above, here's my approach:

#!/usr/bin/env python
import importlib
import sys

schema = (('tags',     set(('id','tag','gen','creation'))),
          ('nodes',    set(('name', 'id', 'gen', 'creation'))),
          ('node_tag', set(('node_id', 'tag_id', 'creation'))),
          ('tgen',     set(('tid', 'comment', 'creation')))
         )

drivers = {
           'mysql':  'MySQLdb',
           'pg':     'psycopg2',
           'sqlite': 'sqlite3',
          }

if __name__ == '__main__':
    args = sys.argv[1:]
    if args and args[0] in drivers:
        dbtype = args[0]
        db = importlib.import_module(drivers[dbtype])
    else:
        print >> sys.stderr, 'Unrecognized dbtype, should be one of these:', ' '.join(drivers.keys())
        sys.exit(127)

    if dbtype == 'sqlite':
        required_args = 2
        dbopts = { 'database': args[1] }
    else:
        required_args = 6
        dbopts = { 'database' : args[1],
                   'user'     : args[2],
                   'passwd'   : args[3],
                   'host'     : args[4],
                   'port'     : int(args[5]),
                }

    if len(args) < required_args:
        print >> sys.stderr, 'Must supply all arguments:',
        print >> sys.stderr, '[mysql|pg|sqlite] database user passwd host port'
        sys.exit(126)

    # Paper over the drivers' differing connection keyword names:
    if dbtype == 'mysql':
        dbopts['db'] = dbopts['database']
        del dbopts['database']

    if dbtype == 'pg':
        dbopts['password'] = dbopts['passwd']
        del dbopts['passwd']

    try:
        conn = db.connect(**dbopts)
    except db.Error, e:
        print >> sys.stderr, 'Database connection failed: %s' % e
        sys.exit(125)

    cur = conn.cursor()

    exit_code = 0
    for table, columns in schema:
        try:
            # Zero-row query: populates cur.description without fetching data
            cur.execute("SELECT * FROM %s WHERE 0=1" % table)
        except db.Error, e:
            print >> sys.stderr, 'Encountered %s Error on table %s' % (e, table)
            exit_code += 1
            continue
        introspected_columns = set(x[0] for x in cur.description)
        missing = columns - introspected_columns
        extra   = introspected_columns - columns
        if missing:
            print 'Error: Missing columns in table %s: %s' % (table, ' '.join(missing))
            exit_code += 1
        else:
            print 'All columns present for table %s' % table
        if extra:
            print 'Warning: Extra columns in table %s: %s' % (table, ' '.join(extra))

    sys.exit(exit_code)

... in actual practice this will be refactored into a class and called during service start-up of the daemon that I'm writing.

Note the actual technique here is to make a query for all columns (SELECT * FROM some_table) that's guaranteed to return no rows (WHERE 0=1), meaning I don't need to know any of the column names in advance. This seems to consistently populate the description field of the DBAPI (client-side) cursor.
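A self-contained illustration of that zero-row trick, using an in-memory SQLite database and Python 3 syntax (the table definition is just for the demo):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE tags (id INTEGER, tag TEXT, gen INTEGER, creation TEXT)")

# Guaranteed-empty result set; cur.description is still populated
cur.execute("SELECT * FROM tags WHERE 0=1")
columns = set(d[0] for d in cur.description)
print(columns)
assert cur.fetchall() == []  # no rows were transferred
```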

(Incidentally, this example also highlights some annoying differences in the connection keyword arguments among popular DBAPI database drivers. None of them seem to ignore extra keys, so we need a significantly different argument set for each of the three: just a filename for SQLite3, the 'database' key renamed to 'db' for MySQLdb, and 'passwd' renamed to 'password' for PostgreSQL --- for the psycopg2 driver, at least.)
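One way to contain that incompatibility is a small translation table from a canonical options dict to each driver's spelling, rather than the per-driver if-blocks above. A sketch (the rename pairs reflect the three drivers discussed here, to the best of my knowledge):

```python
# Canonical keyword name -> each driver's preferred spelling
RENAMES = {
    'mysql':  {'database': 'db'},        # MySQLdb wants db=...
    'pg':     {'passwd': 'password'},    # psycopg2 wants password=...
    'sqlite': {},                        # sqlite3 takes only the filename
}

def driver_kwargs(dbtype, opts):
    """Rename canonical connection options for the given driver."""
    renames = RENAMES[dbtype]
    return {renames.get(k, k): v for k, v in opts.items()}

opts = {'database': 'mydb', 'user': 'me', 'passwd': 's3cret',
        'host': 'localhost', 'port': 5432}
print(driver_kwargs('mysql', opts))
print(driver_kwargs('pg', opts))
```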

  • Leaving aside the functionality, I would not recommend passing login/password pairs as explicit params to any scripts. They become visible in your history and in process monitoring tools. – newtover Jul 02 '14 at 09:49
  • Regarding passwords on the command line: as mentioned, this code is only for testing the concept. The actual passwords are stored in ".ini" files read by ConfigParser, and I have code that validates the permissions of the file and its parent directory. – Jim Dennis Jul 02 '14 at 09:52
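The credential-loading code mentioned in that comment isn't shown; a minimal sketch of the idea in Python 3, assuming POSIX permission semantics and a made-up `[database]` section name (the parent-directory check is omitted for brevity):

```python
import configparser
import os
import stat
import tempfile

def load_db_credentials(path):
    """Load [database] credentials, refusing group/world-accessible files."""
    mode = os.stat(path).st_mode
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError('%s must be accessible by its owner only' % path)
    cp = configparser.ConfigParser()
    cp.read(path)
    return dict(cp['database'])

# Demo: write a 0600-permission ini file and read it back
fd, path = tempfile.mkstemp(suffix='.ini')
with os.fdopen(fd, 'w') as f:
    f.write('[database]\nuser = me\npasswd = s3cret\n')
os.chmod(path, 0o600)
creds = load_db_credentials(path)
print(creds['user'])
os.unlink(path)
```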