Given an SQL query, how can I programmatically extract schema information about the database it is querying? Ideally I would like to use Python to parse and extract info.
For example, the following SQL:
SELECT
rc.dateCooked,
r.name,
i.ingredient
FROM recipeCooked rc
INNER JOIN recipe r ON r.recipeID = rc.recipeID
LEFT OUTER JOIN recipeIngredient ri ON ri.recipeID = r.recipeID
LEFT OUTER JOIN ingredient i ON i.ingredientID = ri.ingredientID;
Would lead to the following set of relationships (show as csv):
table1, key1, table2, key2, join_type
recipeCooked, recipeID, recipe, recipeID, INNER
recipe, recipeID, recipeIngredient, recipeID, LEFT OUTER
recipeIngredient, ingredientID, ingredient, ingredientID, LEFT OUTER
I'm looking for anything that might help me with this challenge including code snippets, recommendations of tools that might be useful, concepts worth googling, recommended approaches, or simply reports of others facing the same challenge.
The Python sqlparse library seems to be a very useful tool for this task but it only gets me part of the way there.
For context: I regularly query a healthcare relational database with >20K tables. It is not well documented and the main way I get information about it is by looking at SQL code from other data analysts. Right now I do a lot of manually going through SQL code and drawing out database schema diagrams, but this can be slow and tedious. I'd like to automate this!