-1

How to extract all the columns mentioned in a sql query along with the values mentioned in all the conditions in Python. Lets say we have the following query.

SELECT CASE
    WHEN grade=90 THEN "A1"
    WHEN grade=80 THEN "B1"
    WHEN grade=70 THEN "C1"
    ELSE "D1"
END as GradeRank, *

 
FROM employees 
WHERE age > 30 
AND department = 'sales' 
AND salary IN (SELECT salary FROM employees WHERE name like "Adam%")

I want something like (Order of list elements doesnt matter):

output: [("age",30), ("department":"sales"), ("name","Adam%"),("grade",90),("grade",80),("grade",70)]

Note: This is just a sample query, may not be very efficient, just a random query to test python code to see if I am able to get output as mentioned above.

Please help.

Here's what I have already tried.

import re

def extract_conditions(sql_query):
    # Regular expression to match SQL conditions
    condition_regex = r"\bWHERE\b\s+(.+)\s*(\bGROUP BY\b|\bHAVING\b|\bORDER BY\b|\bSELECT\b|\bLIMIT\b|\bOFFSET\b|\bFETCH\b|\bFOR\b|$)"
    
    # Extract conditions from the SQL query
    match = re.search(condition_regex, sql_query, re.IGNORECASE)
    if match:
        conditions = match.group(1)
        # Split conditions by AND or OR operators
        conditions = re.split(r"\s+(AND|OR)\s+", conditions, flags=re.IGNORECASE)
        # Remove any leading or trailing white space from the conditions
        conditions = [c.strip() for c in conditions]
        return conditions
    else:
        return None
    
# Example SQL query string
sql_query = "SELECT * FROM employees WHERE age > 30 AND department = 'sales' AND salary in (SELECT salary from employees WHERE name like 'Adam%')"

# Extract conditions from the SQL query
conditions = extract_conditions(sql_query)
print(conditions)

Expected Output:

output: [("age",30), ("department":"sales"), ("name","Adam%"),("grade",90),("grade",80),("grade",70)]

Actual Output:

['age > 30', 'AND', "department = 'sales'", 'AND', "salary in (SELECT salary from employees WHERE name like 'Adam%')"]
InSync
  • 4,851
  • 4
  • 8
  • 30

1 Answers1

0

Here is something that can be worked upon. I might have gone a bit of a long way here but you can fine tune it.
It works on single as well as multi-line queries.

Code:

import re

# Example SQL query string
sql_string = '''SELECT CASE
    WHEN grade=90 THEN "A1"
    WHEN grade=80 THEN "B1"
    WHEN grade=70 THEN "C1"
    ELSE "D1"
END as GradeRank, *
FROM employees 
WHERE age > 30 
AND department = 'sales' 
AND salary IN (SELECT salary FROM employees WHERE name like "Adam%")'''


# Function to extract column and their values.


def extract_conditions(sql_string):
    # Pre-processing to remove the SQL clauses not wanted in the output. - You can add here as per the requirement.
    preprocessed_sql = re.split('SELECT|WHEN|THEN|AND|WHERE|IN|\n', sql_string, flags=re.IGNORECASE)
    _ = []
    # Extracting required columns and their values using condtional operators. - You can add more here if I missed.
    for i in preprocessed_sql:
        if re.search('=|like|<|>', i):
            parsed_value = re.split('=|like|<|>', i)
            parsed_value = (parsed_value[0].strip(), parsed_value[1].strip().replace(')', '').replace('"', '').replace("'", ''))
            _.append(parsed_value)

    return _


print(extract_conditions(sql_string))

Output:

[('grade', '90'), ('grade', '80'), ('grade', '70'), ('age', '30'), ('department', 'sales'), ('name', 'Adam%')]