0

How to extract column names along with their conditions applied from a SQL query string using Python, so that I can map the result to other table and get the required data?

query = "SELECT COUNT(*) as cnt FROM dbo.main_table WHERE (VKRKA10001920015 = 1 AND age IN (15, 16, 17, 18, 19) AND income IN (101, 102, 301, 302, 305))"

Required_Output = {
  'VKRKA10001920015': 1,
  'age' : (15, 16, 17, 18, 19),
  'income' : (101, 102, 301, 302, 305),
}

As of now I am able to extract only column names using sql-metadata python package. But I need the conditions as well to perform other operations

from sql_metadata.compat import get_query_columns
print(get_query_columns(query))

['VKRKA10001920015', 'age', 'householdincome']
AKX
  • 152,115
  • 15
  • 115
  • 172
vikas madoori
  • 147
  • 1
  • 11
  • 1
    You could use the [`sqlparse` module](https://sqlparse.readthedocs.io/en/latest/) to build a (likely limited) parser for SELECT statements. – AKX May 30 '22 at 09:38
  • I have tried sqlparse but no luck it is not yielding proper results. Could you please help me @AKX – vikas madoori May 30 '22 at 11:55
  • What did you try in particular? Can you show the code? Writing a proper parser even with sqlparse having done some of the work for you isn't perfectly trivial. – AKX May 30 '22 at 12:18
  • Thanks @AKX I am able to get the required results using sqlparse module. Below I am attaching the code for reference to others – vikas madoori May 31 '22 at 13:33

1 Answers1

0

To identify the tokens ("IN", "=", "LIKE") I have used the get_tokens() which recursively get the tokens added to the extracted_key() dictionary. Here in the extracted_keys dict the key address stored instead of the actual key Since the functions is going recursively. So I have used create_dict() another function to create a proper dictionary out of extracted_keys dictionary. For now we are able to extract columns along with their conditions from the SQL WHERE conditon for following operators i.e., "IN" operator, "=" operator, "Like" operator.

To install sqlparse module

pip install sqlparse

import sqlparse

def get_tokens(where):
  identifier = None
  extracted_keys = {}
  
  for i in where.tokens:
    try:
      name = i.get_real_name()
      
      if name and isinstance(i, sqlparse.sql.Identifier):
        name = i.get_real_name()
        identifier = i
        extracted_keys[identifier] = i.value
      
      elif identifier and isinstance(i, sqlparse.sql.Parenthesis):
        extracted_keys[identifier] = i.value
        
      elif i and "in" not in i.value.lower() and "or" not in i.value.lower() and isinstance(i, sqlparse.sql.Comparison):
        if "=" in i.value:
            equal_index = i.value.index("=")
            if i.value[equal_index -1] not in ["!", ">", "<"]:
              key,value = i.value.split("=")[0], i.value.split("=")[1]
              extracted_keys[key] = value
            
        if "!=" in i.value:
            key,value = i.value.split("!=")[0], i.value.split("!=")[1]
            extracted_keys[key] = value
            
        elif "like" in i.value.lower():
            key,value = i.value.lower().split("like")[0].upper(), i.value.lower().split("like")[1]
            extracted_keys[key] = value
        
      else:
        extracted_keys.update(get_tokens(i))
    
    except Exception as error:
      pass
    
  return extracted_keys

def create_dict(extracted_keys):
  cols = {}
  for key, value in extracted_keys.items():
    try:
      if key.get_real_name():
        cols[key.get_real_name()] = value
    except Exception as e:
      cols[key] = value
  return cols

For example:

query = "SELECT COUNT(*) as cnt FROM dbo.main_table WHERE (VKRKA10001920015 = 1 AND age IN (15, 16, 17, 18, 19) AND income IN (101, 102, 301, 302, 305))"

parsed = sqlparse.parse(query)
where = parsed[0][-1]
extracted_keys = get_tokens(where)

extracted_keys = create_dict(extracted_keys)

Output

{
  'VKRKA10001920015': 1,
  'age' : (15, 16, 17, 18, 19),
  'income' : (101, 102, 301, 302, 305),
}
vikas madoori
  • 147
  • 1
  • 11
  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community May 31 '22 at 17:22