1

I want to get table name from 'FROM' and 'JOIN'. This is the example of the query:

SELECT * FROM film JOIN language ON film.language_id = language.language_id WHERE language_name = "English"

I use sqlparse library in python. This is I've tried:

line="FROM film INNER JOIN language ON film.language_id = 
language.language_id WHERE language.name = 'English' GROUP BY language_id"
a = sqlparse.parse(line)[0].tokens
print (a)


Out[91]: 
[<Keyword 'FROM' at 0x1EEA18CDAC8>,
 <Whitespace ' ' at 0x1EEA18CDCA8>,
 <Identifier 'film' at 0x1EEA18CE6D8>,
 <Whitespace ' ' at 0x1EEA18CDB28>,
 <Keyword 'INNER ...' at 0x1EEA18CDBE8>,
 <Whitespace ' ' at 0x1EEA18CDC48>,
 <Keyword 'langua...' at 0x1EEA18CDA68>,
 <Whitespace ' ' at 0x1EEA18CDD08>,
 <Keyword 'ON' at 0x1EEA18CDD68>,
 <Whitespace ' ' at 0x1EEA18CDDC8>,
 <Comparison 'film.l...' at 0x1EEA18CE7C8>,
 <Whitespace ' ' at 0x1EEA18B61C8>,
 <Where 'WHERE ...' at 0x1EEA18CE4F8>,
 <Keyword 'GROUP ...' at 0x1EEA18B65E8>,
 <Whitespace ' ' at 0x1EEA18B6648>,
 <Identifier 'langua...' at 0x1EEA18CE750>]

The output I expect:

table = [film,language]

does someone can help me? Thanks

wdwilhelmina
  • 39
  • 1
  • 7

1 Answers1

0

I adapted a solution suggested here by michaelshobbs. It seems to work, but I didn't test it thoroughly.
The code i use is this:

import itertools
import sqlparse

from sqlparse.sql import IdentifierList, Identifier
from sqlparse.tokens import Keyword, DML

def is_subselect(parsed):
    if not parsed.is_group:
        return False
    for item in parsed.tokens:
        if item.ttype is DML and item.value.upper() == 'SELECT':
                return True
    return False    

def extract_from_part(parsed):
    from_seen = False
    for item in parsed.tokens:
        if item.is_group:
            for x in extract_from_part(item):
                yield x
        if from_seen:
            if is_subselect(item):
                for x in extract_from_part(item):
                    yield x
            elif item.ttype is Keyword and item.value.upper() in ['ORDER', 'GROUP',    'BY', 'HAVING','GROUP BY']:
                from_seen = False
                StopIteration
            else:
                yield item
        if item.ttype is Keyword and item.value.upper() == 'FROM':
            from_seen = True

def extract_table_identifiers(token_stream):
    for item in token_stream:
        if isinstance(item, IdentifierList):
            for identifier in item.get_identifiers():
                value = identifier.value.replace('"', '').lower()
                yield value
        elif isinstance(item, Identifier):
            value = item.value.replace('"', '').lower()
            yield value

def extract_tables(sql):
    # let's handle multiple statements in one sql string
    extracted_tables = []
    statements = list(sqlparse.parse(sql))
    for statement in statements:
        if statement.get_type() != 'UNKNOWN':
            stream = extract_from_part(statement)
            extracted_tables.append(set(list(extract_table_identifiers(stream))))
    return list(itertools.chain(*extracted_tables))
Duccio Fabbri
  • 998
  • 1
  • 10
  • 21