parsing SQL with python to find specific statements

Question

I am using sqlparse to try and find specific statements.

code:

import sqlparse

text = 'SELECT * FROM dbo.table'
parse = sqlparse.parse(text)
pizza = [t for t in parse[0].tokens if t.ttype in (sqlparse.tokens.Keyword)]

I get

[<DML 'SELECT' at 0x9f12318>, <Keyword 'FROM' at 0x9f12548>]

but what I really want is to return the table name:

dbo.table

how can I do that? My end goal is to parse through folders of sql scripts and find references and other details.

Well, `dbt.table` is not a Keyword, so you shouldn't filter on keywords — Tim, Apr 15 '15 at 09:49
I thought Keyboard would return the word and the text after the keyword till the next token item. How do I return text after a keyword? — KillerSnail, Apr 15 '15 at 16:22

score 2 · Answer 1 · answered Apr 15 '15 at 19:16

If you print each of the tokens

>>> pprint(parse[0].tokens)

[<DML 'SELECT' at 0x7fc278c9eb50>,
 <Whitespace ' ' at 0x7fc278c9eba8>,
 <Wildcard '*' at 0x7fc278c9ec00>,
 <Whitespace ' ' at 0x7fc278c9ec58>,
 <Keyword 'FROM' at 0x7fc278c9ecb0>,
 <Whitespace ' ' at 0x7fc278c9ed08>,
 <Identifier 'dbo.ta...' at 0x7fc278cb3628>]

you will see that dbo.table is in fact not a Keyword, but an Identifier

Logical next step: filter on Identifier

>>> pprint([t for t in parse[0].tokens if t.ttype in (sqlparse.tokens.Identifier)])

but that gives AttributeError: 'module' object has no attribute 'Identifier'. If you have a look at the source code for sqlparse, you will see that there is no Identifier in sql.tokens, you you cannot filter on that. (mistake in the code? perhaps forgot to implement that?)

However, if you print the ttype of each token

>>> pprint([t.ttype for t in parse[0].tokens])

[Token.Keyword.DML,
 Token.Text.Whitespace,
 Token.Wildcard,
 Token.Text.Whitespace,
 Token.Keyword,
 Token.Text.Whitespace,
 None]

You will see that they all have a valid Token as ttype, except for dbo.table, which is None.

Knowing that, we can filter the tokens on that and get the result we want:

>>> print([t for t in parse[0].tokens if t.ttype is None])

[<Identifier 'dbo.ta...' at 0x7f6ae0318628>]

And to get the name:

>>> print([t.to_unicode() for t in parse[0].tokens if t.ttype is None][0])
u'dbo.table'

or

>>> print([str(t) for t in parse[0].tokens if t.ttype is None][0])
'dbo.table'

note the [0] at the end, to get the first and only item in the array that the list comprehension produces

parsing SQL with python to find specific statements

1 Answers1