26

I've been trying to test various methods for making my code to run. To begin with, I have this list:

member_list = [111,222,333,444,555,...]

I tried to pass it into this query:

query = pd.read_sql_query(
"""
select member id
    ,yearmonth
from queried_table
where yearmonth between ? and ?
    and member_id in ?
""", db2conn, params = [201601, 201603, member_list])

However, I get an error that says:

'Invalid parameter type. param-index=2 param-type=list', 'HY105'

So I looked around and tried using formatted strings:

query = pd.read_sql_query(
"""
select member id
    ,yearmonth
from queried_table
where yearmonth between ? and ?
    and member_id in (%s)
""" % ','.join(['?']*len(member_list), db2conn, params = [201601, 201603, tuple(member_list)])

Now, I get the error:

'The SQL contains 18622 parameter markers, but 3 parameters were supplied', 'HY000'

because it's looking to fill in all the ? placeholders in the formatted string.

So, ultimately, is there a way to somehow evaluate the list and pass each individual element to bind to the ? or is there another method I could use to get this to work?

Btw, I'm using pyodbc as my connector.

Thanks in advance!

Scratch'N'Purr
  • 9,959
  • 2
  • 35
  • 51

3 Answers3

25

Break this up into three parts to help isolate the problem and improve readability:

  1. Build the SQL string
  2. Set parameter values
  3. Execute pandas.read_sql_query

Build SQL

First ensure ? placeholders are being set correctly. Use str.format with str.join and len to dynamically fill in ?s based on member_list length. Below examples assume 3 member_list elements.

Example

member_list = (1,2,3)
sql = """select member_id, yearmonth
         from queried_table
         where yearmonth between {0} and {0}
         and member_id in ({1})"""
sql = sql.format('?', ','.join('?' * len(member_list)))
print(sql)

Returns

select member_id, yearmonth
from queried_table
where yearmonth between ? and ?
and member_id in (?,?,?)

Set Parameter Values

Now ensure parameter values are organized into a flat tuple

Example

# generator to flatten values of irregular nested sequences,
# modified from answers http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python
def flatten(l):
    for el in l:
        try:
            yield from flatten(el)
        except TypeError:
            yield el

params = tuple(flatten((201601, 201603, member_list)))
print(params)

Returns

(201601, 201603, 1, 2, 3)

Execute

Finally bring the sql and params values together in the read_sql_query call

query = pd.read_sql_query(sql, db2conn, params)
Community
  • 1
  • 1
Bryan
  • 17,112
  • 7
  • 57
  • 80
  • The flatten function did it!!! However, I think I just ran into a problem with a parameter cap. Apparently, my member list has 84000+ members, but when I try to dynamically fill in the `?` placeholders for all those members, Python limited to a max of 18,622. Looks like I'm going to have to split my member list to do this. Thanks for you help! – Scratch'N'Purr Apr 25 '16 at 16:35
  • Happy to help. Since you're working with that many `member_list` values, will likely get better performance (and fix the parameter limitation) by populating another table then inner join to filter results. [pandas.DataFrame.to_sql](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html#pandas-dataframe-to-sql) might help with table creation. – Bryan Apr 25 '16 at 16:43
  • 2
    For `pymysql` you need a `%s` placeholder, so it should be `','.join(['%s'] * len(member_list))` – slhck Apr 10 '18 at 11:35
  • 1
    Bryan's answer is awesome! One note though, params is not by default the 3rd argument in pd.read_sql_query, so it only worked when I specified that params=params as such: pd.read_sql_query(sql, db2conn, params=params) more: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_query.html – Dylan Pulver Mar 17 '21 at 21:39
14

WARNING! Although my proposed solution here works, it is prone to SQL injection attacks. Therefor, it should never be used directly in backend code! It is only safe for offline analysis.

If you're using python 3.6+ you could also use a formatted string litteral for your query (cf https://docs.python.org/3/whatsnew/3.6.html#whatsnew36-pep498)

start, end = 201601, 201603
selected_members = (111, 222, 333, 444, 555)  # requires to be a tuple

query = f"""
    SELECT member_id, yearmonth FROM queried_table
    WHERE yearmonth BETWEEN {start} AND {end}
      AND member_id IN {selected_members}
"""

df = pd.read_sql_query(query, db2conn)
bluu
  • 542
  • 3
  • 13
  • note: 'start' and 'end' might need to be cast as strings depending on the type of the 'yearmonth' column in your DB table... – bluu Jun 08 '18 at 12:26
  • 15
    While your proposed method will work, it wouldn't be recommended because it would be prone to SQL injection attacks. – Scratch'N'Purr Jun 08 '18 at 12:31
  • There is indeed a slight risk (but a lot of things would make the query fail anyway). That being said, the proposed solution doesn't provide a whole lot more security (just checking that the past list is enumerable via the usage of join). I think if you take it far enough that might also be prone to injection attacks... Anyway, first it wasn't not clear to me that the query served a front-end directy; second, the responsibility of validating the input should be in a function wrapping that query, not the query string itself... – bluu Jun 12 '18 at 12:13
  • Last thing, depending on the DB engine used, you might need to validate the `selected_members` tuple when it's composed of a single element. That's because in python it would print as `(111,)` whereas (at least via psycopg2) the query would only successfully be parsed if it was `(111)`. – bluu Jul 31 '18 at 16:42
  • 2
    This is a very bad idea and a **major** risk. What if `selected_members` contained `1); drop table Students'; --` ? Parameters eliminate the risk of SQL injection because the parameter values never become part of the query. `doesn't provide a whole lot more security` actually, it does – Panagiotis Kanavos Feb 10 '22 at 11:30
  • 1
    Good point Panagiotis! I'll update my answer with a warning message. And sorry for my past comments, I wasn't conscious on those types of threats back then... – bluu Feb 22 '22 at 17:12
-5
query = 'Select count(*) cnt from TBL_DESK_AUDIT  where trunc(DATETIMECREATED) = trunc(sysdate) and DESK_NAME =' + "'"+dataframe_list1[0][0] + "'"
print(query)
df_TBL_DESK_AUDIT = pd.read_sql_query(query, connect);
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459