
I am having trouble executing an SQL statement where I have a huge list of values for my IN statement. I'm trying to come up with a way to iterate through the list and get a combined result set.

The error I am currently getting is: [SQL Server]Internal error: An expression services limit has been reached. Please look for potentially complex expressions in your query, and try to simplify them. (8632) (SQLExecDirectW)

This is because my IN list is 75k items long.

I'm assuming this needs to be chunked up in some way and looped through?

Pseudocode

List = 75k items
Chunk_Size = 10000
For i in first Chunk_Size of List
    df.append = SQL results WHERE IN List of 10k items

This would then loop 8 times appending the results to a dataframe. - I want to be able to define the "chunk size" for testing purposes.
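The chunking step in that pseudocode can be sketched in plain Python (the ID values here are placeholders, not the real parts list):

```python
def chunks(items, size):
    """Yield successive slices of `items`, each at most `size` long."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

list_of_ids = [str(n) for n in range(75000)]  # stand-in for the real 75k IDs
Chunk_Size = 10000  # configurable for testing

print(len(list(chunks(list_of_ids, Chunk_Size))))  # 8 chunks of up to 10k IDs
```

Each slice would then drive one `WHERE ... IN (...)` query, with the per-chunk results collected afterwards.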

Pieces of real code:

import pyodbc, pandas as pd
conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                      'SERVER=server;'
                      'DATABASE=db;'
                      'UID=Me;'
                      'PWD=Secret;'
                      'Trusted_Connection=yes;')

list_of_ids = ['1', 'Hello', '3.4', 'xyz5000'] #etc to 75k
params= "', '".join(list_of_ids)
sql = ("SELECT [ItemNumber],[Name],[Part] FROM table1 WHERE Part IN ('{}');".format(params))
sql_query = pd.read_sql_query(sql,conn)

I know that using format is bad, but I couldn't get the execute statement to work right. When I do this:

sql_query2 = cursor.execute("SELECT [ItemNumber],[Name],[Part] FROM table1 WHERE Part IN ?;",list_of_ids)

I get errors like this: pyodbc.ProgrammingError: ('The SQL contains 1 parameter markers, but 4 parameters were supplied', 'HY000')
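For reference, pyodbc binds one value per `?` marker, so the marker list has to be built to match the number of values (a sketch of the idea, not the full solution):

```python
list_of_ids = ['1', 'Hello', '3.4', 'xyz5000']

# One "?" per supplied value; join them with commas for the IN clause.
placeholders = ','.join('?' * len(list_of_ids))  # "?,?,?,?"
sql = ("SELECT [ItemNumber],[Name],[Part] FROM table1 "
       "WHERE Part IN ({});".format(placeholders))

print(sql)  # the marker count now matches len(list_of_ids)
```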

Thoughts? Thanks in advance!


SOLUTION EDIT: Using Tim's solution below, I was able to accomplish this by doing:

Chunk_Size = 10000
list2 = []
cursor = conn.cursor()

for i in range(0, len(list_of_ids), Chunk_Size):
    params= "', '".join(list_of_ids[i:i+Chunk_Size])
    sql = "SELECT [ItemNumber],[Name],[Part] FROM table1 WHERE Part IN ('{}');".format(params)
    cursor.execute(sql)
    list1 = cursor.fetchall()
    list2.append(list1)

from pandas.core.common import flatten
list3 = list(flatten(list2))
match_df = pd.DataFrame.from_records(list3, columns=['Item Number','Name','Part'])
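An equivalent way to combine the per-chunk results, for anyone who prefers staying in pandas: build one DataFrame per chunk and concatenate once at the end, which avoids the flatten step (the rows here are stand-ins for what `cursor.fetchall()` returns each iteration):

```python
import pandas as pd

# Stand-in for the per-loop cursor.fetchall() results.
chunks_of_rows = [
    [(1, "Widget", "xyz5000"), (2, "Gadget", "abc1")],
    [(3, "Bolt", "3.4")],
]

# One DataFrame per chunk, then a single concat at the end; appending to a
# DataFrame inside the loop would re-copy the data every iteration.
frames = [pd.DataFrame.from_records(rows, columns=['Item Number', 'Name', 'Part'])
          for rows in chunks_of_rows]
match_df = pd.concat(frames, ignore_index=True)

print(len(match_df))  # 3
```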
    Does this answer your question? [Python call sql-server stored procedure with table valued parameter](https://stackoverflow.com/questions/51930062/python-call-sql-server-stored-procedure-with-table-valued-parameter) You need a TVP (and therefore a table type also), this allows you to write `WHERE Part IN (SELECT Value FROM @MyTVP)` – Charlieface Aug 05 '21 at 19:11
    You could handle most of it on the SQL side, by writing the values for your IN statement (list_of_ids) to a temporary table and then doing a join. – Jason Cook Aug 05 '21 at 19:56
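The temp-table idea from the comment above can be demonstrated with an in-memory SQLite database (names and sample rows are made up for illustration); with SQL Server/pyodbc you would create a `#temp` table and bulk-insert with `cursor.executemany` instead, but the pattern is the same: load the IDs, then JOIN.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")

# Toy stand-in for the real table1.
conn.execute("CREATE TABLE table1 (ItemNumber INTEGER, Name TEXT, Part TEXT);")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?);",
                 [(1, "Widget", "xyz5000"), (2, "Gadget", "abc1"), (3, "Bolt", "3.4")])

# Load the ID list into a temporary table, then JOIN instead of IN.
list_of_ids = ["xyz5000", "3.4", "nope"]
conn.execute("CREATE TEMP TABLE id_list (Part TEXT);")
conn.executemany("INSERT INTO id_list (Part) VALUES (?);",
                 [(p,) for p in list_of_ids])

match_df = pd.read_sql_query(
    "SELECT t.ItemNumber, t.Name, t.Part "
    "FROM table1 AS t JOIN id_list AS i ON i.Part = t.Part;", conn)

print(len(match_df))  # 2 matching rows
```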

1 Answer


Since pyodbc doesn't support array parameters, there are few alternatives to formatting the request yourself:

for i in range(0, len(list_of_ids), Chunk_Size):
    params= "', '".join(list_of_ids[i:i+Chunk_Size])
    sql = "SELECT [ItemNumber],[Name],[Part] FROM table1 WHERE Part IN ('{}');".format(params)
    sql_query = pd.read_sql_query(sql,conn)

You could still use parameter substitution by building the placeholder list on the fly, like this:

for i in range(0, len(list_of_ids), Chunk_Size):
    sublist = list_of_ids[i:i+Chunk_Size]
    sql = "SELECT [ItemNumber],[Name],[Part] FROM table1 WHERE Part IN ({});".format(','.join(['?']*len(sublist)))
    sql_query = pd.read_sql_query(sql, conn, params=sublist)

How many parts are there? Would it be quicker to fetch the entire table and filter it in Python?

Tim Roberts
  • I am doing some validation checking on the first solution. The second solution just results in an error where the '?' is listed 75k times as a column name. As for the table, it's 10's of millions of rows with ~100 columns. – Steve Aug 05 '21 at 20:41
  • My mistake, I need to `join` the `?` signs with commas. I patched it. – Tim Roberts Aug 05 '21 at 22:05
  • Still chipping away. I get a different error for COUNT field incorrect or syntax error. Trying to figure out if it's something I did. But on top of that, I'm attempting to do the "correct" way of appending the data in each loop. So instead of creating a dataframe, I need to append to a list, and then after the loop is done turn that into a df. – Steve Aug 05 '21 at 23:37
  • That being said, I did get the first solution to return a result. It was just fewer rows than expected (because it's not appending). And the rabbit hole on that led me to the fact that you shouldn't append to a dataframe. – Steve Aug 05 '21 at 23:39
  • Gotta love SO closing the question because someone thinks this has been answered elsewhere. Your answer however is what I was looking for. I was able to modify it with ```cursor.execute(sql)``` and ```cursor.fetchall()``` and append each loop to a list. I then flattened the list and translated it to a dataframe. I was able to quickly return 3mil results and do the work that I needed to. Thank you! – Steve Aug 06 '21 at 16:56