I am having trouble executing an SQL statement where I have a huge list of values for my IN statement. I'm trying to come up with a way to iterate through the list and get a combined result set.
The error I am currently getting is: [SQL Server]Internal error: An expression services limit has been reached. Please look for potentially complex expressions in your query, and try to simplify them. (8632) (SQLExecDirectW)')
This is because my IN list is 75k items long.
I'm assuming this needs to chunked up in some way, and looped through?
PseudoCode
List = 75k items
Chunk_Size = 10000
For i in first Chunk_Size of List
df.append = SQL results WHERE IN List of 10k items
This would then loop 8 times appending the results to a dataframe. - I want to be able to define the "chunk size" for testing purposes.
Pieces of real code:
import pyodbc, pandas as pd
conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
'SERVER=server;'
'DATABASE=db;'
'UID=Me;'
'PWD=Secret;'
'Trusted_Connection=yes;')
list_of_ids = ['1', 'Hello', '3.4', 'xyz5000'] #etc to 75k
params= "', '".join(list_of_ids)
sql = ("SELECT [ItemNumber],[Name],[Part] FROM table1 WHERE Part IN ('{}');".format(params))
sql_query = pd.read_sql_query(sql,conn)
I know that using format is bad, but I couldn't get the execute statement to work right. When I do this:
sql_query2 = cursor.execute("SELECT [ItemNumber],[Name],[Part] FROM table1 WHERE Part IN ?;",list_of_ids)
I get errors like this: pyodbc.ProgrammingError: ('The SQL contains 1 parameter markers, but 4 parameters were supplied', 'HY000')
Thoughts? Thanks in advance!
SOLUTION EDIT, using Tim's solution below, I was able to accomplish this by doing:
Chunk_Size = 10000
list2 = []
cursor = conn.cursor()
for i in range(0, len(list_of_ids), Chunk_Size):
params= "', '".join(list_of_ids[i:i+Chunk_Size])
sql = "SELECT [ItemNumber],[Name],[Part] FROM table1 WHERE Part IN ('{}');".format(params)
cursor.execute(sql)
list1 = cursor.fetchall()
list2.append(list1)
from pandas.core.common import flatten
list3 = list(flatten(list2))
match_df = pd.DataFrame.from_records(list3, columns=['Item Number','Name','Part'])