I'm trying to upsert a pandas dataframe to a MS SQL Server using pyodbc. I've used a similar approach before to do straight inserts, but the solution I've tried this time is incredibly slow. Is there a more streamlined way to accomplish an upsert than what I have?
sql_connect = pyodbc.connect('Driver={SQL Server Native Client 11.0}; Server=blank1; Database=blank2; UID=blank3; PWD=blank4')
cursor = sql_connect.cursor()
for index, row in bdf.iterrows():
res = cursor.execute("UPDATE dbo.MPA_BOOK_RAW SET [SITE]=?, [SHIP_TO]=?, [PROD_LINE]=?, [GROUP_NUMBER]=?, [DESCRIPTION]=?, [ORDER_QTY]=?, [BPS_INCLUDE]=? WHERE [CUST]=? AND [ORDER_NUMBER]=? AND [ORDER_DATE]=? AND [PURCHASE_ORDER]=? AND [CHANNEL]=? AND [ITEM]=? AND [END_DT]=?",
row['SITE'],
row['SHIP_TO'],
row['PROD_LINE'],
row['GROUP_NUMBER'],
row['DESCRIPTION'],
row['ORDER_QTY'],
row['BPS_INCLUDE'],
row['CUST'],
row['ORDER_NUMBER'],
row['ORDER_DATE'],
row['PURCHASE_ORDER'],
row['CHANNEL'],
row['ITEM'],
row['END_DT'])
if res.rowcount == 0:
cursor.execute("INSERT INTO dbo.MPA_BOOK_RAW ([SITE], [CUST], [ORDER_NUMBER], [ORDER_DATE], [PURCHASE_ORDER], [CHANNEL], [SHIP_TO], [PROD_LINE], [GROUP_NUMBER], [DESCRIPTION], [ITEM], [ORDER_QTY], [END_DT], [BPS_INCLUDE]) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
row['SITE'],
row['CUST'],
row['ORDER_NUMBER'],
row['ORDER_DATE'],
row['PURCHASE_ORDER'],
row['CHANNEL'],
row['SHIP_TO'],
row['PROD_LINE'],
row['GROUP_NUMBER'],
row['DESCRIPTION'],
row['ITEM'],
row['ORDER_QTY'],
row['END_DT'],
row['BPS_INCLUDE'])
sql_connect.commit()
cursor.close()
sql_connect.close()
I tried the above with a five row sample of my original ~50k row dataframe and it worked fine. So the logic seems okay. It's just the speed that is an issue.