1

I'm sending various data frames to Microsoft SQL Server using the Pandas function to_sql() and a mssql+pyodbc:// connection made with sqlalchemy.create_engine. Sometimes to_sql() returns the number of rows written, which is what I expect from the documentation on Returns:

Number of rows affected by to_sql. None is returned if the callable passed into method does not return an integer number of rows.

The number of returned rows affected is the sum of the rowcount attribute of sqlite3.Cursor or SQLAlchemy connectable which may not reflect the exact number of written rows as stipulated in the sqlite3 or SQLAlchemy.

But in some cases I'm seeing it return negative values like -1, 2, -11, -56. If I use method="multi" this behavior goes away. Here I'm writing a table with 325 records:

>>> PLSUBMITTALTYPE.to_sql("PLSubmittalType", con=data_lake, if_exists="replace")
-1

>>> PLSUBMITTALTYPE.to_sql("PLSubmittalType", con=data_lake, if_exists="replace", method="multi", chunksize = 50)
325

What do those negative values mean? It appears to be succeeding in writing to the database in those cases.

Sam Firke
  • 21,571
  • 9
  • 87
  • 105
  • Show us the call, please, so we can see the args. What rows are in there already? Are you upserting? Are there integrity constraints? Does the symptom persist if you’re writing to an empty temp table? – J_H Mar 16 '23 at 20:55
  • @J_H I've added a call and output. These are writing new tables and I've found this behavior vanishes when I specify `method="multi"`. – Sam Firke Mar 17 '23 at 14:12

1 Answers1

0

You have a software stack that is not behaving in a completely reliable, predicted, documented way. I recommend changing your approach.

Use some combination of CREATE TABLE / DELETE FROM / TRUNCATE to obtain a temp table that has zero rows in it. Often CREATE TABLE LIKE is a convenient approach for this.

Invoke with two args: .to_sql(temp_table, con=con)

This successfully INSERTed and reported number of rows without incident.

Now submit a transaction to transfer those rows to the table of interest. You have several options:

  • INSERT all rows (perhaps with errors ignored)
  • UPDATE all rows
  • use JOIN to segregate new / existing values, and do separate INSERT / UPDATE
  • some other vendor-specific UPSERT technique

When choosing an option you now have full flexibility to use any technique suggested by your DB vendor or the community, rather than having to funnel things through the narrow cross-vendor API offered by to_sql. This puts you back in the driver seat, so you can implement the more reliable solution you desire.

J_H
  • 17,926
  • 4
  • 24
  • 44