The problem: I'm trying to upload data to SQL Server and getting speeds of only 122 rows per second (17 columns). I decided to post the problem here along with the workaround, in the hope that someone knows the definitive answer.
The most relevant thread I found was pyodbc - very slow bulk insert speed, but the problem differs significantly and that thread still has no answer.
It's a simple scenario: I try to upload a CSV of 350K rows into a blank SQL Server table using Python. I started with one of the most popular approaches: read the CSV as a pandas DataFrame, create a SQLAlchemy engine with fast_executemany=True, and use the to_sql() method to store it in the database. I got 122 rows/second, which is unacceptable.
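For reference, this is roughly the code path that gave 122 rows/second; the connection string and table name are placeholders for my real environment:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string; adjust server, database, credentials
# and ODBC driver name for your environment.
engine = create_engine(
    "mssql+pyodbc://user:password@myserver/mydb"
    "?driver=ODBC+Driver+17+for+SQL+Server",
    fast_executemany=True,
)

df = pd.read_csv("data.csv")  # ~350K rows, 17 columns
df.to_sql("target_table", engine, index=False, if_exists="append")
```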
As mentioned in other threads, this doesn't happen with PostgreSQL or Oracle, and I can add that it doesn't happen with MariaDB either. So I tried a different approach, calling pyodbc's cursor.executemany() directly to rule out a bug in pandas or SQLAlchemy. Same speed.
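The raw pyodbc version looked roughly like this (again with placeholder connection details), bypassing pandas and SQLAlchemy entirely:

```python
import csv
import pyodbc

# Placeholder connection details.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=mydb;UID=user;PWD=password"
)
cursor = conn.cursor()
cursor.fast_executemany = True  # pyodbc's equivalent of the SQLAlchemy flag

with open("data.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    # Empty CSV fields converted to None by hand (this matters later)
    rows = [tuple(v if v != "" else None for v in r) for r in reader]

placeholders = ", ".join("?" * 17)  # one marker per column
cursor.executemany(
    f"INSERT INTO target_table VALUES ({placeholders})", rows
)
conn.commit()
```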
The next step was to generate synthetic data to replicate the problem so I could submit a bug report... and to my surprise, the generated data inserted at around 8000 records/second. WTF? The synthetic data used the same data types (obviously) as the data in the CSV.
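For completeness, the synthetic data was generated along these lines (the exact column split and value ranges are illustrative); the key property, as it turned out, is that no value is ever None:

```python
import random
import string

# Hypothetical generator mirroring the CSV's shape: 17 columns with a
# mix of strings and numbers. Crucially, no value is ever None/NULL.
def fake_rows(n, n_str=10, n_num=7):
    rows = []
    for _ in range(n):
        strs = ["".join(random.choices(string.ascii_letters, k=12))
                for _ in range(n_str)]
        nums = [round(random.uniform(0, 1000), 3) for _ in range(n_num)]
        rows.append(tuple(strs + nums))
    return rows

rows = fake_rows(350_000)  # these insert at ~8000 rows/s with executemany
```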
After weeks of trying different things, I decided to look into pyodbc itself. On the pyodbc GitHub wiki I found an interesting piece of information at https://github.com/mkleehammer/pyodbc/wiki/Binding-Parameters, particularly in the "Writing NULL" and "Solutions and Workarounds" sections.
Indeed, 3 of the 17 fields on the first line of the CSV were NULL: converted to NaN by pandas, or to None manually by me. To my surprise, replacing these None/NaN/NULL values with valid values on the FIRST LINE ONLY boosted the speed to 7-8000 records/second. Note that I didn't change any of the None/NaN values in the subsequent lines, only in the first one.
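In pandas terms, the workaround amounts to patching only row 0 before calling to_sql(). The sentinel values below ("" and 0) are illustrative placeholders; they just need to be valid for each column's type:

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(
    "mssql+pyodbc://user:password@myserver/mydb"
    "?driver=ODBC+Driver+17+for+SQL+Server",
    fast_executemany=True,
)

df = pd.read_csv("data.csv")

# Replace None/NaN in the FIRST row only; later rows keep their NULLs.
# The sentinels are illustrative, not part of the fix itself.
for col in df.columns:
    if pd.isna(df.at[0, col]):
        df.at[0, col] = "" if df[col].dtype == object else 0

df.to_sql("target_table", engine, index=False, if_exists="append")
```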
Does anyone understand why this happens? Is there a more elegant fix than replacing None/NaN with a valid value?
UPDATE: It seems there are a couple of related issues on the pyodbc GitHub page, all pointing to this same problem. For reference: https://github.com/mkleehammer/pyodbc/issues/213. The thread is relatively old, from 2017, but it seems the problem with how None/NaN values are handled still persists.