I am trying to bulk insert a .CSV
file into SQL Server without much success.
A bit of background:
1. I needed to insert 16 million records into a SQL Server (2017) DB. Each record has 130 columns. The .CSV file is the result of an API call from one of our vendors, which I am not allowed to mention. The data types are integers, floats, and strings.
2. I tried the usual: BULK INSERT, but I could not get past the data type errors. I posted a question here about it but could not make it work. (A rough sketch of the statement I mean is below, after this list.)
3. I tried experimenting with Python and tried every method I could find except pandas.to_sql, since everybody warned it was very slow. I got stuck with data type and string truncation errors, different from the ones BULK INSERT raised.
4. Without many options left I tried pd.to_sql, and while it did not raise any data type or truncation errors, it failed because it ran out of space in my SQL tempdb database. I could not get past this error either, even though I had plenty of space and all my data files (and log files) were set to autogrow without limit.
I got stuck at that point. My code (for the pd.to_sql
piece) was simple:
import pandas as pd
from sqlalchemy import create_engine
df = pd.read_csv('myFile.csv')  # placeholder path for the .CSV
engine = create_engine("mssql+pyodbc://@myDSN")
df.to_sql('myTable', engine, schema='dbo', if_exists='append', index=False, chunksize=100)
I am not really sure what else to try; any word of advice is welcome. All the code samples and examples I have seen deal with small datasets (not many columns). I am willing to try any other method, and I would appreciate any pointers.
Thanks!