I have a dask dataframe with 220 partitions and 7 columns. I imported it from a bcp export, did some wrangling in dask, and now want to write the whole frame to MSSQL using turbodbc. The load step looked roughly like the sketch below.
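(This is just a sketch: it assumes a character-format export, i.e. bcp ... -c, which is tab-delimited by default, and the file name and column names are placeholders.)

import dask.dataframe as dd

# sketch of the load step; 'train.bcp' and the generated column names
# are placeholders, and sep='\t' assumes bcp's default character-format
# (-c) field terminator
df_train = dd.read_csv('train.bcp', sep='\t', header=None,
                       names=[f'col{i}' for i in range(1, 8)])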
I connect to the DB as follows:

from turbodbc import connect, make_options

mydb = 'TEST'
connection = connect(driver="ODBC Driver 17 for SQL Server",
                     server="TEST SERVER",
                     port="1433",
                     database=mydb,
                     uid="sa",
                     pwd="5pITfir3")
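make_options is imported above but not actually used; as I understand the turbodbc docs, options would be wired in like this (prefer_unicode=True is recommended there for MS SQL Server, though I haven't confirmed it changes anything in my case):

# optional turbodbc tuning -- not used in my current code
options = make_options(prefer_unicode=True)
connection = connect(driver="ODBC Driver 17 for SQL Server",
                     server="TEST SERVER",
                     port="1433",
                     database=mydb,
                     uid="sa",
                     pwd="5pITfir3",
                     turbodbc_options=options)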
I then use a function I found in a Medium article to write to a test table in the DB:
import time

def turbo_write(mydb, df, table):
    """Use turbodbc to insert data into sql."""
    start = time.time()

    # preparing columns
    columns = '('
    columns += ', '.join(df.columns)
    columns += ')'

    # preparing value place holders
    val_place_holder = ['?' for col in df.columns]
    sql_val = '('
    sql_val += ', '.join(val_place_holder)
    sql_val += ')'

    # writing sql query for turbodbc
    sql = f"""
    INSERT INTO {mydb}.dbo.{table} {columns}
    VALUES {sql_val}
    """
    print(sql)
    print(sql_val)

    # writing array of values for turbodbc
    values_df = [df[col].values for col in df.columns]
    print(values_df)

    # cleans the previous head insert
    with connection.cursor() as cursor:
        cursor.execute(f"delete from {mydb}.dbo.{table}")
        connection.commit()

    # inserts data, for real
    with connection.cursor() as cursor:
        # try:
        cursor.executemanycolumns(sql, values_df)
        connection.commit()
        # except Exception:
        #     connection.rollback()
        #     print('something went wrong')

    stop = time.time() - start
    print(f'finished in {stop} seconds')
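My rough plan for the full frame is to insert one partition at a time, along the lines of the sketch below (this assumes the DELETE is first moved out of turbo_write, since otherwise each call would wipe the rows inserted for the previous partition):

# sketch: write the dask dataframe one partition at a time
for i in range(df_train.npartitions):
    part = df_train.get_partition(i).compute()  # one partition as a pandas DataFrame
    turbo_write(mydb, part, table)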
This works when I upload a small number of rows, as follows:
turbo_write(mydb, df_train.head(1000), table)
When I try a larger number of rows, it fails:
turbo_write(mydb, df_train.head(10000), table)
I get the error:
RuntimeError: Unable to cast Python instance to C++ type (compile in debug mode for details)
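As far as I know, executemanycolumns expects a list of homogeneous numpy arrays, so my guess is that one column comes out with object dtype on the larger sample. A quick diagnostic sketch to check which column that might be:

import numpy as np

sample = df_train.head(10000)
for col in sample.columns:
    arr = np.asarray(sample[col].values)
    print(col, arr.dtype)  # an object dtype here would be a likely culprit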
How do I write the whole dask dataframe to MSSQL without errors?