I am investigating whether Turbodbc can solve a problem for us when it comes to massive numbers of DB inserts; we are talking millions of batch inserts. At the moment it is a lot slower than the other solutions I have tried, for instance MongoDB inserts and pandas's to_sql function. I am not sure whether I am missing a setting or something specific. Does anyone have experience with Turbodbc, and has anyone run into the same issue?
- I have tried on a Mac, as well as in a Dockerised Debian (Buster) environment.
- I have tried executemanycolumns as well as executemany with a SQL query; both are slow.
- I have installed unixODBC.
For Debian I am using these MySQL ODBC drivers:
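For reference, the executemanycolumns path expects one numpy array per column rather than a list of rows, with masked entries marking NULLs. A minimal sketch of that conversion, assuming the data starts out in a pandas DataFrame (the table name test_table and the column names here are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the loaded CSV; column names are made up.
df = pd.DataFrame({
    "call_id": [1, 2, 3],
    "duration": [12.5, None, 7.0],  # None becomes NaN, i.e. NULL in the DB
})

# executemanycolumns takes a list of numpy MaskedArrays, one per column,
# with the mask marking the values that should be inserted as NULL.
columns = [
    np.ma.MaskedArray(df["call_id"].to_numpy(), mask=False),
    np.ma.masked_invalid(df["duration"].to_numpy()),  # masks the NaN entry
]

# With an open turbodbc cursor this would then be:
# cursor.executemanycolumns(
#     "INSERT INTO test_table (call_id, duration) VALUES (?, ?)", columns
# )
```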
The gist of the code is as follows:
import turbodbc
connection = turbodbc.connect('astcdr', turbodbc_options=options)
cursor = connection.cursor()
cursor.fast_executemany = True
# *** Measuring duration of operation
# Loading a big CSV
# *** Measuring duration of operation
# Mutate some data
# *** Measuring duration of operation
# Build the sql
# *** Measuring duration of operation
# ...
# Execute
cursor.executemany(sql, dataArray)  # <- this operation takes long
# *** Measuring duration of operation
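As an aside, the repeated "measuring duration" steps above can be collapsed into a small context manager. This is plain Python, nothing turbodbc-specific; the step labels are stand-ins:

```python
import time
from contextlib import contextmanager

@contextmanager
def measure(label, results):
    """Record how long the enclosed block took, in seconds, under `label`."""
    start = time.perf_counter()
    yield
    results[label] = time.perf_counter() - start

timings = {}
with measure("load_csv", timings):
    time.sleep(0.01)  # stand-in for loading the big CSV
with measure("executemany", timings):
    time.sleep(0.01)  # stand-in for cursor.executemany(sql, dataArray)

# timings now maps each step name to its duration in seconds
```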
My odbc.ini file looks as follows:
[astcdr]
Description=MySQL connection to database
Driver=MySQL
Database=test
Server=mysql_container
User=test
Password=test
Port=3306
My odbcinst.ini looks as follows:
[ODBC Drivers]
MySQL ODBC 8.0=Installed
[MySQL]
Driver=/var/www/mysql-connector-odbc-8.0.19-linux-debian10-x86-64bit/lib/libmyodbc8w.so
UsageCount=2
I pip install the following; pybind11 fails during the Docker build, but it is installed once I mount the container:
pybind11
pandas
numpy
pyarrow
pytest
pytest-cov
mock
six
turbodbc
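On the pybind11 build failure: turbodbc compiles native code at install time, so pybind11 (and the unixODBC headers) need to be available before pip gets to turbodbc. One way to arrange that is to install the build-time dependencies in an earlier step; a sketch of what that could look like in a Dockerfile (the base image and package choices are assumptions, adapt to your setup):

```dockerfile
FROM python:3.8-slim-buster

# turbodbc compiles against unixODBC, so the headers and a compiler are needed
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential unixodbc-dev \
    && rm -rf /var/lib/apt/lists/*

# Install build-time dependencies in their own step first...
RUN pip install pybind11 numpy

# ...so they are already importable when turbodbc's build runs.
RUN pip install turbodbc pandas pyarrow
```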
Since the result is the same on my Mac and in the Dockerised environment, I either made the same mistake twice in setting it up, or Turbodbc just doesn't work the way I think it does. Does anyone have any suggestions?
Thank you