0

I am investigating whether Turbodbc can solve a problem for us when it come to massive amounts of DB inserts. We are talking millions of batch inserts. At the moment, it is a lot slower than any other solutions I have tried, for instance mongodb inserts, and pandas's to_sql function. I am not sure if I am missing a setting or something specific. Does anyone have experience with Turbodbc and maybe experienced the same issue?

The gist of the code is as follows

import turbodbc

connection = turbodbc.connect('astcdr', turbodb_options=options)
cursor = connection.cursor()
cursor.fast_executemany = True

// *** Measuring duration of operation
// Loading a big CSV
// *** Measuring duration of operation
// Mutate some data
// *** Measuring duration of operation
// Build the sql
// *** Measuring duration of operation
....

// Execute
cursor.executemany(sql, dataArray) <- this operation takes long
// *** Measuring duration of operation 

My odbc.ini file looks as follow

[astcdr]
Description=MySQL connection to database
Driver=MySQL
Database=test
Server=mysql_container
User=test
Password=test
Port=3306

My odbcinst.ini looks as follow

[ODBC Drivers]
MySQL ODBC 8.0=Installed

[MySQL]
Driver=/var/www/mysql-connector-odbc-8.0.19-linux-debian10-x86-64bit/lib/libmyodbc8w.so
UsageCount=2

I pip install the following, of which pybind11 fails during docker build, but it is installed when I mount the container.

pybind11
pandas
numpy
pyarrow
pytest
pytest-cov
mock
six
turbodbc

Since the result is the same on my mac, and in a dockerised env, I either made the same mistake twice in setting it up, or Turbodbc just doesn't work the way I think it does. Deos anyone have any suggestions?

Thank you

user3454396
  • 403
  • 4
  • 11

1 Answers1

1

There are two things that you should be aware of:

  1. You should try another setting of parameter_sets_to_buffer as described in https://turbodbc.readthedocs.io/en/latest/pages/advanced_usage.html. Increasing this may increase the throughput.
  2. Don't compare the performance between different database. MongoDB and MySQL have different features/performance characteristics and thus INSERT performance also varies greatly. Thus it is bad to compare turbodbc+MySQL to it. I'm not aware of the MySQL OBDC driver performance but as there is a native mysql connector, this might be better than the ODBC driver which can be really slow sometimes and thus turbodbc won't help you. turbodbc will bring you speedups in comparison to pyodbc but requires that the ODBC driver itself is already efficient.
Uwe L. Korn
  • 8,080
  • 1
  • 30
  • 42