I am currently writing code to insert a bunch of object data into a mysql database through a plain python script. The amount of rows I need to insert is on the order of a few thousand. I want to be able to do this as fast as possible, and wanted to know if there is a performance difference between calling executeMany() on a bunch fo rows and then calling commit(), vs calling execute() many times and then calling commit()
3 Answers
It is always more efficient to perform all operations at once, and commit at the end of the process. commit
incurs additional processing that you don't want to repeat for each and every row, if performance matters.
The more operations you perform, the greater the performance benefit. On the other hand, you need to consider the side effect of a long lasting operation. As an example, if you have several processes inserting concurrently, the risk of deadlock increases - especially if duplicate key errors arise. An intermediate approach is to insert in batches. You may want to have a look at the MYSQL documentation on locking mechanisms.
The MySQL documentation has an interesting section about how to optimize insert
statements - here are a few picks:
the
load data
syntax is the fastest available optionusing multiple
values()
lists is also quite faster than running multipleinserts

- 216,147
- 25
- 84
- 135
Here some tips: Tune the mysql settings in /etc/mysql/my.cnf (for Ubuntu) can increase the performance of Mysql a lot. More memory+cache is usually better for queries. Create a very long text with many insert queries and semicolons will improve your speed a lot. Keeping the entire database in memory gives maximum speed, but not suitable for most projects. Tips for mysql tuning are at: https://duckduckgo.com/?q=mysql+tune+for+speed&t=newext&atb=v275-1&ia=web.

- 995
- 11
- 21
In python should it be indifferent, because the data must be comited before they are inserted.
so there should be little difference between execute and executemanybut as stated here
the mysql homepage also states
With the executemany() method, it is not possible to specify multiple statements to execute in the operation argument. Doing so raises an InternalError exception. Consider using execute() with multi=True instead.
So if you have doubts about performance you can have a look at sqlalchemy is seems to be a bit faster, but takes time to get it to work

- 45,398
- 8
- 30
- 47