2

My MySQL table schema is:

CREATE DATABASE test_db;
USE test_db;
CREATE TABLE test_table (
    id INT AUTO_INCREMENT,
    last_modified DATETIME NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

When I run the following benchmark script, I get:

b1: 20.5559301376

b2: 0.504406929016

from timeit import timeit
import MySQLdb

ids = range(1000)

query_1 = "update test_table set last_modified=UTC_TIMESTAMP() where id=%(id)s"
query_2 = "update test_table set last_modified=UTC_TIMESTAMP() where id in (%s)" % ", ".join(('%s', ) * len(ids))

db = MySQLdb.connect(host="localhost", user="some_user", passwd="some_pwd", db="test_db")

def b1():
    curs = db.cursor()
    curs.executemany(query_1, ids)
    db.close()

def b2():
    curs = db.cursor()
    curs.execute(query_2, ids)
    db.close()

print "b1: %s" % str(timeit(lambda:b1(), number=30))
print "b2: %s" % str(timeit(lambda:b2(), number=30))

Why is there such a large difference between executemany and the IN clause?

I'm using Python 2.6.6 and MySQL-python 1.2.3.

The only relevant question I could find was - Why is executemany slow in Python MySQLdb?, but it isn't really what I'm after.

Community
  • 1
  • 1
Anish Ramaswamy
  • 2,326
  • 3
  • 32
  • 63
  • The key difference is that `executemany` doesn't guarantee a single database round-trip. It's 'best-effort' where the effort is often not that great vs 'actual single-statement round-trip'. See also https://stackoverflow.com/questions/4101076/executemany-confusion – pvg May 26 '17 at 05:29

1 Answers1

3

executemany repeatedly goes back and forth to the MySQL server, which then needs to parse the query, perform it, and return results. This is perhaps 10 times as slow as doing everything in a single SQL statement, even if it is more complex.

However, for INSERT, this says that it will do the smart thing and construct a multi-row INSERT for you, thereby being efficient.

Hence, IN(1,2,3,...) is much more efficient than UPDATE;UPDATE;UPDATE...

If you have a sequence of ids, then even better would be to say WHERE id BETWEEN 1 and 1000. This is because it can simply scan the rows rather than looking up each one from scratch. (I am assuming id is indexed, probably as the PRIMARY KEY.)

Also, you are probably running with the settings that make each insert/update/delete into its own "transaction". This adds a lot of overhead to each UPDATE. And it is probably not desirable in this case. I suspect you want the entire 1000-row update to be atomic.

Bottom line: Use executemany only for (a) INSERTs or (b) statements that must be run individually.

Rick James
  • 135,179
  • 13
  • 127
  • 222